Algorithm Details

Introduction

Novel features of the OCECgo software tool’s algorithm are briefly presented here.

Monte Carlo Analysis of Calibration Data

As discussed by Conrad & Johnson [1], the Sunset Laboratory Model 4 instrument performs an in-test calibration at the end of each thermal analysis. A known volume of a 5% methane-helium mixture is introduced into the instrument, converted to carbon dioxide, and measured by the NDIR. The integrated NDIR signal during this in-test calibration (the CH4-loop) corresponds to the known mass of injected carbon (in methane) and is therefore a means to quantify the NDIR sensitivity during the test. This NDIR sensitivity is then coupled with the NDIR signal during the sample analysis phases to quantify organic (OC) and elemental carbon (EC) in the sample. The injected mass of the methane-helium mixture, however, is sensitive to operating conditions and can vary over time; consequently, it must be calibrated regularly (the manufacturer recommends monthly calibrations).

Calibrating CH4-Loop Carbon Mass

Calibration of the carbon mass injected during the CH4-loop is accomplished using an external standard. An aqueous sucrose solution of known concentration is introduced to the instrument. A thermal analysis is then performed to obtain the integrated NDIR signal during the analysis (“total area”, \(x_i\)) corresponding to a known carbon mass (\(y_i\)) in sucrose, as well as the integrated NDIR signal during the CH4-loop (“calibration area”, \(c_i\)). This procedure is repeated \(I\) times (i.e., \(i \in \left\{ 1...I \right\}\)) with three different volumes of injected sucrose solution: 0, 5, and 10 μL. The injected carbon mass during the CH4-loop is then quantified by performing a linear regression of \(x\) and \(y\) and computing the value of \(y\) corresponding to the mean of the calibration areas (\(\overline{c_i}\)). In summary, the linear regression estimates the mean sensitivity of the NDIR detector (total area vs. carbon mass) and the mean calibration area is substituted into the regression to estimate the actual injected mass of carbon in the CH4-loop.
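
As a point of reference, the nominal (uncertainty-free) calibration reduces to a simple linear fit. The MATLAB sketch below assumes column vectors x, y, and c containing the total areas, applied carbon masses, and calibration areas, respectively; the variable names are illustrative only.

% Nominal CH4-loop calibration (illustrative variable names)
% x - total areas, y - applied carbon masses [ugC], c - calibration areas
beta = polyfit(x, y, 1);          % linear regression of carbon mass on total area
m_C  = polyval(beta, mean(c));    % CH4-loop carbon mass at the mean calibration area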

Uncertainty in Calibration

There are numerous sources of error in this calibration procedure:

  • Uncertainty in the sucrose solution’s concentration of carbon, affecting \(y_i\).
  • Uncertainty in the introduction of desired volumes of the solution into the instrument (via pipette), affecting \(y_i\).
  • Bias in the NDIR detector, affecting \(x_i\) and \(c_i\).
  • Uncertainties associated with linear regression, affecting the calibrated carbon mass (\(m_C\)).

The OCECgo software employs a Monte Carlo method (MCM) to propagate these sources of error into the computation of the CH4-loop carbon mass. Refer to the calibration tool for a detailed description of user inputs.

OCECgo Monte Carlo Method for Calibration

Uncertainty in Total Area of Each Data (\(x_i\))

Uncertainty in the total area from the thermal analysis of each sucrose-laden sample is quantified using the bias of the NDIR detector. The NDIR bias is estimated from the \(I\) calibration areas by assuming that the actual injected carbon mass during the CH4-loop is constant over the acquisition of calibration data and that variance in the reported calibration areas is a function of NDIR bias alone. Mathematically, the relative bias of the NDIR detector is:

\[b_{NDIR} = \frac{s_c}{\overline{c_i}}\]

where \(s_c\) represents the standard deviation of the \(c_i\).

Within the MCM, the jth random draw of \(x_i\) (\(= x_{i,j}\)) is:

\[x_{i,j} = \left( 1 + \xi_j \right) * x_i\]

and:

\[\xi_j \sim \mathcal{N} \left( 0, b_{NDIR} \right)\]

where \(\mathcal{N} \left( \mu,\sigma \right)\) is a normal distribution.

Uncertainty in Mean Calibration Area (\(\overline{c_i}\))

Within the MCM, the jth random draw of \(\overline{c_i}\) (\(= C_j\)) is quantified using the distribution of the mean. Since the mean calibration area is also affected by NDIR bias, the same random variable \(\xi_j\) defined above is used:

\[C_j = \left( 1 + \frac{\xi_j}{\sqrt{I}} \right) * \overline{c_i}\]

where the factor \(\sqrt{I}\) reflects that this is the distribution of a mean.
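
For concreteness, the following MATLAB sketch generates these random draws. It assumes x and c are I-by-1 vectors of total and calibration areas, J is the number of Monte Carlo draws, and implicit expansion (MATLAB R2016b or newer) is available; the names are illustrative only.

% Monte Carlo draws of total areas and mean calibration area (illustrative sketch)
I = numel(c);
b_NDIR = std(c) / mean(c);            % relative NDIR bias, s_c / mean(c_i)

xi  = b_NDIR * randn(1, J);           % xi_j ~ N(0, b_NDIR), one draw per trial
x_j = (1 + xi) .* x;                  % I-by-J matrix of perturbed total areas, x_{i,j}
C_j = (1 + xi / sqrt(I)) * mean(c);   % 1-by-J vector of perturbed mean calibration areas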

Uncertainty in Applied Carbon in Sucrose (\(y_i\))

Uncertainty in the applied carbon in sucrose is quantified in a two-step procedure. For the jth random draw:

  1. The mass concentration of sucrose-bound carbon in the aqueous solution is estimated:
     a. The mass fraction of sucrose in the solution is estimated using the nominal sucrose and water masses, while considering bias in the scale used to measure the masses and uncertainty in the purity of the sucrose.
     b. The density of the solution (sensitive to ambient temperature and composition of the solution) is quantified using the empirical model of Darros-Barbosa et al. [2].
     c. The carbon fraction of sucrose (42.11% carbon) is used to obtain the mass of carbon per unit volume of sucrose solution.
  2. Depending on the volume of solution applied for the specific calibration data, pipette uncertainty (bias and precision) is applied to the solution volume. The solution volume is then multiplied by the carbon concentration from step 1c to obtain a Monte Carlo estimate of \(y_{i,j}\), as sketched below.
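
A heavily simplified sketch of one such draw is given below. All inputs, the additive/relative treatment of the scale and pipette uncertainties, and the helper density_model (standing in for the empirical model of Darros-Barbosa et al. [2]) are placeholders and do not reflect OCECgo's exact implementation.

% One Monte Carlo draw of the applied carbon mass, y_ij (illustrative placeholders only)
m_sucrose = m_sucrose_nom * (1 + u_scale  * randn);   % perturbed sucrose mass [g]
m_water   = m_water_nom   * (1 + u_scale  * randn);   % perturbed water mass [g]
purity    = purity_nom    * (1 + u_purity * randn);   % perturbed sucrose purity [-]
w_sucrose = purity * m_sucrose / (m_sucrose + m_water);   % sucrose mass fraction (step 1a)

rho    = density_model(w_sucrose, T_ambient);   % solution density [g/mL] (step 1b, placeholder)
conc_C = 0.4211 * w_sucrose * rho;              % carbon concentration [ugC/uL] (step 1c)

V    = (V_nom + pipette_bias) * (1 + pipette_precision * randn);   % perturbed volume [uL]
y_ij = conc_C * V;                                                 % applied carbon mass [ugC]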

Uncertainty Propagation through Regression

When propagating the uncertainties of \(x_i\), \(y_i\), and \(c_i\) through the linear regression to obtain the CH4-loop carbon mass, two sources of error need to be considered:

  1. Nominal uncertainty in the regression - i.e., the prediction interval of the regression.
  2. Regression uncertainty due to uncertainty of the regressed data.

Within OCECgo, the former is handled using typical techniques for linear regression of the nominal (i.e., not Monte Carlo-perturbed) values of \(x_i\) and \(y_i\). This yields standard errors of the slope and intercept - \(\mathrm{se}_{\beta_{0,1}}\) and \(\mathrm{se}_{\beta_{1,1}}\), respectively - where, for \(\beta_{m,n}\), \(m \in \left\{ 0,1 \right\}\) corresponds to the regression slope (0) and intercept (1), and \(n \in \left\{ 1,2 \right\}\) corresponds to the above-listed sources of error.

Standard errors of the slope and intercept due to uncertainty of the regressed data (error source 2) are estimated via the MCM. This is accomplished by performing a linear fit to the dataset corresponding to the jth Monte Carlo draw. That is, the jth Monte Carlo estimates of the slope and intercept (\(\beta_{0,2,j}\) and \(\beta_{1,2,j}\)) are obtained by regressing \(\left\{ y_{i,j} \right\}\) on \(\left\{ x_{i,j} \right\}\) over the index \(i\). This yields, for \(J\) total Monte Carlo draws, \(J \times 1\) vectors of slope and intercept from which the standard errors are computed:

\[{\begin{array}{cccc} \mathrm{se}_{\beta_{0, 2}} = \frac{s_{\beta_{0,2}}}{\sqrt{J}} & & & \mathrm{se}_{\beta_{1, 2}} = \frac{s_{\beta_{1,2}}}{\sqrt{J}} \end{array}}\]

Total standard errors of the regression slope and intercept are then computed by summing these two sources in quadrature:

\[{\begin{array}{cccc} \mathrm{se}_{\beta_{0}} = \sqrt{\mathrm{se}_{\beta_{0,1}}^2 + \mathrm{se}_{\beta_{0,2}}^2} & & & \mathrm{se}_{\beta_{1}} = \sqrt{\mathrm{se}_{\beta_{1,1}}^2 + \mathrm{se}_{\beta_{1,2}}^2} \end{array}}\]
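
A sketch of this Monte Carlo estimate of the regression standard errors follows, assuming x_j and y_j are I-by-J matrices assembled from the perturbed total areas and carbon masses of the previous sketches, and that the nominal standard errors se_b0_1 and se_b1_1 have already been computed; the names are illustrative.

% Standard errors of slope/intercept due to uncertainty of the regressed data (source 2)
beta_j = zeros(J, 2);                                  % [slope, intercept] per Monte Carlo draw
for j = 1:J
    beta_j(j, :) = polyfit(x_j(:, j), y_j(:, j), 1);   % regression of the jth perturbed dataset
end
se_b0_2 = std(beta_j(:, 1)) / sqrt(J);
se_b1_2 = std(beta_j(:, 2)) / sqrt(J);

% Combine with the nominal regression standard errors (source 1) in quadrature
se_b0 = sqrt(se_b0_1^2 + se_b0_2^2);
se_b1 = sqrt(se_b1_1^2 + se_b1_2^2);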

The prediction interval when using this regression is a function of the standard errors:

\[f_{(x)} = \beta_{0,1} * x + \beta_{1,1} + t_{\nu} \left( \mathrm{se}_{\beta_{0}}^2 x^2 + \mathrm{se}_{\beta_{1}}^2 + 2 \mathrm{se}_{\beta_{0}} \mathrm{se}_{\beta_{1}} \rho_{01} x \right) ^ \frac{1}{2}\]

where the t-statistic has \(\nu = I - 2\) degrees of freedom and \(\rho_{01}\) is the slope-intercept correlation (associated with error source 1).

Within the MCM, the jth carbon mass injected during the CH4-loop is quantified by:

  1. Obtaining a random calibration area, \(C_j\);
  2. Obtaining a random t-variable from the standard t-distribution with \(\nu\) degrees of freedom, used in place of \(t_{\nu}\) in the prediction interval equation; and
  3. Computing the jth carbon mass: \(m_{C,j} = f_{(C_j)}\).

Finally, a generalized T-distribution is fit to the set of Monte Carlo carbon masses, \(\left\{ m_{C,j} \right\}\), via maximum likelihood estimation and is reported in the OCECgo GUI.
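
These steps might be realized as follows, where beta0 and beta1 are the nominal slope and intercept, rho01 is the slope-intercept correlation, and C_j is the vector of random calibration areas from the earlier sketch; this is a sketch under those assumed names, not OCECgo's exact code.

% Monte Carlo carbon masses injected during the CH4-loop and fitted distribution
nu  = I - 2;                               % degrees of freedom of the nominal regression
t_j = trnd(nu, 1, J);                      % random t-variables, one per draw

m_Cj = beta0 * C_j + beta1 + ...
       t_j .* sqrt(se_b0^2 * C_j.^2 + se_b1^2 + 2 * se_b0 * se_b1 * rho01 * C_j);

pd = fitdist(m_Cj(:), 'tLocationScale');   % generalized T-distribution fit via MLE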

NDIR Correction: Convex Hull

The Sunset Laboratory Model 4 instrument measures evolved carbon as carbon dioxide using a non-dispersive infrared (NDIR) detector. The detector is known to drift over a relatively short time-frame, such that the baseline of the NDIR signal (the “true zero” of the detector) is not constant during a thermal-optical analysis. Consequently, the baseline NDIR signal must be estimated and the raw NDIR signal must be corrected to enable an accurate measure of evolved carbon. Fortuitously, at specific instances during a thermal-optical analysis, it can be expected that carbon dioxide is not present in the instrument, including:

  • At the start of the thermal protocol, following purge of the instrument with helium.
  • Immediately prior to the Ox-phase of the thermal protocol, once organic carbon has completely pyrolyzed.
  • Immediately following the thermal protocol, prior to the CH4-loop.

If it is known with confidence that carbon dioxide from evolved organic (OC) and elemental carbon (EC) is not present at these instances, then they can be used to derive the baseline of the NDIR detector. Indeed, the proprietary software provided with the instrument performs a linear fit to these data when correcting the NDIR signal for drift. The OCECgo software tool allows the user to estimate the NDIR baseline with the linear fit performed by the instrument. This is accomplished by selecting the “From results file” option in section 2(b) of the “Data Analysis Tool - Inputs” tab. Instrument-reported total and calibration areas are acquired from an appropriate results file and are used to back-calculate the NDIR correction and apply it to the raw signal. However, if unexpected carbon dioxide is present, this linear fit drift correction procedure will provide erroneous results - this has been observed by the author(s) and is discussed within the literature [3].

Alternatively, OCECgo allows the user to perform the NDIR correction with a novel method termed the “Convex Hull” technique. This approach leverages the fact that (disregarding noise in the NDIR detector) the quantity of NDIR-measured carbon dioxide is by definition non-negative - that is, the NDIR signal always represents a non-negative amount of carbon dioxide. With this in mind, the technique quantifies the NDIR baseline by computing a (lower) convex hull that bounds the raw NDIR signal from below. This is an improvement over the above technique in that the NDIR correction may be non-linear (piecewise) and no prior assumption is made that the carbon dioxide concentration measured by the NDIR is zero at any specific moment in time. Specifics of the technique are presented in the Algorithm section below.

Algorithm

Let \({\left\{x_t : t\ \in\ {1...T}\right\}}\) represent the NDIR time-series, reported at a frequency of 1 Hz for a total of \(T\) seconds. For a time-series, a point \(x_t\) is a vertex of the lower convex hull if there exists another point at \(t'\) such that the line through \(x_t\) and \(x_{t'}\) is a lower bound to every point in the time series. Mathematically:

\[x_t \in S \iff \exists\ t' : \frac{x_i - x_t}{i - t} \geq \frac{x_{t'} - x_t}{t' - t},\ \forall\ i \in \left\{ 1...T \right\}\]

where \(S\) is the set of vertices of the lower convex hull.

To compute the NDIR baseline within the OCECgo algorithm, two additional steps are taken:

  1. A positive shift is applied to the convex hull vertices to account for the noise characteristic of the NDIR detector.
  2. Cubic interpolation of the convex hull vertices is performed to obtain the NDIR correction at all instances in time.

MATLAB Code

Shown below is example MATLAB code to compute the NDIR baseline of the time-series x using the convex hull technique.

% Ensure the time-series is a row vector so the element-wise division below is valid
x = x(:).';

% Get length of time-series
T = length(x);

% Initialize vector of lower convex hull vertex indices with the first point.
% The first and last points of the time-series are always vertices of the hull;
% the last point is appended by the loop below
vertices = 1;

% Compute indices of lower convex hull vertices
index = 1;
while index < T

  % Find slope from existing point to all of the following points
  slope_to_next_point = (x(index+1 : T) - x(index)) ./ (1 : T - index);

  % Determine the index of the point with minimum slope
  [~, index_of_minimum] = min(slope_to_next_point);

  % Update to next index
  index = index + index_of_minimum;

  % Update vertex tracking
  vertices = cat(1, vertices, index);

end

% Calculate NDIR baseline
NDIR_baseline = x(vertices);

% Add positive shift to account for instrument noise
NDIR_baseline = NDIR_baseline + noise_characteristic;

% Perform cubic interpolation to obtain time-series of NDIR baseline
NDIR_baseline = interp1(vertices, NDIR_baseline, 1 : T, 'cubic');
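
For context, applying the snippet might look like the following, where raw_NDIR and std_noise are placeholder names for the raw 1 Hz NDIR signal and an estimate of its noise.

x = raw_NDIR;                                % raw NDIR time-series (1 Hz)
noise_characteristic = 2 * std_noise;        % e.g., a 2-sigma estimate of NDIR noise
% ... run the convex hull snippet above to obtain NDIR_baseline ...
NDIR_corrected = x(:).' - NDIR_baseline;     % drift-corrected NDIR signal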

Split Point Uncertainty: Attenuation Decline

OCECgo includes an optional, novel technique for the estimation of split point uncertainty - the Attenuation Decline technique. This method leverages the fact that uncertainty in the instantaneous laser attenuation through the quartz filter is quantifiable, so that uncertainty in the split point estimated by the thermal-optical transmittance (TOT) method can be objectively quantified. The procedure is to first compute the nominal split point using the TOT method with the computed initial laser attenuation. The initial attenuation is then decreased by some fraction and a secondary split point is computed using the TOT method. By selecting the attenuation decrease based on the ability to quantify attenuation (i.e., the 2σ uncertainty in instantaneous attenuation), the difference between the computed split points represents the 2σ uncertainty in the split point - i.e., uncertainty in attenuation is propagated through the TOT method.

TOT Method

The TOT method enables the estimation of the split point by leveraging the assumption that the optical properties of pyrolyzed carbon (PC) and EC are approximately equal. If true, the split point is estimated as the instance in time when laser attenuation (which first increases due to pyrolysis of OC) recovers to its value at the start of the analysis. Let \(I_{(t)}^o\) and \(I_{(t)}^t\) represent the incident laser intensity (optically upstream of the quartz filter) and the transmitted (measured) laser intensity at time \(t\). Attenuation is:

\[A_{(t)} = -\mathrm{ln}\left( \frac{I_{(t)}^t}{I_{(t)}^o} \right)\]

The premise of the TOT method is then to find the latest time \(t_s\) such that \(A_{(t_s)} \approx A_{(0)}\). Simple algebraic manipulation yields an alternative form of this equation, which is useful for the estimation of split point uncertainty:

\[t_s = t : \frac{I_{(0)}^t}{I_{(t)}^t} \frac{I_{(t)}^o}{I_{(0)}^o} \approx 1\]
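
In code, the split point search implied by this criterion could be realized as below, where I_t and I_o are placeholder vectors of the transmitted and incident laser intensities sampled at 1 Hz; this is one reasonable realization, not necessarily OCECgo's exact logic.

% TOT split point: latest time at which attenuation has recovered to its initial value
ratio   = (I_t(1) ./ I_t) .* (I_o ./ I_o(1));   % left-hand side of the criterion above
t_split = find(ratio >= 1, 1, 'last');          % split point index [s, at 1 Hz]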

Split Point Uncertainty

Within the OCECgo algorithm, the instantaneous incident intensity, \(I_{(t)}^o\), is modelled as a function of measured (actual) oven temperature, \(f_{(T_{a(t)})}\). The critical equation for the TOT method is therefore:

\[\frac{I_{(0)}^t}{I_{(t)}^t} \frac{f_{(T_{a(t)})}}{f_{(T_{a(0)})}} \approx 1\]

where \(f_{(x)}\) is a second- or first-order polynomial, as chosen by the user.

By inspection, uncertainty of the TOT method - i.e., uncertainty in the left-hand side of the equation, \(U_{TOT}\) - is a function of three items that are quantified using laser power and oven temperature data from the CH4-loop of the thermal protocol:

  1. Noise in the photodiode measuring the laser power - affecting \(I_{(0)}^t\) and \(I_{(t)}^t\). Represented by the relative 2σ uncertainty of the photodiode, \(U_I\).
  2. Noise in the thermocouple measuring front oven temperature - affecting \(T_{a(0)}\) and \(T_{a(t)}\). Represented by the relative 2σ uncertainty of the thermocouple, \(U_T\).
  3. Uncertainty of the model of incident intensity as a function of oven temperature - affecting \(f_{(x)}\). Represented by the relative 2σ prediction interval of the regression, \(U_f\).

Instrument noise (items 1 and 2) is inherently uncorrelated in time, but uncertainty in the model of incident intensity as a function of oven temperature (item 3) may in fact be correlated in time. Propagating these uncertainties with a first-order Taylor series expansion, uncertainty in the TOT method is:

\[U_{TOT} = 2U_I^2 + 2U_T^2 + 2U_f^2 \left( 1 - \rho_{f(t_s)} \right)\]

where \(\rho_{f(t_s)} \in [-1,1]\) is the correlation of the fitted model at \(T_{a(0)}\) and \(T_{a(t)}\). While this correlation cannot be predicted, OCECgo assumes \(\rho_f = -1\) to yield a conservative estimate of \(U_{TOT}\):

\[U_{TOT} = 2U_I^2 + 2U_T^2 + 4U_f^2\]
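
A conceptual sketch of this propagation and the resulting split point uncertainty follows. Here U_I, U_T, and U_f are the relative 2σ uncertainties described above, and delta_A is a placeholder for the attenuation decrease that OCECgo derives from \(U_{TOT}\); the exact form of that decrease is internal to OCECgo.

% Conservative uncertainty of the TOT method (rho_f = -1)
U_TOT = 2*U_I^2 + 2*U_T^2 + 4*U_f^2;

% Attenuation Decline: recompute the split point with a decreased initial attenuation
A            = -log(I_t ./ I_o);                        % instantaneous attenuation
t_split_nom  = find(A >= A(1), 1, 'last');              % nominal TOT split point
t_split_pert = find(A >= A(1) - delta_A, 1, 'last');    % split point with decreased A(0)
dt_split     = abs(t_split_nom - t_split_pert);         % ~2-sigma split point uncertainty [s]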

Carbon Mass: Posterior Distributions

For each analysis, OCECgo reports the best-fitting posterior distribution for OC, EC, and total carbon (TC) mass. The distribution is selected from a set of nine candidate distributions (shown below in alphabetical order) that are fit via maximum likelihood estimation. The best-fitting distribution is selected using the Akaike Information Criterion [4].
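
To illustrate the selection procedure, the sketch below fits several of the candidates to a Monte Carlo sample of carbon mass (mass_sample, a placeholder vector) using fitdist from the Statistics and Machine Learning Toolbox and selects the minimum-AIC fit; the folded-normal candidate is omitted here because it is not a built-in fitdist distribution.

% Abbreviated candidate list (Toolbox distribution names)
candidates = {'ExtremeValue', 'Gamma', 'tLocationScale', 'Logistic', ...
              'Loglogistic', 'Lognormal', 'Normal', 'Rayleigh'};

aic  = inf(size(candidates));
fits = cell(size(candidates));
for k = 1:numel(candidates)
    try
        fits{k} = fitdist(mass_sample(:), candidates{k});   % maximum likelihood fit
        nP      = numel(fits{k}.ParameterValues);            % number of fitted parameters
        aic(k)  = 2*nP + 2*negloglik(fits{k});                % Akaike Information Criterion
    catch
        % skip candidates that fail to fit the sample
    end
end

[~, best] = min(aic);
best_fit  = fits{best};   % best-fitting posterior distribution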

Candidate distributions

Extreme Value

\[f(x|\mu,\sigma) = \frac{1}{\sigma} \textrm{exp} \left( \frac{x-\mu}{\sigma} \right) \textrm{exp} \left( {-\textrm{exp} \left( \frac{x-\mu}{\sigma} \right)} \right)\]

Folded-Normal

\[f(x|\mu,\sigma) = \frac{1}{\sigma\sqrt{2\pi}} \left[ \textrm{exp} \left( \frac{-(x-\mu)^2}{2\sigma^2} \right) + \textrm{exp} \left( \frac{-(x+\mu)^2}{2\sigma^2} \right)\right]\]

Note

The folded-normal distribution is the distribution of the absolute value of a normal random variable - i.e., a normal distribution folded onto non-negative support. If \(\mu \gg \sigma\), the folded-normal and normal distributions are effectively equal. As such, if \(\mu \gt 3\sigma\), OCECgo will not consider the folded-normal distribution for the best fit.

Gamma

\[f(x|a,b) = \frac{1}{b^a \Gamma (a)} x^{a-1} \textrm{exp} \left( \frac{-x}{b} \right)\]

Generalized T

\[f(x|\mu,\sigma,\nu) = \frac{\Gamma \left( \frac{\nu+1}{2} \right)}{\sigma\sqrt{\nu\pi}\ \Gamma\left( \frac{\nu}{2} \right)} \left[ \frac{\nu + \left( \frac{x-\mu}{\sigma} \right) ^2}{\nu} \right] ^{-\frac{\nu+1}{2}}\]

Note

The generalized T distribution converges to the normal distribution at large degrees of freedom, \(\nu\). As such, if \(\nu \gt 60\), OCECgo will not consider the generalized T distribution for the best fit.

Logistic

\[f(x|\mu,\sigma) = \frac{\textrm{exp} \left( \frac{x-\mu}{\sigma} \right)}{\sigma \left[ {1 + \textrm{exp} \left( \frac{x-\mu}{\sigma} \right) } \right] ^2}\]

Log-logistic

\[f(x|\mu,\sigma) = \frac{\textrm{exp} \left( \frac{\mathrm{ln}\ x-\mu}{\sigma} \right)}{\sigma x \left[ {1 + \textrm{exp} \left( \frac{\mathrm{ln}\ x-\mu}{\sigma} \right) } \right] ^2}\]

Lognormal

\[f(x|\mu,\sigma) = \frac{1}{x\sigma\sqrt{2\pi}} \textrm{exp} \left( \frac{-(\mathrm{ln}\ x-\mu)^2}{2\sigma^2} \right)\]

Normal

\[f(x|\mu,\sigma) = \frac{1}{\sigma\sqrt{2\pi}} \textrm{exp} \left( \frac{-(x-\mu)^2}{2\sigma^2} \right)\]

Rayleigh

\[f(x|b) = \frac{x}{b^2} \textrm{exp} \left( \frac{-x^2}{2b^2} \right)\]

References

[1] Conrad, B.M. & Johnson, M.R. (2019), Calibration protocol and software for split point analysis and uncertainty quantification of thermal-optical organic/elemental carbon measurements, J. Vis. Exp., 151:e59742 (doi: 10.3791/59742)
[2] Darros-Barbosa, R., Balaban, M., & Teixeira, A. (2003), Temperature and concentration dependence of density of model liquid foods, Int. J. Food Prop., 6(2):195-214 (doi: 10.1081/JFP-120017815)
[3] Zheng, G.J., Cheng, Y., He, K.B., Duan, F.K., & Ma, Y.L. (2014), A newly identified discrepancy of the Sunset semi-continuous carbon analyzer, Atmos. Meas. Tech., 7:1969-1977 (doi: 10.5194/amt-7-1969-2014)
[4] Akaike, H. (1974), A new look at the statistical model identification, IEEE Trans. Autom. Control, 19(6):716-723 (doi: 10.1109/TAC.1974.1100705)