Finite resolution and serial correlations in mass metrology

In this study the influence of the digital resolution on the properties of a data sample is determined experimentally for mass measurements. A mass comparator with an adjustable digital resolution interval, also known as the quantization step size, was used under well-controlled repeatability conditions. The same measurement procedure was carried out repeatedly with resolution settings differing by up to a factor of 200, and at least 150 mass differences were recorded with every resolution setting. A clear relationship was observed between the digital resolution and the type of random process characterizing the data sample. Analysis shows that a white noise process dominates for data sets measured with the smallest digital resolution steps, from 0.001 mg up to 0.005 mg. For resolutions from 0.01 mg to 0.2 mg, random walk noise is observed, for which the variance of the sample mean can increase in proportion to the sample size. We demonstrate that instrumental resolution is a strictly limiting factor in mass measurements only for data sets with significant positive correlations, such as those having random walk noise. Otherwise, for a white noise process, the smallest possible variance is inversely proportional to the averaging time, as in time-frequency metrology, and is not limited by the instrumental resolution. Our measurement results show that in this case the sample standard deviation of the mean (0.00003 mg) can be more than ten times smaller than that of a single result (0.0005 mg) or the Type B component of the digital resolution (0.00029 mg).


INTRODUCTION
Specially designed statistical methods are recommended for the uncertainty analysis of randomly varying repeated measurements that may be correlated [1]. Such methods include the Allan variance, the autocorrelation function and the power spectral density, and they are well developed in the time and frequency domain [2–4]. They are commonly used for electrical and radiation standard measurements [5–9], but in well-established fields such as mass measurement [10,11] they are still rather rare. Nevertheless, in all areas of metrology the influence of correlations on the uncertainty evaluation of repeated measurements can be important [1,2,5].
Random processes in time series measured at uniform intervals can be of many different types [2,3]. In the time and frequency field, random processes are modelled by five integer power-law spectra, $S_y(f) \sim f^{\alpha}$, where the exponent α varies from −2 to +2 depending on the instrument used and on the region of Fourier frequency f or averaging time τ under consideration. For electrical quantities two spectra are considered [5], with α = −1 and α = 0. However, in some other fields usually only a white noise process with α = 0 (uncorrelated measured quantity values) is assumed [2,5]. Although at first glance measurement sequences may suggest different types of random process, it is impossible to distinguish a white noise process clearly from a 1/f noise process purely by looking at a plot of the time series or at its histogram [5]. These distinctions are important because the uncertainties associated with the two processes differ greatly: the variance of the sample mean of a white noise process is inversely proportional to the number of values in the sample, whereas for a 1/f noise process the variance of the sample mean is independent of the sample size.
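The difference in averaging behaviour can be illustrated with a short Monte Carlo sketch in Python with NumPy (the function name and all settings are hypothetical). It compares the variance of the sample mean of white noise with that of a random walk (α = −2), which is used here instead of 1/f noise only because it is simpler to generate. For white noise the variance of the mean falls as 1/N; for a random walk started from zero it even grows with N, as σ²(N + 1)(2N + 1)/(6N).

```python
import numpy as np

def variance_of_mean(kind, n, trials=4000, sigma=1.0, seed=0):
    """Monte Carlo variance of the sample mean of n consecutive points.

    kind = "white": independent Gaussian values (alpha = 0).
    kind = "random_walk": cumulative sum of white increments (alpha = -2),
    a simple stand-in for strongly correlated noise.
    """
    rng = np.random.default_rng(seed)
    e = rng.normal(0.0, sigma, size=(trials, n))
    y = e if kind == "white" else np.cumsum(e, axis=1)
    return y.mean(axis=1).var(ddof=1)

for n in (10, 100):
    w = variance_of_mean("white", n)
    rw = variance_of_mean("random_walk", n)
    # Theory (sigma = 1): 1/n for white noise, (n+1)(2n+1)/(6n) for the walk
    print(n, w, 1.0 / n, rw, (n + 1) * (2 * n + 1) / (6 * n))
```

Increasing N from 10 to 100 reduces the white-noise variance of the mean tenfold but increases the random-walk value almost ninefold.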
Digital data processing is widely used in modern measurements, with rather different sampling frequencies and digital resolutions. It is well known that quantization of an analogue signal is a highly nonlinear process, but for the probability densities of the initial and quantized signals the process can be linear [12]. Therefore, quantizing conditions are formulated for the probability density functions (PDFs) of data samples rather than for particular values. These conditions are stated in the quantizing theorems (QT). According to QT I [12], the PDF of the initial signal x can be derived from the PDF of the output digital signal x′ if the quantization step size q is suitably small in comparison with the breadth of the PDF. According to QT II, the moments of x can be calculated from the moments of x′ if q is correspondingly small; for QT II, q can be up to twice as large as needed for QT I. Among digitally measured time series there are therefore some that fulfil the requirements of QT I and/or QT II, but also some measured with larger q that fulfil neither QT. The situation can be even more complicated when the initial signal is a combination of different processes. For example, white noise can be present together with some low-frequency noise, which may substantially blur the limits of validity of the QTs. Thus, identifying the dominant noise of the sample may be helpful for further data analysis.
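As an illustration of QT II, the sketch below compares the variance of quantized data with Sheppard's correction, var(x′) ≈ var(x) + q²/12, which holds when the moments of x are recoverable from those of x′. It assumes a zero-mean Gaussian input and a rounding (mid-tread) quantizer; the function names are hypothetical. For q of the order of σ the prediction is very accurate, while for q = 4σ almost all readings collapse onto a single value and the prediction fails by a wide margin.

```python
import numpy as np

def quantize(x, q):
    """Mid-tread uniform quantizer: round x to the nearest multiple of q."""
    return q * np.round(np.asarray(x, dtype=float) / q)

def sheppard_check(q, sigma=1.0, n=200_000, seed=1):
    """Return (variance of quantized data, Sheppard prediction) for Gaussian input.

    When QT II holds, quantization acts like adding uniform noise of variance
    q**2 / 12, so var(x') is close to var(x) + q**2 / 12.
    """
    rng = np.random.default_rng(seed)
    x = rng.normal(0.0, sigma, n)
    return quantize(x, q).var(), x.var() + q**2 / 12.0

for q in (0.5, 1.0, 2.0, 4.0):   # step size in units of sigma (sigma = 1)
    measured, predicted = sheppard_check(q)
    print(f"q = {q:3.1f} sigma: var(x') = {measured:.3f}, predicted = {predicted:.3f}")
```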
Different methods, including conventional frequentist [1,13–16], Bayesian [17–20] and fiducial inference [21,22], have been developed for estimating the measurand and the uncertainty of repeated measurements affected by finite resolution. Monte Carlo simulation [23–27] has been used for the theoretical modelling of the problem. Two significant factors contributing to the measurement outcome and uncertainty are always assumed to be present in [13–26]: finite resolution and Gaussian noise. In the majority of these publications it is concluded that if the sample standard deviation s is sufficiently large in comparison with the resolution, or quantization step size q, of the measuring instrument (about s ≥ 0.5q), the effect of finite resolution is insignificant and conventional statistical inference is valid. These publications therefore focus on measurement series with relatively small variance (s ≤ 0.5q) without considering the dominant random process present. However, identifying the dominant noise type of the data series should be of primary importance for uncertainty evaluation, especially for samples with relatively small variance, where the requirements of QT I and QT II may not be met [12]. Some methods presently used for identifying particular random noise processes are given in [2,5,28–31]. All these methods are firmly applicable only if a sufficiently large data set with over a hundred values is available, which was not the case for many of the publications treating finite resolution [15–19,21,22]. As a kind of criterion for the applicability of conventional statistics, the ratio of the standard deviation s to the resolution q is sometimes used [24,32], but its use is not sufficiently justified.
In our study the influence of the digital resolution on the statistical properties of the data sample is determined experimentally. The same mass difference was measured extensively under repeatability conditions. Resolution settings of the automatic mass comparator from 0.001 mg up to 0.200 mg were applied, and at least 150 mass differences were recorded with every resolution setting. The digital resolution interval q was the only parameter intentionally changed during the experiment. From 600 up to 1600 readings at uniform time intervals were obtained with every resolution setting, i.e. 150 to 400 mass differences, a data set large enough for our statistical analysis. Data sets with different statistical properties were recorded and, as recommended in [1,2,5], special methods accounting for possible correlations were used for the statistical analysis. Our paper focuses on effects that cannot be observed in routine practice, where only relatively small data samples are available. Nevertheless, knowing the possible threats due to digitization is also helpful for routine practice.

STATISTICAL TOOLS FOR AUTO-CORRELATED MEASUREMENTS
One of the central tasks in time-frequency metrology is frequency stability analysis in the time domain, based on an array of data points $y_i$ with constant time intervals between successive measurements. The statistical noise characteristics of a frequency source are usually analysed only after eliminating factors such as drift and environmental effects. Data sampling or measurement is carried out during the time interval $\tau_0$, and the analysis or averaging during the time $\tau$, which is usually a multiple of $\tau_0$:

$$\tau = m\,\tau_0, \qquad (1)$$

where m is the averaging factor. More than ten different statistical variances have been used for frequency stability analysis in time-frequency metrology [3,4]. Among them, the most widely used are the Allan variance and its later overlapping version, which, for the same data set, can provide an extension to longer averaging times and better confidence than the original version. For power-law noise the Allan variance depends on the averaging time as $\sigma_y^2(\tau) \sim \tau^{\mu}$, and for a data set with frequency modulation (FM) noise the relation between α and μ is:

$$\mu = -\alpha - 1.$$

An important application of the σ²–τ plot is determining the flicker floor of the frequency signal or standard [3,33]. This is the point where white FM noise with μ = −1 turns into flicker FM noise with μ = 0. This point defines the principal stability and/or uncertainty limit achievable with the particular signal source. Determining it requires a lengthy measurement series, which may take several months, and also depends on the analytical method (type of variance) used. Continued averaging after the flicker floor has been reached will not further improve the stability or uncertainty estimates of the signal.
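As a worked instance of the relation μ = −α − 1 between the spectral exponent α and the Allan variance slope μ for FM noise, the three FM noise types referred to later in this study are:

$$
\begin{aligned}
\text{white FM (WFM):}\quad & \alpha = 0, \quad \mu = -1,\\
\text{flicker FM (FFM):}\quad & \alpha = -1, \quad \mu = 0,\\
\text{random walk FM (RWFM):}\quad & \alpha = -2, \quad \mu = 1.
\end{aligned}
$$

A negative μ means that averaging still reduces the Allan variance; at μ = 0 (the flicker floor) it no longer does, and for μ > 0 longer averaging makes the variance grow.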

Variances suitable for correlated data
The estimated variance for N independent random variables y is calculated following [1] as:

$$s^2 = \frac{1}{N-1}\sum_{i=1}^{N}\left(y_i - \bar{y}\right)^2, \qquad (3)$$

where $y_i$ are the N values of the data set and $\bar{y}$ is their arithmetic average.
For autocorrelated, and therefore non-independent, random data, $s^2$ is not applicable because it is non-convergent: the value of s obtained from (3) depends on N. This problem of adequate estimation arises whenever the average value of the measurements is not stationary. With the Allan variance, however, this sample-dependent unstable behaviour is normally avoided. The Allan variance AVAR (also called the two-sample variance) is calculated from the data set $y_i$ as follows [2]:

$$\sigma_y^2(\tau) = \frac{1}{2(M-1)}\sum_{i=1}^{M-1}\left(\bar{y}_{i+1} - \bar{y}_i\right)^2, \qquad (4)$$

where $\bar{y}_i$ is the ith of M values averaged over the sampling interval τ.
Stability is often expressed as the square root of the variance, $\sigma_y(\tau)$, the Allan deviation (ADEV). For white FM noise the Allan variance is the same as the variance $s^2$. For more divergent noise types, such as flicker noise, the Allan variance, unlike $s^2$, converges to a value that is independent of the size of the averaged samples.
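A minimal sketch of formula (4), assuming equally spaced data in a NumPy array (the function name is hypothetical). For white noise the Allan variance at m = 1 and $s^2$ from (3) estimate the same quantity, which the last lines check on a synthetic series.

```python
import numpy as np

def allan_variance(y, m=1):
    """Non-overlapping Allan variance of equally spaced data y, formula (4).

    The data are split into M consecutive groups of m values (tau = m * tau0);
    values at the end that do not fill a complete group are discarded.
    """
    y = np.asarray(y, dtype=float)
    M = len(y) // m
    if M < 2:
        raise ValueError("need at least two complete groups of m values")
    ybar = y[: M * m].reshape(M, m).mean(axis=1)   # tau-averages
    return np.sum(np.diff(ybar) ** 2) / (2.0 * (M - 1))

rng = np.random.default_rng(2)
white = rng.normal(0.0, 1.0, 50_000)
# For white noise, AVAR(tau0) and s^2 from (3) should nearly coincide.
print(allan_variance(white), white.var(ddof=1))
```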
At present, the most common choice in time-frequency metrology is the overlapping Allan variance, defined as [3,4]:

$$\sigma_y^2(\tau) = \frac{1}{2m^2\,(M-2m+1)}\sum_{j=1}^{M-2m+1}\left\{\sum_{i=j}^{j+m-1}\left(y_{i+m} - y_i\right)\right\}^2, \qquad (5)$$

where, according to (1), m is the averaging factor, τ the averaging time, and M the full sample size. In comparison with the original Allan variance, the overlapping version has much better reproducibility and allows larger averaging times τ to be used for the same data set of M observations. The original and overlapping versions of the Allan variance give exactly the same result for the shortest time $\tau_0$, during which a single value is measured, and for longer averaging times the expectations of both versions are the same. The overlapping Allan variance (or the estimated sample variance) and, respectively, the Allan or standard deviations are mostly used in our study. As an analogue of the quantity "frequency" commonly analysed in time-frequency metrology, our study uses the quantity "mass difference", see formula (12).
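Formula (5) can be evaluated efficiently with cumulative sums, as in the sketch below (Python with NumPy; the function name is hypothetical). The inner sum over i is the difference of two adjacent moving sums of m values, so every possible starting point j is used, which is what gives the overlapping estimator its larger number of terms.

```python
import numpy as np

def overlapping_allan_variance(y, m=1):
    """Overlapping Allan variance of equally spaced data y, formula (5)."""
    y = np.asarray(y, dtype=float)
    M = len(y)
    n_terms = M - 2 * m + 1
    if m < 1 or n_terms < 1:
        raise ValueError("need 1 <= m <= len(y) // 2")
    c = np.concatenate(([0.0], np.cumsum(y)))
    window = c[m:] - c[:-m]            # window[j] = y[j] + ... + y[j+m-1]
    inner = window[m:] - window[:-m]   # sum over i = j..j+m-1 of (y[i+m] - y[i])
    return np.sum(inner ** 2) / (2.0 * m**2 * n_terms)
```

At m = 1 the result coincides with the non-overlapping formula (4); for example, both give 0.5 for the series 1, 2, 3, 4.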

Power law noise identification
Knowing the power law noise type can considerably improve the planning of measurements, for example, deciding on the optimal averaging time, determining the uncertainty intervals and the equivalent number of degrees of freedom, and correcting for different biases. In time and frequency metrology, three methods have been used for power law noise identification:
1. the Barnes $B_1$ bias function [2,3,34], which is the ratio of $s^2$ to the Allan variance;
2. the slope of the logarithmic σ²–τ plot [2,3];
3. the lag 1 autocorrelation coefficient [3,28], including differencing and recalculation of the autocorrelation for noises with more divergent data.

The Barnes $B_1$ bias function shows the non-convergence of the standard variance and is defined as:

$$B_1 = \frac{s^2(\tau)}{\sigma_y^2(\tau)}, \qquad (6)$$

where $s^2(\tau)$ is the estimated sample variance for points with averaging time τ and $\sigma_y^2(\tau)$ is the Allan variance for averaging time τ. An approximate relation of $B_1$ with the lag 1 autocorrelation coefficient $r_1$ for the same data set is:

$$B_1 \approx \frac{1}{1 - r_1}. \qquad (7)$$

For similar purposes, the Durbin-Watson statistic d is used instead of the $B_1$ bias function in many other fields of statistics [29–31]. Formula (8) gives the Durbin-Watson statistic d and its relation with $B_1$:

$$d = \frac{\sum_{i=2}^{N}\left(z_i - z_{i-1}\right)^2}{\sum_{i=1}^{N} z_i^2} \approx \frac{2}{B_1}, \qquad (8)$$

where the vector z of the residuals from the regression of the time series $y_i$ is defined by $z_i = y_i - \hat{y}_i$, and $\hat{y}_i$ is a least squares estimate for the value $y_i$.
Method 2 for the identification of power law noise is based on the slope of the line fitted through the logarithmic plot of the Allan variance versus averaging time. This method obviously needs at least two different averaging times, and the estimate is valid for all points used. For practical purposes, a single-point estimate with Method 1 or 3 is preferable.
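Method 2 can be sketched as a least squares fit of log σ²(τ) against log m, using the overlapping Allan variance (5); the slope is the exponent μ, from which α follows for FM noise as α = −μ − 1. The averaging factors 1, 2, 4, 8, 16 and the function names are assumptions made for illustration. On synthetic data the white-noise slope comes out near −1, while the random-walk slope is somewhat below +1, because a discrete random walk approaches slope 1 only at larger m.

```python
import numpy as np

def overlapping_avar(y, m):
    """Overlapping Allan variance, formula (5)."""
    y = np.asarray(y, dtype=float)
    c = np.concatenate(([0.0], np.cumsum(y)))
    window = c[m:] - c[:-m]
    inner = window[m:] - window[:-m]
    return np.sum(inner ** 2) / (2.0 * m**2 * (len(y) - 2 * m + 1))

def slope_mu(y, factors=(1, 2, 4, 8, 16)):
    """Least squares slope of log(AVAR) versus log(m), i.e. the exponent mu."""
    logm = np.log(factors)
    logv = np.log([overlapping_avar(y, m) for m in factors])
    slope, _intercept = np.polyfit(logm, logv, 1)
    return slope

rng = np.random.default_rng(4)
white = rng.normal(size=20_000)
walk = np.cumsum(rng.normal(size=20_000))
print(slope_mu(white), slope_mu(walk))   # about -1 (WFM) and somewhat below +1 (RWFM)
```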
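Method 3 makes use of the lag 1 autocorrelation coefficient, calculated from:

$$r_1 = \frac{\sum_{i=1}^{N-1}\left(y_i - \bar{y}\right)\left(y_{i+1} - \bar{y}\right)}{\sum_{i=1}^{N}\left(y_i - \bar{y}\right)^2}, \qquad (9)$$

where $y_i$ are the successive results of the recorded data set.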
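The quantities of Methods 1 and 3 and the Durbin-Watson statistic can be computed together, as in the following sketch (function name hypothetical). It assumes that the residuals z in (8) come from a constant-mean fit; with that choice d = 2/B₁ holds exactly when B₁ uses the Allan variance (4) at m = 1, and d = 2(1 − r₁) holds apart from a small end term.

```python
import numpy as np

def noise_statistics(y):
    """Return (B1, r1, d) for a data series y.

    B1: s^2 from (3) divided by the Allan variance (4) at m = 1, formula (6)
    r1: lag 1 autocorrelation coefficient, formula (9)
    d : Durbin-Watson statistic, formula (8), with residuals z from a
        constant-mean fit (an assumption; a trend could be removed instead)
    """
    y = np.asarray(y, dtype=float)
    z = y - y.mean()
    s2 = np.sum(z ** 2) / (len(y) - 1)
    avar = np.sum(np.diff(y) ** 2) / (2.0 * (len(y) - 1))
    r1 = np.sum(z[:-1] * z[1:]) / np.sum(z ** 2)
    d = np.sum(np.diff(z) ** 2) / np.sum(z ** 2)
    return s2 / avar, r1, d

rng = np.random.default_rng(5)
y = rng.normal(size=500)
B1, r1, d = noise_statistics(y)
print(B1, r1, d)   # white noise: B1 near 1, r1 near 0, d near 2
```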
According to the algorithm of Method 3, the exponent α can be calculated from the expression:

$$\alpha = -2\left(\delta + d\right), \qquad (10)$$

where d is the order of differencing of the data $y_i$ (d = 0, 1, 2), and δ is defined as:

$$\delta = \frac{r_1}{1 + r_1}. \qquad (11)$$

If δ < 0.25, we assume the result of (10) to be valid and final. Otherwise, the lag 1 autocorrelation coefficient is recalculated from the first differences of the data (and, if necessary, from the second differences). For α = 0 the expanded uncertainty covering 94% is approximately from −0.4 to +0.7; for samples with 128 points the expanded uncertainty covering 99% is from −0.3 to +0.5 [28].
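The iteration of Method 3 is sketched below under stated assumptions: at most two differencing steps, the threshold δ < 0.25 from the text, and an unrounded α as given by (10). The function names are hypothetical. On synthetic data, white noise stops at d = 0 with α near 0, while a random walk needs one differencing step and gives α near −2.

```python
import numpy as np

def lag1_r(y):
    """Lag 1 autocorrelation coefficient r1, formula (9)."""
    z = np.asarray(y, dtype=float) - np.mean(y)
    return np.sum(z[:-1] * z[1:]) / np.sum(z ** 2)

def alpha_from_lag1(y, d_max=2, delta_limit=0.25):
    """Estimate the power law exponent alpha with the lag 1 autocorrelation method.

    Returns (alpha, d, delta), where d is the differencing order used.
    """
    y = np.asarray(y, dtype=float)
    for d in range(d_max + 1):
        r1 = lag1_r(y)
        delta = r1 / (1.0 + r1)                    # formula (11)
        if delta < delta_limit or d == d_max:
            return -2.0 * (delta + d), d, delta    # formula (10), not rounded
        y = np.diff(y)                             # first differences: less divergent

rng = np.random.default_rng(6)
e = rng.normal(size=2000)
print(alpha_from_lag1(e))              # white noise: d = 0, alpha near 0
print(alpha_from_lag1(np.cumsum(e)))   # random walk: d = 1, alpha near -2
```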

MEASUREMENT PROCEDURE AND DATA SETS USED FOR ANALYSIS
In our study, the RTTR measurement cycle is used for high-accuracy mass comparisons with an automatic Mettler AT106H digital comparator operated in an air-conditioned room [35]. The comparison cycle starts with loading the reference weight R, and after the stabilization time the first indication $I_1$ is recorded. Then the test weight T is loaded and the second indication $I_2$ is recorded. After a short wait with the comparator unloaded, the test weight T is loaded again and the third indication $I_3$ is recorded. Finally, after loading the reference weight R, indication $I_4$ is recorded. Weights are automatically placed on the load receiver and removed from it between measurements. The time interval between successive indications is 2.1 min. The full variability range of indications caused by comparator zero drift over one day is normally within ±0.2 mg. A typical set of one-day indications and the mass differences calculated from them is presented in Fig. 1. Air temperature [36], pressure and relative humidity measured inside the comparator chamber concurrently with the mass indications are also shown. The variability of air pressure seems to be the main cause of the comparator zero drift seen in Fig. 1. The influence of temperature variations can be noticed as well.
In order to eliminate the comparator zero drift from each set of four successive readings, the mass difference between the weights T and R is calculated as

$$\Delta I_i = \frac{I_2 + I_3 - I_1 - I_4}{2}. \qquad (12)$$

Normally the full range of differences is within ±3 least significant digital units around the mean, and the time required to determine one difference is 8.4 min. Thus, during a one-day experiment usually 150 differences were determined, see Table 2. Because the compared weights had the same density, an air buoyancy correction was not relevant. It is interesting to note that random walk noise was dominant for the initial comparator readings shown in Fig. 1. According to [3], taking the first differences of a data set makes it less divergent, and the absolute values of the power law exponents change by 2. That is exactly the case for the mass differences $\Delta I_i$ calculated from the initial readings by using (12), which are characterized by white noise. Indeed, (12) is rather close to the usual differencing operation.
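A minimal sketch of formula (12), assuming that the readings are stored as one flat array in the order I₁, I₂, I₃, I₄ of successive cycles and are equally spaced in time. The function name, the offset of 5.0 mg, the drift rate and the "true" difference are all hypothetical. Because the RTTR cycle is symmetric in time, a linear zero drift cancels exactly; slower nonlinear drift, such as the random walk seen in the raw readings, is only partly removed.

```python
import numpy as np

def rttr_differences(readings):
    """Mass differences T - R from consecutive RTTR cycles, formula (12).

    readings: flat sequence I1, I2, I3, I4, I1, I2, ... (R, T, T, R per cycle);
    an incomplete trailing cycle is ignored.
    """
    r = np.asarray(readings, dtype=float)
    n_cycles = len(r) // 4
    I = r[: 4 * n_cycles].reshape(n_cycles, 4)
    return (I[:, 1] + I[:, 2] - I[:, 0] - I[:, 3]) / 2.0

# Hypothetical example: true difference 0.0123 mg plus a linear zero drift
t = np.arange(12)                       # reading index, one step = 2.1 min
true_diff = 0.0123
is_test = np.tile([0, 1, 1, 0], 3)      # R, T, T, R pattern in each cycle
readings = 5.0 + true_diff * is_test + 0.004 * t
print(rttr_differences(readings))       # the linear drift cancels in every cycle
```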
In the present study the mass differences of the same two 20 g weights were determined over a month (about 20 one-day series with N > 100) under repeatability conditions [35]. A list of the samples used for the statistical analysis and some of their key parameters is given in Table 2. The time intervals between successively measured samples ranged from several hours to several days. A sample with 400 differences was measured without any time gaps over more than two days. About half of the measurements were carried out using the highest digital resolution of the comparator, q = 0.001 mg; for the rest (below the line in Table 2), a different resolution setting was used for each of the one-day series.
According to [12], the probability density function (PDF) of a quantized series x′ consists of a string of Dirac impulses located on a grid with spacing q and weighted by the samples of the PDF of x + n, where n is "pseudoquantization noise" added to the initial signal x.
In column 3 of Table 2, digital distributions with a bell-type envelope and more than four different values are typical for resolutions q from 0.001 mg to 0.005 mg, and following [12] they likely fulfil at least the requirements of QT II. Starting from q ≥ 0.01 mg, the PDFs have only three different values with dominating side columns, which are probably less suitable for evaluating the shape or the moments of the initial signal.

Statistical properties of mass differences
Data from ten days, containing about 1860 measured differences obtained with the highest possible resolution (0.001 mg) of the comparator, were analysed together. Figure 2 shows logarithmic plots of s² and of the Allan variance of the mass differences versus the averaging factor m according to (1). Here, m = 1 means that the variance is calculated from all successive values and corresponds to the measurement time τ₀; m = 2 means that groups of two successive values are averaged and the variance calculated from them corresponds to the time 2τ₀; m = 3 means that groups of three values with measurement time 3τ₀ are used; and so on. Both variances show a similar dependence on averaging but have different uncertainties (shown at the level of k = 2). The small uncertainty of the overlapping Allan variance calculated from (5) makes the curve in Fig. 2 a suitable reference for results measured with a larger digital step. The estimated sample variance s² agrees well with the Allan variance but has a much larger uncertainty; both imply that a white noise process is characteristic of the combined large data sample. The slope of this plot also specifies the random noise type as a function of m or τ.
For all 11 data sets measured with the highest resolution of 0.001 mg, the dominant power law noise process for τ₀ was identified (Table 3). For comparison, the Type B contribution of the digital resolution q = 0.001 mg is q/√12 ≅ 0.00029 mg. Data obtained with q = 0.001 mg most likely fulfil the requirements of QT I and QT II [12].
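The two curves of Fig. 2 can be reproduced in outline as follows, assuming an equally spaced array of mass differences. The synthetic white-noise series (σ = 0.0005 mg, 1860 values) is hypothetical and only mimics the size of the combined sample; it is not the measured data. For white noise both estimates fall as σ²/m, and the group-mean s² scatters more because it uses only M/m groups.

```python
import numpy as np

def group_mean_variance(y, m):
    """s^2 from (3) of the means of consecutive groups of m values (tau = m * tau0)."""
    y = np.asarray(y, dtype=float)
    M = len(y) // m
    means = y[: M * m].reshape(M, m).mean(axis=1)
    return means.var(ddof=1)

def overlapping_avar(y, m):
    """Overlapping Allan variance, formula (5)."""
    y = np.asarray(y, dtype=float)
    c = np.concatenate(([0.0], np.cumsum(y)))
    window = c[m:] - c[:-m]
    inner = window[m:] - window[:-m]
    return np.sum(inner ** 2) / (2.0 * m**2 * (len(y) - 2 * m + 1))

rng = np.random.default_rng(7)
y = rng.normal(0.0, 0.0005, 1860)   # hypothetical white-noise stand-in, in mg
for m in (1, 2, 4, 8, 16, 32, 64):
    print(m, group_mean_variance(y, m), overlapping_avar(y, m), 0.0005**2 / m)
```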
The Allan variance of the mass differences as a function of the averaging factor m and of the digital resolution of the comparator is presented in Fig. 3. The effect of digital resolution is evident. For resolutions from 0.001 mg to 0.005 mg the Allan variance of the measured mass differences decreases with averaging time, as expected for white noise with μ ≈ −1. For resolutions from 0.01 mg to 0.2 mg the slopes are μ ≥ 0, and the uncertainty cannot be reliably improved by averaging, at least for the first ten points of a measurement series. In Fig. 4, some of the σ²–τ curves of Fig. 3 are replotted as σ–τ curves; the straight lines in that figure show the contribution of the digital rounding effect due to the corresponding resolution step q relative to the measured Allan deviation of the mass differences. For a resolution of 0.001 mg, the Allan deviation of a single result is reduced more than ten times if a series of 64 points is averaged. For a resolution of 0.1 mg, by contrast, the Allan deviation is reduced only 3.4 times, and averaging up to the first 16 points does not show any reduction. Therefore, the common Type B estimate of 0.029 mg for q = 0.1 mg is quite adequate, and this estimate cannot be reliably improved further by averaging.
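The Type B values quoted here follow from the usual assumption that the rounding error is uniformly distributed over one resolution step, which gives a standard uncertainty of q/√12. For comparison, a pure power law σ² ∝ τ^μ reduces the Allan deviation by the factor m^(μ/2):

$$
u_B(q) = \frac{q}{\sqrt{12}}, \qquad u_B(0.001\ \text{mg}) \approx 0.00029\ \text{mg}, \qquad u_B(0.1\ \text{mg}) \approx 0.029\ \text{mg},
$$

$$
\frac{\sigma_y(m\tau_0)}{\sigma_y(\tau_0)} = m^{\mu/2}: \qquad \mu = -1,\ m = 64 \;\Rightarrow\; \tfrac{1}{\sqrt{64}} = \tfrac{1}{8}; \qquad \mu = 0 \;\Rightarrow\; 1 \ \text{(no reduction)}.
$$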
In Fig. 5 the experimental standard deviation and the Allan deviation are presented as functions of the applied digital resolution, including several values for q = 0.001 mg; see also Table 2. As stated in [13,23,24,32], the resolution q will not affect conventional statistical inference if s ≥ 0.5q, and conventional statistical inference cannot be applied if s ≤ 0.3q [32]. These limits are depicted as straight lines in Fig. 5. Most of the experimental standard deviations determined over the month (see Table 2) fall within the range specified by these limits; only two, for the resolutions q = 0.1 mg and q = 0.2 mg, are smaller, implying that conventional statistical inference cannot be applied. At the same time, as confirmed by Figs 2 and 3 and Table 3, data sets with dominating white noise, suitable for conventional analysis, are obtained only for the first three points, with resolutions from q = 0.001 mg to q = 0.005 mg; data with resolutions from 0.01 mg to 0.2 mg apparently require more advanced tools. The minimum value of the experimental standard deviation observed with q = 0.001 mg was s = 0.00029 mg, with σ_y(τ₀) = 0.00030 mg.
Nevertheless, conventional analysis of these data will lead to a moderate overestimation of uncertainty, because the determined exponents (α = 0.54 and μ = −1.54) exceed the white noise values; thus the variance of the sample mean depends on the number N of values in the sample somewhat more strongly than in inverse proportion to N.
The ratio of the standard deviation to the resolution q is rather insensitive to the type of random process (Fig. 5). The Allan deviation is a significantly better choice for distinguishing data sets with zero autocorrelation, but it is still less reliable than identifying the type of power law noise.

Results of power law noise identification
In Method 1, the Barnes B₁ bias function is applied to data sets measured with different digital resolutions. The results are shown in Fig. 6. Comparing the actual value of B₁ with the values expected for the various noise types allows the dominant noise type to be identified; see Table 3. It is evident that the dominant random noise types depend on the resolution used and change from a white noise process for q ranging from 0.001 mg to 0.005 mg to the combined 1/f and … slopes of the three largest values of q are more typical of the 1/f noise process (FFM). For larger averaging factors the slopes of the σ²–τ curves tend to become more similar (see Fig. 3), which is consistent with an increasing influence of the comparator zero drift on the dominating noise in this case¹. Nevertheless, for the first 20 differences shown in Fig. 3 the noise processes are clearly different, and for larger resolutions a predictable reduction of uncertainty with increasing averaging time cannot be assumed.

Fig. 6. Random noise type identification with Method 1 as a function of digital resolution.
For Method 3, the intermediate calculation results are presented in Table 4. In the first step, lag 1 correlation coefficients are calculated from the mass differences by using (9). For the results where δ > 0.25, a second step is required (see column 3 of Table 4): the initial mass differences are differenced and the lag 1 correlation coefficients are calculated again. A third step is not needed, as δ < 0.25 for all q. In Table 3, the exponent α is determined as a function of the resolution q by using all three methods described in Section 2.2. As shown in [28], Method 3 has a better capability for noise identification than the other methods. According to this method, two types of random noise are evident in connection with digital resolution. Our measurements indicate that white noise (WFM) dominates for q in the range from 0.001 mg to 0.005 mg and random walk noise (RWFM) is typical for q in the range from 0.01 mg to 0.2 mg.

¹ Although the zero drift has been eliminated before the calculation of s² or the Allan variance, it is fully present before signal quantization, and so its amplitude certainly influences the outcome of quantization.

CONCLUSIONS
We measured mass differences with a high-accuracy comparator set to different digital resolutions to study the effect of resolution, or digital quantization error, on the statistical properties of the measured data. For the resolution q = 0.001 mg, the difference between the estimated sample variance and the Allan variance is insignificant (Fig. 2). This was expected, because the statistical tools suitable for correlated random data give results similar to conventional uncertainty evaluation practice (GUM) [1] when the assumptions relevant for Type A evaluation are fulfilled. Furthermore, the tools intended for random measurements that may be correlated can reveal features not considered in conventional analysis. For example, as shown in Fig. 2, owing to a small negative autocorrelation the dependence of uncertainty on the averaging factor or time is stronger (μ = −1.45) than for the pure white noise (μ = −1) often assumed in practice. However, after a certain number of points, about 50 to 100, has been averaged, the curve tends towards flicker noise with μ ≈ −0.5, and further averaging will obviously be less effective.
The data sample measured with q = 0.001 mg most likely fulfils the requirements of QT I and QT II [12], and the samples with q = 0.002 mg and q = 0.005 mg at least the requirements of QT II. For resolutions in the range 0.02–0.2 mg, significant differences between the sample variance estimated using (3) and the Allan variance calculated from (4) or (5) are evident (Fig. 6). This indicates that the assumptions needed for Type A evaluation (randomness and independence of successive observations) are not met, and that the tools of Section 2 are more suitable, although they require much larger data sets and more complicated calculations. Data sets suitable for conventional analysis cannot be distinguished from data sets requiring more advanced tools on the basis of the ratio of the standard deviation to the resolution q, as proposed previously [32]. Power law noise identification or the Allan deviation is much more effective for that purpose.
The nearly proportional dependence of the standard deviation on the digital resolution q presented in Fig. 5 confirms that, among the factors causing variability of indications in a sample, digital rounding is often dominant. This holds until the comparator yields the same value throughout the measurements, as in the case of q = 0.5 mg. The other components, arising from internal (Gaussian) noise and from the zero drift of the comparator, contribute substantially to the uncertainty for q = 0.001–0.005 mg.
By using the Allan variance or the Allan deviation of the measured mass differences as a function of averaging time, the relation between finite resolution and the type of random process of the sample became very evident (Figs 3 and 4). Changing q from 0.001 mg to 0.01 mg or larger switches the apparent power law noise from white, with a slope of μ ≈ −1, to the random walk type, with μ ≈ 1 (see Figs 3 to 7 and Tables 2 and 3). Concurrently, the shape of the PDF of the sample changes from a bell-type envelope to a distribution with strongly dominating side columns.
We think that for q = 0.001 mg the requirements of QT I are fulfilled, and the shape of the probability density function of the quantized data is similar to that of the initial data. For q = 0.002 mg and q = 0.005 mg the requirements of QT II are fulfilled, and the moments of the probability density function of the quantized data can be reliably used. Otherwise, the uncertainty of the mean value of a data set measured with q > 0.02 mg cannot be effectively improved by averaging, and the correlation between results must be accounted for. Our analysis confirms that the type of random process present affects the estimation of measurement uncertainty. Knowing the noise behaviour facilitates the planning of experiments: the optimal averaging or integration time (useful number of repetitions) and the best resolution achievable with a particular measurement procedure can be estimated.
… $s^2(\tau)$ is the estimated sample variance for points with averaging time τ and $\sigma_y^2(\tau)$ is the Allan variance for averaging time τ. An approximate relation of $B_1$ with the lag 1 autocorrelation coefficient $r_1$ …

Fig. 2. Determination of the lowest limit for uncertainty from the σ²–τ plot. The Allan variance is shown with squares, s² with open circles, and both together with the expanded uncertainty interval (k = 2). The white noise 1/n dependence of the variance is shown as a straight line.
… valid for the first few points in the series. In Fig. 2 the slope for the central part of the curve is close to that of white noise (μ = −1.05), and starting from an averaging factor of 64 the curve turns from white to flicker noise and has a significantly smaller slope. Nevertheless, the actual flicker floor is not reached and, for samples with N < 100, averaging still reduces the uncertainty. In Fig. 2 the Allan deviation for a single observation … 0.000027 mg, which is about ten times smaller than the assumed Type B contribution of the digital resolution q = 0.001 mg … curves given in Fig. 3 are modified to σ–τ curves in Fig. 4. The straight lines in the …

Fig. 3. Allan variance of measured mass differences as a function of averaging factor m and digital resolution q.

Fig. 4. Allan deviation as a function of averaging factor m and digital resolution q. Straight lines represent only the uncertainty of the digital rounding effect due to the corresponding q.

Fig. 5. Standard deviation and Allan deviation as a function of digital resolution q. Straight lines show the range from 0.3q to 0.5q of the respective resolution.

Table 1. Frequency and time domain exponents for some statistical power-law noises

Table 2. List of the samples used for the statistical analysis. PDF: probability density function

Table 3. Determination of the power law noise exponent as a function of resolution

Table 4. Random noise identification with Method 3: from lag 1 autocorrelation