Multivariate models based on infrared spectra as a substitute for oil property correlations to predict thermodynamic properties: evaluated on the basis of the narrow-boiling fractions of Kukersite retort oil

This article investigates the potential of using models based on infrared spectra to predict basic thermodynamic properties of narrow boiling range oil fractions, or pseudocomponents. The work took advantage of the simultaneous availability of a property database for narrow boiling range fractions of Kukersite oil shale retort oil (from the industrial retorting process) and the infrared spectra of these fractions. It was based on the hypothesis that models based on infrared spectra could be used to reduce the experimental data needed to develop other predictive methods, or even to substitute for other prediction methods. In this study, four basic oil properties that are often used to predict other thermodynamic properties were predicted from infrared spectra using support vector regression: specific gravity, refractive index parameter, average boiling point, and molecular weight. In the bulk property prediction approach, these properties can be grouped into energy parameters (the former two) and size parameters (the latter two). It was found that, for distillation fractions with varying compositions, both the energy parameters (specific gravity, refractive index parameter) and the size parameters (molecular weight, average boiling point) can be predicted from Fourier transform infrared (FTIR) spectra, and that the accuracy of these predictions was comparable with that of petroleum bulk property correlations. Infrared spectra can therefore provide a convenient alternative in the field of thermodynamic property prediction, because they are easily measured and can be correlated with a wide variety of properties.


Introduction
The thermodynamic and physical properties of oils are required so that chemical-physical processes can be designed or evaluated, both within the plant and in the environment [1][2][3]. For pure compounds and simple mixtures, whose complete composition can be known, properties can be assigned to each compound individually. However, for complex undefined mixtures of unknown composition, such as oils from petroleum, biomass, coal, or oil shale, varying degrees of simplification are required. For these materials, empirical correlations based on bulk properties have historically been applied [4][5][6]. These correlations are usually based on commonly measured characteristic properties of the oil's narrow boiling range fractions (distillation cuts), which are referred to as pseudocomponents, such as their specific gravity, viscosity, molecular weight, or average boiling points from distillation curves. Depending on the complexity of the system, a specific property of a pseudocomponent can be predicted from one or more other properties using suitable regression equations. For conventional oil cuts with average boiling points below 350 °C, regression equations of the following general forms are often proposed [4]:

$$\theta = a\,\theta_1^{b}\,\theta_2^{c} \tag{1}$$

$$\theta = a\,\exp\left(b\,\theta_1 + c\,\theta_2 + d\,\theta_1\theta_2\right)\theta_1^{e}\,\theta_2^{f} \tag{2}$$
Equations (1) and (2) contain the property θ that is to be predicted, with θ₁ and θ₂ being the two input parameters (the properties from which θ is predicted), and a to f are empirically derived regression constants. For heavier or more polar substances, these two-parameter equations may not be suitable [4]. Correlations that give accurate results for a range of oils (for oil cuts with boiling points of up to 350 °C) and that are used in process simulators are therefore usually based on at least two input parameters: preferably one describing molecular size (such as carbon number, molecular weight, or average boiling point) and the other describing molecular energy (such as specific gravity, refractive index, or hydrogen-to-carbon ratio) [4]. Although the division of properties into molecular size and molecular energy parameters is somewhat tentative, the approach emphasises that molecules of similar size (described, for example, by carbon number, molecular weight, or average boiling point) can belong to different structural classes (for conventional petroleum, a grouping into paraffins, naphthenes, and aromatics is often used), so their property values vary. Developing these bulk property correlations requires a large amount of experimental data; however, experimental measurements are often time consuming and expensive.
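To illustrate how the regression constants of a two-parameter correlation are typically obtained, the sketch below assumes the power-law form θ = a·θ₁ᵇ·θ₂ᶜ and fits it by ordinary least squares after taking logarithms. The function names and the synthetic data are hypothetical; published correlations of this kind are often fitted by nonlinear least squares on the untransformed property instead.

```python
import numpy as np


def fit_power_law(theta1, theta2, theta):
    """Fit theta = a * theta1**b * theta2**c by least squares on log values.

    Taking logarithms gives ln(theta) = ln(a) + b*ln(theta1) + c*ln(theta2),
    which is linear in the unknowns ln(a), b and c.
    """
    theta1, theta2, theta = (np.asarray(v, dtype=float) for v in (theta1, theta2, theta))
    design = np.column_stack([np.ones_like(theta1), np.log(theta1), np.log(theta2)])
    (ln_a, b, c), *_ = np.linalg.lstsq(design, np.log(theta), rcond=None)
    return float(np.exp(ln_a)), float(b), float(c)


def predict_power_law(theta1, theta2, a, b, c):
    """Evaluate the fitted two-parameter power-law correlation."""
    return a * np.asarray(theta1, dtype=float) ** b * np.asarray(theta2, dtype=float) ** c


# Synthetic pseudocomponents (hypothetical constants a=5e-4, b=2.1, c=-1.0).
rng = np.random.default_rng(0)
tb = rng.uniform(350.0, 670.0, 50)   # average boiling point, K (size parameter)
sg = rng.uniform(0.70, 1.10, 50)     # specific gravity (energy parameter)
mw = 5.0e-4 * tb**2.1 * sg**-1.0     # "measured" molecular weight, g/mol
a, b, c = fit_power_law(tb, sg, mw)
```

With real data the fit would not be exact, and the log transform weights relative rather than absolute errors, which is one reason a nonlinear fit may be preferred.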
Our own laboratory's work with Kukersite oil shale retort oil is what initially led us to investigate infrared-based prediction methods. Kukersite oil shale retort oil is a synthetic crude oil produced from Estonian Kukersite oil shale by pyrolysis, or retorting [7,8]. As with many alternative liquid fuel sources, shale oils manufactured from different sources have compositions that differ to varying degrees from those of most conventional petroleum crudes [9]. For example, the shale oil produced by retorting Kukersite oil shale has a high content of oxygen-containing compounds, the largest portion of which are phenolic compounds [10,11]. For this reason, physical and thermodynamic property correlations developed for petroleum fuels may be less accurate for Kukersite shale oil than applications require.
While searching for approaches that provide the desired accuracy, we began investigating the potential of correlations based on infrared spectra to support the development of bulk property prediction methods for the thermodynamic and/or physicochemical properties of oil cuts. The initial practical idea was to use Fourier transform infrared (FTIR) spectroscopy to measure and/or predict structural characteristics (especially the concentration of phenolic OH groups [11]); however, it was later seen as a convenient tool for detecting random experimental measurement errors (identifying outliers) in all measured properties, and for helping to reduce the amount of experimental data that would otherwise be needed to develop predictive bulk property correlations. This paper is the third on this topic from our laboratory. Earlier articles from this laboratory have shown how the FTIR-based multilinear regression approach can be applied to determine structural characteristics (such as hydroxyl concentrations in narrow boiling shale oil cuts [11]) and temperature-dependent properties with a linear temperature dependence (such as the temperature dependence of the density of narrow boiling shale oil cuts [12]). Although FTIR combined with multilinear regression is a common tool for evaluating the properties of various materials [13][14][15], to our knowledge its use as a tool for predicting thermodynamic properties, the focus of this work, has not previously been emphasised. The most likely reason is the lack of a suitable database that simultaneously contains quantitative information on thermodynamic properties and FTIR spectra for oil distillation cuts (narrow boiling range fractions). An additional restriction may also have limited wider scientific interest in predicting thermodynamic properties from FTIR spectra: correlations developed from spectra measured on one instrument are generally specific to that instrument. This means that some form of standardisation or calibration transfer is needed before the correlations can be used on another spectrometer and, therefore, by other teams.
In this paper we take experimental property data for over two hundred Kukersite shale oil fractions, together with their FTIR spectra, and investigate the use of FTIR-based models to predict basic temperature-independent thermodynamic properties, with an emphasis on the so-called 'size parameters'. We focus on four basic properties that are commonly used to characterise oils for thermodynamic property prediction: the specific gravity, the refractive index parameter (which is calculated from the refractive index [16]), the average boiling point, and the average molecular weight. Although the specific gravity and the refractive index depend on temperature, they are often measured at a single standard temperature and used as characteristic parameters; in this sense, their values at a specified temperature are temperature-independent properties. Because infrared spectra contain information about the molecular structure of a sample but do not directly contain information about the size of its molecules, this work was driven by our initial interest in evaluating whether, and how well, FTIR-based models can predict 'molecular size' parameters such as molecular weight and average boiling point. Applications of FTIR-based models to the density and refractive index of fuels (here grouped as energy parameters) can be found in the literature [12].

Sample preparation
The oil shale retort oils used for this study were obtained from Eesti Energia's Narva Oil Plant (Narva, Estonia). This plant uses the solid heat carrier retorting method (called the Galoter process) [17,18]. Additional information on Kukersite oil shale, the processes occurring during pyrolysis, and the characteristics of the resulting oil can be found in the literature [10,11,19-25]. At the plant, the oil is separated into wide technical fractions as products (currently typically shale gasoline, fuel oil, and heavy oil). Mainly gasoline and fuel oil samples (technical fractions) from the plant were used for this study. In our laboratory, these wide technical fractions were further separated into narrow boiling fractions by distillation, either simple distillation or rectification. Most distillations, however, were simple batch distillations carried out at atmospheric pressure (using an Engler distillation [26]) and/or under vacuum. Additional information on the experimental settings and procedures can be found for the rectification [22,27] and simple distillation [23,28] of gasoline, and for the rectification [27,29] and simple distillation [28] of fuel oil.
To increase diversity, wide technical fractions were obtained from different plants that use different oil shale processing regimes, and were sampled at multiple times over the course of three years. Additionally, to map and screen trends, some fuel oil technical fractions (or their distillation cuts) were artificially adjusted by extraction and/or mixing. For this purpose, the samples were separated into phenolic and neutral fractions by extraction with a 10% NaOH solution [10,11]. In this manner, additional fuel oil samples were created with lower and higher contents of phenolic compounds than the original samples (with hydroxyl contents ranging from about zero to 10 wt% OH).
The number of samples (mostly narrow boiling range fractions, or cuts) used in this FTIR-based modelling study was 355 for specific gravity, 327 for the refractive index parameter, 229 for average boiling point, and 277 for number average molecular weight. Although the property data used in this study cover a wide range of values (boiling points between 350 and 670 K, or about 80 and 400 °C; a refractive index parameter between 0.34 and 0.45; specific gravity between 0.7 and 1.10; and molecular weight between 70 and 450 g/mol), not all of the data would be reliable for developing bulk property correlations of the desired accuracy. On the one hand, for higher boiling fractions, the temperature-time history applied during distillation into narrow boiling range cuts could have caused chemical alteration through thermal decomposition (i.e. systematic anomalies). On the other hand, artificially adjusted samples may not be the best choice for developing reliable bulk property correlations, as the artificial adjustment of the oil's nature could have caused unrepresentative changes (i.e. further systematic anomalies). Nevertheless, to increase diversity, we included the properties of these fractions of somewhat questionable representativeness in the database used to develop the FTIR-based models. Although not all of the data can be used for developing or evaluating bulk property correlations, they are still valuable for the purpose of this study: evaluating the potential of FTIR-based methods.

Property measurements
The methods and devices that have been used for measuring the properties (density, refractive index, average boiling point, average molecular weight), together with estimated standard uncertainties for the purpose of this study, are summarised in Table 1 and given in more detail below.

Density
The density at 20 °C was measured using an oscillating tube density meter (DMA 5000 M, Anton Paar GmbH) equipped with a heating attachment that heats the sample at the unit's inlet to lower its viscosity. The performance of the device was checked using distilled water and air. Based on repeat measurements of selected narrow boiling range fractions, the standard uncertainty was estimated to be roughly 0.00015 g/cm³; the uncertainties for heavy samples may be slightly greater than those for lighter fractions. Because of their higher viscosity, the densities of several heavier samples were measured at higher temperatures, and the density at 20 °C was then calculated from the linear temperature dependence [12].
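The extrapolation of a heavy fraction's density to 20 °C from measurements at higher temperatures, assuming the linear temperature dependence mentioned above, can be sketched as a straight-line fit. The function name, the measurement temperatures, and the density values are hypothetical.

```python
import numpy as np


def density_at_temperature(temps_c, densities, target_c=20.0):
    """Extrapolate density to target_c assuming rho(T) = rho0 + k*T.

    temps_c   -- measurement temperatures, deg C
    densities -- densities measured at those temperatures, g/cm3
    """
    # np.polyfit returns coefficients highest power first: [slope, intercept].
    slope, intercept = np.polyfit(np.asarray(temps_c, float), np.asarray(densities, float), 1)
    return intercept + slope * target_c


# Hypothetical readings for a viscous heavy fraction measured at 40-70 deg C.
temps = [40.0, 50.0, 60.0, 70.0]
rhos = [1.0200 - 0.00072 * t for t in temps]
rho20 = density_at_temperature(temps, rhos)  # 1.0056 g/cm3
```

Because 20 °C lies outside the measured range, any curvature in the real temperature dependence would add extrapolation error on top of the measurement uncertainty.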

Refractive index parameter
The refractive index at 20 °C was measured at 589.592 nm using an Abbemat HT refractometer (Anton Paar GmbH). Performance was checked with distilled water before and after each set of measurements. From repeat measurements of selected narrow boiling range fractions, the standard uncertainty was estimated to be 0.0011 (with an expanded uncertainty of 0.0021 at the 95% level). The refractive index parameter was calculated from the refractive index at 20 °C using the equation given by Huang [16]:

$$I = \frac{n^2 - 1}{n^2 + 2} \tag{3}$$

where I is the refractive index parameter and n is the refractive index at 20 °C.
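Huang's refractive index parameter is I = (n² − 1)/(n² + 2). The sketch below computes it and, as an illustrative addition not taken from the study, propagates the standard uncertainty of n to I with a first-order (linear) approximation using dI/dn = 6n/(n² + 2)². The function names are hypothetical.

```python
def refractive_index_parameter(n):
    """Huang's refractive index parameter, I = (n**2 - 1) / (n**2 + 2)."""
    return (n * n - 1.0) / (n * n + 2.0)


def refractive_index_parameter_uncertainty(n, u_n):
    """First-order propagation of u(n) to I, using dI/dn = 6n / (n**2 + 2)**2."""
    return 6.0 * n / (n * n + 2.0) ** 2 * u_n


# Example: n = 1.5000 with the standard uncertainty u(n) = 0.0011 quoted above.
I = refractive_index_parameter(1.5)                       # about 0.2941
u_I = refractive_index_parameter_uncertainty(1.5, 0.0011)  # about 0.00055
```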

Average boiling point
The average boiling points of the samples were measured using a method based on thermogravimetric analysis [27][28][29]. The accuracy of this method was evaluated using narrow boiling range oil fractions that had been obtained by Rannaveski et al. [27] according to the ASTM D2892 standard. Based on these fractions, a standard method uncertainty of 2.1 °C (with an expanded measurement uncertainty of 4.3 °C at the 95% level) is used here [27].

Molecular weight
The average molecular weight was measured mainly by two methods: cryoscopy (with an in-house apparatus, following the ASTM D2224 standard) and vapour pressure osmometry (Osmomat 070, Gonotec GmbH, or, later in the project, Knauer K-7000, Knauer GmbH). Benzene was used as the solvent for cryoscopy and was also the main solvent in the osmometric measurements. For both methods, calibration was carried out using benzil solutions of known concentrations. Standard uncertainties were calculated based on both the accuracy of the calibration and tests with pure compounds. The relative expanded uncertainty (at the 95% level) was determined to be between ±6% and ±7% for a single method (device). For the fractions, the uncertainty was smaller for those with lower molecular weights and larger for heavier ones. On average, an absolute standard uncertainty of 7 g/mol (an absolute expanded uncertainty of 14 g/mol at the 95% level) is used here.
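For vapour pressure osmometry, the calibration step can be sketched using the textbook relation in which the instrument reading divided by solute concentration, extrapolated to zero concentration, is inversely proportional to molar mass. A calibration constant obtained from a standard of known molar mass then converts a sample's extrapolated value into a molar mass. This scheme, the choice of benzil (C₁₄H₁₀O₂, about 210.23 g/mol) as the standard, the function names, and all readings are assumptions for illustration, not the instruments' own procedure.

```python
import numpy as np

BENZIL_MOLAR_MASS = 210.23  # g/mol, C14H10O2 (assumed calibration standard)


def zero_concentration_value(conc, reading):
    """Fit reading/conc against conc with a straight line; return the intercept."""
    conc = np.asarray(conc, dtype=float)
    ratio = np.asarray(reading, dtype=float) / conc
    slope, intercept = np.polyfit(conc, ratio, 1)
    return intercept


def calibration_constant(conc, reading, standard_molar_mass=BENZIL_MOLAR_MASS):
    """K = (reading/conc extrapolated to c -> 0) * M_standard."""
    return zero_concentration_value(conc, reading) * standard_molar_mass


def molar_mass(conc, reading, k_cal):
    """M_sample = K / (reading/conc extrapolated to c -> 0)."""
    return k_cal / zero_concentration_value(conc, reading)


# Hypothetical readings; concentrations in g of solute per kg of solvent.
conc = np.array([5.0, 10.0, 15.0, 20.0])
k_true = 5000.0
std_reading = conc * (k_true / BENZIL_MOLAR_MASS + 0.02 * conc)
oil_reading = conc * (k_true / 250.0 + 0.03 * conc)

k_cal = calibration_constant(conc, std_reading)
mw_oil = molar_mass(conc, oil_reading, k_cal)  # 250 g/mol for these synthetic data
```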

Infrared spectral measurements
Infrared spectra were measured using a Fourier transform infrared spectrometer fitted with an attenuated total reflection (ATR) accessory with a single-reflection ZnSe crystal. The spectrometer was an Interspec 301-X portable mid-infrared spectrometer (Interspectrum OÜ). The spectra were measured over the range 700 to 4000 cm⁻¹ at a resolution of 1 cm⁻¹. A cosine apodisation function, cos(0.5πx)·[cos(0.5πx)]², was used. Ten scans were averaged to produce each spectrum. Baseline correction was carried out by fitting a third-order polynomial to regions in which shale oil does not absorb (2000-2200 and 3700-4000 cm⁻¹).
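The baseline correction described above can be sketched as follows: a third-order polynomial is fitted only to the points inside the non-absorbing windows and then subtracted from the whole spectrum. The windows and polynomial order are taken from the text; the function name and the synthetic spectrum are hypothetical, and the spectrometer software may implement the correction differently.

```python
import numpy as np

# Non-absorbing regions for shale oil, cm-1 (taken from the text above).
BASELINE_WINDOWS = ((2000.0, 2200.0), (3700.0, 4000.0))


def baseline_correct(wavenumbers, absorbance, windows=BASELINE_WINDOWS, order=3):
    """Subtract a polynomial baseline fitted only to the non-absorbing windows."""
    wn = np.asarray(wavenumbers, dtype=float)
    ab = np.asarray(absorbance, dtype=float)
    mask = np.zeros(wn.shape, dtype=bool)
    for low, high in windows:
        mask |= (wn >= low) & (wn <= high)
    # Polynomial.fit rescales the wavenumber axis internally, which keeps the
    # cubic fit well conditioned despite values of several thousand cm-1.
    baseline = np.polynomial.Polynomial.fit(wn[mask], ab[mask], deg=order)
    return ab - baseline(wn)


# Synthetic spectrum: a cubic baseline drift plus one absorption band at 1460 cm-1.
wn = np.arange(700.0, 4000.0 + 1.0, 1.0)
drift = 0.05 + 2e-5 * (wn - 700.0) + 1e-11 * (wn - 700.0) ** 3
band = 0.5 * np.exp(-0.5 * ((wn - 1460.0) / 20.0) ** 2)
corrected = baseline_correct(wn, drift + band)
```

Fitting only inside the windows means the polynomial is extrapolated across 700-2000 cm⁻¹, so the choice of windows matters more than the polynomial order for the low-wavenumber region used in the models.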

Multivariate regression
Regression was carried out using support vector regression, implemented in Python (version 2.7) with the Scikit-learn package (version 0.15) [30]. A mixed kernel was used, which combined the polynomial and radial basis function kernels using a single weighting parameter [31]. The regression parameters were optimised by minimising the five-fold cross-validation error with the SciPy differential evolution solver [32,33].
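A minimal sketch of this set-up, written against the current scikit-learn and SciPy APIs rather than the scikit-learn 0.15 / Python 2.7 versions used in the study, is shown below. It assumes the mixed kernel is a convex combination, weight·K_poly + (1 − weight)·K_rbf; the exact parameterisation in [31] may differ. The parameter vector, search bounds, and function names are hypothetical, and the bounds assume standardised spectra and targets.

```python
from functools import partial

import numpy as np
from scipy.optimize import differential_evolution
from sklearn.metrics.pairwise import polynomial_kernel, rbf_kernel
from sklearn.model_selection import KFold, cross_val_score
from sklearn.svm import SVR


def mixed_kernel(X, Y, weight, degree, gamma_poly, gamma_rbf):
    """Assumed mixed kernel: weight * polynomial + (1 - weight) * RBF."""
    k_poly = polynomial_kernel(X, Y, degree=degree, gamma=gamma_poly, coef0=1.0)
    k_rbf = rbf_kernel(X, Y, gamma=gamma_rbf)
    return weight * k_poly + (1.0 - weight) * k_rbf


def make_svr(params):
    """Turn the optimiser's parameter vector into an SVR with a callable kernel."""
    weight, degree, log_gp, log_gr, log_c, log_eps = params
    kernel = partial(
        mixed_kernel,
        weight=weight,
        degree=int(round(degree)),
        gamma_poly=10.0**log_gp,
        gamma_rbf=10.0**log_gr,
    )
    return SVR(kernel=kernel, C=10.0**log_c, epsilon=10.0**log_eps)


def cv_rmse(params, X, y, folds):
    """Objective to minimise: mean five-fold cross-validated RMSE."""
    scores = cross_val_score(make_svr(params), X, y, cv=folds,
                             scoring="neg_root_mean_squared_error")
    return -scores.mean()


def fit_mixed_kernel_svr(X, y, maxiter=20, popsize=15, seed=0):
    """Tune the SVR with differential evolution, then refit on all samples."""
    folds = KFold(n_splits=5, shuffle=True, random_state=seed)
    bounds = [
        (0.0, 1.0),    # kernel weight
        (1.0, 3.0),    # polynomial degree (rounded to an integer)
        (-4.0, 0.0),   # log10 gamma of the polynomial kernel
        (-4.0, 0.0),   # log10 gamma of the RBF kernel
        (-1.0, 3.0),   # log10 C
        (-4.0, -1.0),  # log10 epsilon
    ]
    result = differential_evolution(cv_rmse, bounds, args=(X, y, folds),
                                    maxiter=maxiter, popsize=popsize,
                                    seed=seed, polish=False)
    return make_svr(result.x).fit(X, y), result.fun
```

In use, `X` would hold one pre-processed spectrum per row and `y` one property (for example specific gravity); `fit_mixed_kernel_svr(X, y)` returns the refitted model and its best cross-validated RMSE. Tuning on logarithmic scales lets the evolutionary search cover several orders of magnitude of C, epsilon, and the kernel widths.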
To make the spectra more comparable with those from other instruments, in the hope of creating models that could be used on a wider range of instruments, the spectra were pre-processed. First, the spectra were transformed to remove the wavelength dependence inherent in ATR spectra, making them more like transmission spectra. This was done using the algorithm presented by Bertie et al. [34] and Bertie and Lan [35]. Then, because the first half of the spectra contained most of the chemical information, the region above 1800 cm⁻¹ was removed. After this the standard part of the cross validation set. Figure 1 indicates that although the residuals of the predicted values fluctuate randomly (with a fairly symmetric distribution and no clear trends), the pattern of fluctuation varies somewhat between the properties investigated. As the same infrared spectra were used for all of the property models, it is reasonable to expect that inaccuracies in the infrared spectra had a similar impact on the residuals of all properties. The most likely sources of any differences are therefore the accuracy of the experimental property data (measurement accuracy) and the strength of the correlation between the spectra and the property. Figure 1 also shows that several samples have quite large residuals (points that lie far from the others). Most of these outliers were property-specific, but a minority had large residuals for all four properties, which suggests that sample preparation may have produced fractions with less common chemical compositions. The residuals for average boiling point provide a good example of the latter: two samples with boiling points of about 600 K have residuals of more than 30 K. These two samples, and about five or six others, had consistently large residuals across all four properties.
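Separating property-specific outliers from samples with consistently large residuals can be sketched as follows. The outliers above were identified from the residual plots; the numerical criterion used here (a residual larger than three times the property's RMSE), the function names, and the residual values are hypothetical.

```python
import numpy as np


def flag_large_residuals(residuals, k=3.0):
    """Return ids whose |residual| exceeds k times the property's RMSE.

    residuals -- dict mapping sample id to (predicted - measured)
    """
    values = np.fromiter(residuals.values(), dtype=float)
    threshold = k * np.sqrt(np.mean(values**2))
    return {sid for sid, r in residuals.items() if abs(r) > threshold}


def consistent_outliers(residuals_by_property, k=3.0):
    """Samples flagged as outliers for every property."""
    flagged = [flag_large_residuals(res, k) for res in residuals_by_property.values()]
    return set.intersection(*flagged)


# Hypothetical residuals for 50 samples and the four modelled properties.
base = {f"S{i}": 0.1 * (-1) ** i for i in range(50)}
props = {name: dict(base) for name in ("SG", "I", "Tb", "MW")}
for res in props.values():
    res["S7"] = 5.0          # large residual in every property
props["Tb"]["S3"] = 5.0      # large residual in one property only
```

Here `consistent_outliers(props)` returns only `S7`, whereas `flag_large_residuals(props["Tb"])` also returns the property-specific outlier `S3`.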
More details about the models are given in Table 2, including error statistics such as the root mean squared error, average absolute deviation, and relative mean deviation for each property. Table 2 shows that the predictions are somewhat better for the specific gravity and the refractive index parameter than for the average boiling point and the average molecular mass. This is expected, because the specific gravity and the refractive index parameter are quantitatively more closely related to the types of bonds (functional groups) in the mixture, which is the information an infrared spectrum provides. At the same time, the measurement-related standard uncertainties in Table 1 indicate that the experimental data for these two properties were more accurate than those for average boiling point and molecular weight. The ratios of predicted to measured uncertainty (the RMSE of the predictions divided by the standard measurement uncertainty) for the four properties were 31 for specific gravity, 2.9 for the refractive index parameter, 3.3 for average boiling point, and 1.7 for molecular weight. Comparing these ratios, especially those of the refractive index parameter and the average boiling point, indicates that model accuracy was limited not only by the measurement accuracy of the experimental data but also, to a property-dependent degree, by factors that influence the correlation between infrared spectra and properties. For example, the density measurement had a very low uncertainty, and the high ratio for specific gravity suggests that its model accuracy was limited not by the accuracy of the experimental data but by other factors.
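The statistics discussed above can be computed as in the sketch below. It assumes %AAD is the mean of |predicted − measured|/measured × 100; the "relative mean deviation" in Table 2 may be defined differently. The function name and the boiling point values are hypothetical; the 2.1 °C standard uncertainty (equal to 2.1 K as a difference) is the one stated for the boiling point method, and the boiling points are in kelvin because %AAD depends on the temperature scale.

```python
import numpy as np


def error_statistics(measured, predicted, standard_uncertainty):
    """RMSE, AAD, %AAD and the ratio RMSE / standard measurement uncertainty."""
    measured = np.asarray(measured, dtype=float)
    error = np.asarray(predicted, dtype=float) - measured
    rmse = float(np.sqrt(np.mean(error**2)))
    aad = float(np.mean(np.abs(error)))
    pct_aad = float(100.0 * np.mean(np.abs(error) / np.abs(measured)))
    return {"RMSE": rmse, "AAD": aad, "%AAD": pct_aad,
            "ratio": rmse / standard_uncertainty}


# Hypothetical average boiling points (K) of four fractions, u = 2.1 K.
stats = error_statistics([400.0, 450.0, 500.0, 550.0],
                         [405.0, 445.0, 508.0, 548.0], 2.1)
```

For these values the RMSE is √29.5 ≈ 5.43 K and the AAD is 5.0 K. Because RMSE squares the errors, it weights occasional large residuals more heavily than AAD does.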
Because infrared spectra do not directly contain information about molecular size, our initial question was how well, if at all, FTIR-based models can predict 'molecular size' parameters for narrow boiling range oil fractions (or pseudocomponents) prepared by distillation. Figure 1 and Table 2 show that, for distillation fractions, FTIR models can reliably and quite accurately predict 'molecular size parameters' (parameters that are more strongly related to the size of the molecules than to the types of bonds or functional groups in the mixture). Since these size-related properties can be predicted accurately, some indirect relation must exist between them and the infrared spectra. In this regard, systematic changes have been observed in the infrared spectra [11,36] along a collected series of narrow boiling range fractions (i.e. from the first fraction collected to the last). These systematic changes, which accompany changes in the boiling point of the samples, are likely to supply the additional information needed to predict molecular size properties.
The practical performance of the models can also be checked by examining the results for sequential fractions from a single simple batch distillation. For most fractions, the difference between the model and measured values is smaller than the difference between subsequent fractions; that is, the infrared models for both 'energy parameters' and 'size parameters' can generally distinguish between two adjacent fractions. This is illustrated in Figure 2, where specific gravity (an energy parameter) and average boiling point (a size parameter) are evaluated for a selected distillation. Figure 2 also shows a difference in performance between 'energy parameters' and 'size parameters' (a tendency that was generally observable to some extent). In this example distillation, at the point where the distillation pressure was reduced from atmospheric to low pressure, there is an inflection point in the overall trend (the drop in property values). As Figure 2 shows, the average boiling point model had larger errors for the two samples at this inflection point, whereas the density model accounted better for this anomaly. This supports the view that average boiling point predictions from FTIR spectra may rely mostly on systematic changes in the spectra from fraction to fraction (which could be related to changes in the types and concentrations of bonds in the sample), whereas specific gravity predictions may rely both on these systematic changes and on the structure-related information in each individual spectrum. The correlation between a property and the spectra could therefore be more direct for some properties than for others, but the spectra of distillation cuts can contain, both directly and indirectly, significant amounts of information that helps to predict the property.
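The check that a model "can distinguish between two fractions" can be made explicit. The sketch below treats a fraction as distinguishable when its absolute model error is smaller than the measured difference to its nearest neighbour in the collection sequence. This criterion, the function name, and the boiling point values are hypothetical illustrations of the comparison described above.

```python
import numpy as np


def distinguishable(measured, predicted):
    """For fractions in collection order, True where the model error is smaller
    than the measured difference to the nearest neighbouring fraction."""
    measured = np.asarray(measured, dtype=float)
    error = np.abs(np.asarray(predicted, dtype=float) - measured)
    steps = np.abs(np.diff(measured))
    # Gap to the previous and to the next fraction; end fractions have one neighbour.
    gap_prev = np.concatenate(([np.inf], steps))
    gap_next = np.concatenate((steps, [np.inf]))
    return error < np.minimum(gap_prev, gap_next)


# Hypothetical average boiling points (K) of four sequential fractions.
flags = distinguishable([400.0, 420.0, 440.0, 460.0], [402.0, 418.0, 445.0, 485.0])
share = flags.mean()  # fraction of cuts the model separates from their neighbours
```

Here the last fraction has an error of 25 K against a 20 K step, so it is not distinguishable and three of the four fractions pass.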
Finally, whether the accuracy of the FTIR-based models (shown in Table 2) is sufficient depends on the intended application; for calculations with a small tolerance for error, it may still be preferable to measure the value experimentally. However, based on the AAD and %AAD values, the accuracies are at the same level as those reported for petroleum correlations (bulk property correlations usable in process simulators) [4]. It should be noted that the current FTIR study and the available petroleum bulk property correlations are not directly comparable in terms of the number of data points, the variability of the samples, and the chemical nature of the samples used for the correlations. Still, for indicative purposes only, for specific gravity the current FTIR model had an AAD about half that of the bulk property correlations for conventional oils. At the same time, some of the petroleum bulk correlations for molecular weight were more accurate than the current FTIR model, but these correlations covered a narrower range of molecular weights; the molecular weight correlations that included heavier fractions (as the FTIR-based regression did) had significantly higher AADs. In addition, in a literature review of the use of multivariate regression for fuel property prediction [4], we found that other researchers have also created models for some of these parameters for other fuels and achieved similar or better accuracy. It therefore appears that, in specific cases, multivariate models can be as accurate as (or even more accurate than) the corresponding petroleum correlations based on physical properties. Lastly, this article only examined FTIR-based models; however, fusing data from different analytical sources, such as FTIR and NMR, has been shown to improve the estimation of some physico-chemical properties of crude oils compared with a single analytical technique [37].

Conclusions
In this study, we investigated the potential of predictive models based on infrared spectra for predicting four basic oil parameters (specific gravity, refractive index parameter, average boiling point, and average molecular weight) of narrow boiling range distillation fractions (or pseudocomponents). It was found that, for batch distillation fractions of Kukersite oil shale oil with varying compositions, not only the energy parameters (parameters more closely related to the types of bonds and functional groups, here the specific gravity and the refractive index parameter) but also the size parameters (parameters more closely related to the size of the molecules, here the molecular weight and the average boiling point) can be reliably predicted from FTIR spectra. The spectra can therefore contain, both directly and indirectly, significant amounts of information that helps to predict these properties. However, the models gave somewhat better results for the physical parameters related more to molecular structure than for those related more closely to molecular size. More generally, FTIR-based models could be a useful tool both for determining or predicting various thermodynamic properties and for detecting random experimental errors (identifying outliers) in measured data during a measurement project (if FTIR analysis is included in the project).
Although the comparison is not direct, comparing these models with the performance of the more commonly used bulk property correlations (developed for conventional oils) suggests that predictive models based on infrared spectra could be used to reduce the experimental data required when developing other predictive methods, or even as a substitute for other prediction methods. Although issues remain, such as over-fitting the data, the difficulty of transferring correlations to another spectrometer, and the questionable applicability to samples unlike those in the calibration set, thermodynamic and property correlations based on infrared spectra would still offer advantages over current prediction methods in some situations and applications.

Fig. 2. Comparison of the performance of the average boiling point model and the specific gravity model for fractions from a typical distillation. For the average boiling point, the experimental error bars are about the size of the data points; for the specific gravity, the experimental error bars are smaller than the data points.