Title: Novel standardization methods for preprocessing multivariate data used in predictive modeling
Authors: Emily Grisanti - Robert Bosch GmbH, TU Freiberg (Germany) [presenting]
Matthias Otto - TU Bergakademie Freiberg (Germany)
Abstract: Preprocessing the data set is an essential part of multivariate analysis, since most predictive models assume standardized data. This is why the Standard Normal Variate (SNV) transformation, which subtracts the mean and divides by the standard deviation sample by sample, has become very popular. For highly correlated data, which are common in analytical measurement (e.g. vibrational spectroscopy), applying SNV over the full spectral range is in some cases not sufficient to remove unwanted effects such as the influence of stray light. Three standardization methods are presented that apply SNV to defined sequential windows rather than to the full spectrum: Dynamic Localized SNV (DLSNV), Peak SNV (PSNV) and Partial Peak SNV (PPSNV). DLSNV is an enhancement of Localized SNV (LSNV) that allows a dynamic starting point for the localized windows, on each of which SNV is executed individually. PSNV and PPSNV select ranges of the spectra that are highly correlated with the target value and perform SNV on these essential windows. The prediction errors of two regression models in chemical analytics are shown to be reduced by up to 16% and 29%, respectively, compared to LSNV.
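
To make the windowing idea concrete, here is a minimal Python sketch of sample-wise SNV and of SNV applied to sequential windows with an adjustable starting point. It is an illustration under stated assumptions, not the authors' implementation: the function names and the `width` and `start` parameters are hypothetical, the abstract does not say how DLSNV chooses its starting point or window size, and the use of the sample standard deviation (`ddof=1`) is likewise an assumption.

```python
import numpy as np


def snv(X):
    """Sample-wise SNV: centre each row (spectrum) on its own mean
    and scale it by its own standard deviation."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=1, keepdims=True)
    # ddof=1 (sample standard deviation) is an assumption of this sketch.
    std = X.std(axis=1, ddof=1, keepdims=True)
    return (X - mean) / std


def windowed_snv(X, width, start=0):
    """Apply SNV independently on consecutive windows of `width` channels.

    `start` (hypothetical parameter) shifts where the regular windows begin;
    channels before `start` are treated as one leading window. Every window
    needs at least two channels, otherwise its standard deviation is undefined.
    """
    X = np.asarray(X, dtype=float)
    n_channels = X.shape[1]
    # Window boundaries: 0, then start, start + width, ..., then the last channel.
    inner = [e for e in range(start, n_channels, width) if e > 0]
    edges = [0] + inner + [n_channels]
    out = np.empty_like(X)
    for lo, hi in zip(edges[:-1], edges[1:]):
        out[:, lo:hi] = snv(X[:, lo:hi])
    return out
```

Calling `windowed_snv(X, width=X.shape[1], start=0)` reduces to plain `snv(X)`, so the full-range transformation is the special case of a single window. PSNV and PPSNV would differ only in how the windows are chosen (by correlation with the target value), which this sketch does not attempt to reproduce.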