CMStatistics 2022
Title: Unveiling patterns in spectroscopy data via a Bayesian latent variables approach
Authors: Alessandro Casa - Free University of Bozen-Bolzano (Italy) [presenting]
Tom O'Callaghan - University College Cork (Ireland)
Thomas Brendan Murphy - University College Dublin (Ireland)
Abstract: Infrared spectroscopy techniques are a convenient and non-destructive way to collect large amounts of data rapidly. These data are now used in many fields, such as medicine, astronomy and food science. From a statistical viewpoint, however, they pose relevant challenges, mainly due to their high dimensionality and the complex relationships among spectral variables (wavelengths), which often arise from convoluted chemical processes. In this setting, factor analysis is a sensible strategy, since it produces parsimonious representations of the data while focusing on their correlation structure. Its standard formulation, however, does not account for redundancy among the features. A modification of factor analysis is therefore proposed that maps the data into a lower-dimensional latent space while simultaneously clustering the variables, and the model is fitted with a flexible Bayesian estimation procedure. On the one hand, this approach yields an even more parsimonious summary of the data, highlighting which wavelengths carry similar information. On the other hand, the resulting partition of the wavelengths offers useful insights from a chemical standpoint. The method is applied to milk mid-infrared spectroscopy data from cows on different feeding regimens, providing a useful tool for verifying milk authenticity.
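The abstract does not state the model equations. As an illustration only, the sketch below assumes one simple way a factor model could cluster the variables: wavelengths in the same cluster share a row of the loading matrix, so that Lambda = C B, where C is a p x G one-hot membership matrix and B is a G x K matrix of cluster-level loadings. Under that assumption the number of free loadings drops from pK to GK. The function names (`clustered_loadings`, `implied_covariance`), the toy partition and all parameter values are hypothetical, and the code only simulates from such a model; it does not implement the authors' Bayesian estimation procedure.

```python
import numpy as np


def clustered_loadings(labels, cluster_loadings):
    """Build a p x K loading matrix in which variables sharing a cluster
    label share the same loading row (hypothetical parameterisation)."""
    labels = np.asarray(labels)
    n_clusters = cluster_loadings.shape[0]
    membership = np.eye(n_clusters)[labels]  # p x G one-hot membership matrix C
    return membership @ cluster_loadings     # p x K expanded loadings C @ B


def implied_covariance(loadings, uniquenesses):
    """Covariance of x = Lambda z + eps, with z ~ N(0, I) and eps ~ N(0, diag(psi))."""
    return loadings @ loadings.T + np.diag(uniquenesses)


rng = np.random.default_rng(0)
p, k, g, n = 12, 2, 3, 500                # wavelengths, factors, variable clusters, samples
labels = np.repeat(np.arange(g), p // g)  # toy partition: 4 adjacent wavelengths per cluster
cluster_B = rng.normal(size=(g, k))       # G x K free loadings (assumed values)
Lam = clustered_loadings(labels, cluster_B)
psi = rng.uniform(0.1, 0.5, size=p)       # idiosyncratic (unique) variances

# Simulate toy "spectra" from the factor model.
z = rng.normal(size=(n, k))                   # latent factor scores
eps = rng.normal(size=(n, p)) * np.sqrt(psi)  # independent noise per wavelength
X = z @ Lam.T + eps                           # n x p data matrix

Sigma = implied_covariance(Lam, psi)
print("free loadings, standard FA:", p * k)  # 24
print("free loadings, clustered:  ", g * k)  # 6
```

With p = 12 wavelengths, K = 2 factors and G = 3 clusters, the free loadings fall from 24 to 6, and any two wavelengths in the same cluster have identical covariances with any third wavelength, which is one way of reading "highlighting which wavelengths carry similar information".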