CMStatistics 2022
Title: Generalization of cell-wise outlier identification for probability density functions
Authors: Ivana Pavlu - Palacky University Olomouc (Czech Republic) [presenting]
Karel Hron - Palacky University Olomouc (Czech Republic)
Abstract: In any data analysis, it is important to acknowledge the possible presence of outliers in the dataset. Observations that deviate from the model assumptions may severely affect the results and their interpretability, which makes outlier detection an important step of the analysis. With multivariate data, it can be convenient to examine outliers at the cellwise level, that is, to look for deviations in individual cells of the data matrix. One well-established option for comprehensive outlier identification is the Deviating Data Cells (DDC) algorithm, which searches for both rowwise and cellwise outliers. The idea of the DDC algorithm is applied to a spline representation of probability density functions (PDFs), thereby extending multivariate outlier detection to the functional distributional case. Using the information contained in the spline coefficients, it is possible to highlight the parts of PDFs where their behavior deviates from the common trend. The theoretical developments will be demonstrated on a dataset of particle size distributions from a geological survey in the Czech Republic.
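
To make the notion of a cellwise outlier concrete, the following is a minimal sketch and not the authors' method. It covers only a simplified version of the first, column-wise standardization step of a DDC-style procedure, using the median and MAD as stand-in robust estimators; the later DDC steps, which predict each cell from correlated columns and flag whole rows, are omitted. The coefficient matrix, the function name `flag_cells`, and the cutoff are assumptions chosen for illustration: rows stand for PDFs and columns for spline coefficients.

```python
import numpy as np


def flag_cells(coefs, cutoff=2.576):
    """Flag cells whose robust z-score exceeds the cutoff in absolute value.

    coefs  : 2-D array, one row per PDF, one column per spline coefficient
             (hypothetical layout, assumed for this sketch).
    cutoff : threshold on |z|; 2.576 is roughly the 0.995 standard normal
             quantile, i.e. the square root of the 0.99 chi-square(1) quantile.
    """
    coefs = np.asarray(coefs, dtype=float)
    # Robust column-wise location: the median of each coefficient.
    med = np.median(coefs, axis=0)
    # Robust column-wise scale: MAD, rescaled to be consistent at the normal.
    mad = 1.4826 * np.median(np.abs(coefs - med), axis=0)
    # Guard against zero scale in constant columns.
    mad = np.where(mad > 0, mad, 1.0)
    z = (coefs - med) / mad
    return np.abs(z) > cutoff


# Simulated coefficients (not real data): 30 PDFs, 6 coefficients each.
rng = np.random.default_rng(0)
coefs = rng.normal(size=(30, 6))
coefs[4, 2] += 15.0  # contaminate a single cell

flags = flag_cells(coefs)
print(np.argwhere(flags))  # (row, column) indices of flagged cells
```

A flagged cell in column j of row i would then point to the part of the i-th PDF governed by the j-th spline coefficient, under the assumption that each coefficient has local influence on the density.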