Title: An outlier detection method for high dimensional data with mixed attributes
Authors: Kangmo Jung - Kunsan National University (Korea, South) [presenting]
Abstract: Outliers are extreme observations that lie far from the other observations. Outlier detection is an important step in many applications, such as detecting insurance fraud or industrial damage. Most research on outlier detection has focused on data sets with a single attribute type, that is, only continuous or only categorical attributes. Furthermore, data sets with many attributes tend to be sparse, and conventional methods based on the Euclidean distance or nearest neighbors become inappropriate. We propose an outlier detection method that uses quantiles of the attribute value frequency score for categorical attributes and the Mahalanobis distance for continuous attributes. It also handles sparse high-dimensional continuous data, and it is fast and scalable. Experimental results show how the proposed method compares with other state-of-the-art outlier detection methods proposed in the literature.
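
The two ingredients named in the abstract can be illustrated with a minimal Python sketch. It is not the author's implementation. The function names, the quantile levels `avf_q` and `dist_q`, and the rule that flags a point when either score is extreme are all assumptions made for illustration; the abstract does not say how the two scores are combined, and the sketch makes no attempt to handle sparse high-dimensional data.

```python
from collections import Counter

import numpy as np


def avf_scores(cat_rows):
    """Attribute value frequency score: the mean, over categorical
    attributes, of how often each of a row's values occurs in the data.
    Low scores indicate rows made of rare values."""
    n_attrs = len(cat_rows[0])
    counts = [Counter(row[j] for row in cat_rows) for j in range(n_attrs)]
    return np.array([
        sum(counts[j][row[j]] for j in range(n_attrs)) / n_attrs
        for row in cat_rows
    ])


def mahalanobis_distances(X):
    """Mahalanobis distance of each row from the sample mean, using the
    sample covariance (pseudo-inverse guards against singularity)."""
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=0)
    cov = np.atleast_2d(np.cov(X, rowvar=False))
    inv_cov = np.linalg.pinv(cov)
    d2 = np.einsum("ij,jk,ik->i", centered, inv_cov, centered)
    return np.sqrt(np.maximum(d2, 0.0))


def flag_outliers(cat_rows, X, avf_q=0.05, dist_q=0.95):
    """Hypothetical combination rule (an assumption, not from the abstract):
    flag a row if its AVF score falls at or below the avf_q quantile or its
    Mahalanobis distance is at or above the dist_q quantile."""
    avf = avf_scores(cat_rows)
    dist = mahalanobis_distances(X)
    low_avf = avf <= np.quantile(avf, avf_q)
    far = dist >= np.quantile(dist, dist_q)
    return low_avf | far


# Toy data: 19 similar rows plus one row with rare categories and
# extreme continuous values.
cat_rows = [("a", "x")] * 19 + [("b", "y")]
X = [(i % 5, (2 * i) % 3) for i in range(19)] + [(50, 50)]
flags = flag_outliers(cat_rows, X)
```

In this toy data, `flags` is a boolean array marking the rows treated as outliers. The sample mean and covariance used here are themselves sensitive to outliers, so a robust estimator may be preferable in practice.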