Title: Sparse modeling of risk factors in insurance analytics
Authors: Sander Devriendt - KU Leuven (Belgium) [presenting]
Katrien Antonio - University of Amsterdam and KU Leuven (Belgium)
Roel Verbelen - KU Leuven (Belgium)
Edward Frees - University of Wisconsin-Madison (United States)
Abstract: Insurance companies use predictive models for a variety of analytic tasks, including pricing, marketing campaigns, claims handling, fraud detection and reserving. Typically, these predictive models use a selection of continuous, ordinal, nominal and spatial predictors to differentiate risks. Such models have to be competitive, interpretable by stakeholders, and easy to implement and maintain in a production environment. For this reason, the current actuarial literature focuses on GLMs in which risk cells are constructed by binning predictors up front, using ad hoc techniques or professional expertise. Penalized regression is often used to encourage the selection and fusion of predictors in predictive modeling, but most penalization strategies work only when all predictors are of the same type, for example the LASSO for continuous variables and the Fused LASSO for ordered variables. We design an estimation strategy for GLMs that performs variable selection and the binning of predictors through L1-type penalties. We consider the joint presence of different types of predictors, each with its own penalty. Our estimation procedure relies on the theory of proximal operators and is computationally efficient because it splits the overall optimization problem into easier sub-problems, one per predictor and its penalty. We illustrate through simulations and a motor insurance case study that we are able to build a sparse regression model, in a statistically sound way, for data with different types of predictors.
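The per-predictor splitting mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `soft_threshold` and `blockwise_prox`, the block layout and the penalty strengths are all assumptions made for illustration. The sketch relies only on the standard fact that when a penalty is a sum of terms acting on disjoint coefficient blocks, its proximal operator can be evaluated block by block. Only the LASSO proximal operator (soft-thresholding) is shown; a Fused LASSO block would plug in its own operator in the same slot.

```python
import numpy as np


def soft_threshold(v, t):
    """Proximal operator of t * ||v||_1, i.e. the LASSO penalty."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def blockwise_prox(beta, blocks, step):
    """Apply each predictor's proximal operator to its own coefficients.

    `blocks` is a hypothetical layout: a list of (index, prox, lam) tuples,
    one per predictor, where `index` selects that predictor's coefficients,
    `prox` is its proximal operator and `lam` its penalty strength.
    The blocks are assumed to be disjoint.
    """
    out = np.array(beta, dtype=float)  # work on a copy of the input
    for index, prox, lam in blocks:
        out[index] = prox(out[index], step * lam)
    return out


# Illustrative coefficients for two predictors with two coefficients each.
beta = np.array([3.0, -0.5, 1.0, -2.0])
blocks = [
    (slice(0, 2), soft_threshold, 1.0),  # first predictor, stronger penalty
    (slice(2, 4), soft_threshold, 0.5),  # second predictor, weaker penalty
]
shrunk = blockwise_prox(beta, blocks, step=1.0)
```

In a proximal gradient scheme, a step of this kind would follow each gradient step on the GLM log-likelihood, so every sub-problem only involves the coefficients of one predictor.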