B0769
Title: Can probability coherence correction be helpful in data lakes?
Authors: Andrea Capotorti - Universita degli Studi di Perugia (Italy) [presenting]
Marco Baioletti - Universita degli Studi di Perugia (Italy)
Abstract: Integration, merging, and fusion of uncertain data have become increasingly common in artificial intelligence in the Big Data era we are living in. As the volume of information grows, the grouping, simplification, and partiality of statistical analyses become mandatory. And with diffuse and duplicated data structures, such as the data lakes used for classification and/or prediction, the problem of inconsistent conditional probability assessments comes to the fore. Probabilistic reasoning embedded in a coherent setting has the advantage of formalizing partial knowledge expressed through events, conditional or unconditional, representing different combinations of elementary situations; these can hence be thought of as "macro" situations with non-trivial interconnections, implications, incompatibilities, etc., which makes the framework apt to merge different sources of information. Recently, a procedure based on L1 distance minimization and mixed-integer programming (MIP) has been proposed to: correct unconditional assessments directly; revise beliefs; solve the statistical matching problem; and minimize the number of modifications, in line with the principle of optimal corrective explanation. Now, an unsupervised revision of incoherent conditional assessments is proposed, by exploiting so-called zero probabilities and through shrewd use of slack variables.
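As a minimal sketch of the L1-minimization step, the fragment below corrects an incoherent unconditional assessment by linear programming, linearizing the L1 objective with slack variables. The events, numeric values, and the use of scipy.optimize.linprog are illustrative assumptions, not the authors' implementation; the full procedure described in the abstract additionally relies on MIP, e.g. to handle conditional assessments and to minimize the number of modifications.

```python
# A minimal sketch (assumed setup, not the authors' code): L1-minimal
# coherence correction of an unconditional probability assessment.
# We seek a mass function m over the atoms of the generated algebra and
# corrected values q_i = incidence_i . m minimizing sum_i |q_i - p_i|.
import numpy as np
from scipy.optimize import linprog

# Atoms of the algebra generated by events A and B:
# m = (P(A&B), P(A&~B), P(~A&B), P(~A&~B))
# Incidence vector of each assessed event on the atoms:
events = {
    "A":   np.array([1, 1, 0, 0]),
    "B":   np.array([1, 0, 1, 0]),
    "A&B": np.array([1, 0, 0, 0]),
}
p = np.array([0.3, 0.3, 0.5])  # incoherent: P(A&B) > min(P(A), P(B))

n_atoms, n_ev = 4, len(events)
# Variables: [m (4), u (3), v (3)] with q_i - p_i = u_i - v_i and
# u, v >= 0, so |q_i - p_i| = u_i + v_i in the objective
# (the standard slack-variable linearization of the L1 norm).
c = np.concatenate([np.zeros(n_atoms), np.ones(2 * n_ev)])

A_eq = np.zeros((1 + n_ev, n_atoms + 2 * n_ev))
b_eq = np.zeros(1 + n_ev)
A_eq[0, :n_atoms] = 1.0          # atom masses sum to 1
b_eq[0] = 1.0
for i, inc in enumerate(events.values()):
    A_eq[1 + i, :n_atoms] = inc              # q_i = incidence_i . m
    A_eq[1 + i, n_atoms + i] = -1.0          # -u_i
    A_eq[1 + i, n_atoms + n_ev + i] = 1.0    # +v_i
    b_eq[1 + i] = p[i]

# linprog's default bounds are already x >= 0 for all variables.
res = linprog(c, A_eq=A_eq, b_eq=b_eq, method="highs")
m = res.x[:n_atoms]
q = np.array([inc @ m for inc in events.values()])
print("corrected assessment:", dict(zip(events, q.round(3))))
print("L1 distance:", round(res.fun, 3))
```

On this toy assessment the program returns a corrected value P(A&B) = 0.3 at L1 distance 0.2, the cheapest way to restore coherence. Replacing the L1 objective by binary indicators of which p_i change turns the same constraint system into the MIP that minimizes the number of modifications.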