CMStatistics 2023
View Submission - CFE
A1497
Title: LongFinBERT: A Language Model for Very Long Financial Documents
Authors: Erik-Jan Senn - University of St. Gallen (Switzerland) [presenting]
Minh Tri Phan - University of St. Gallen (Switzerland)
Abstract: Language models (LMs) are successful at many natural language processing tasks. However, processing very long documents, such as central bank statements, policy reports, and accounting statements, is computationally expensive or practically infeasible with the standard self-attention mechanism. We introduce LongFinBERT, an LM for the financial domain that can handle very long documents using the scalable self-attention architecture of Ding et al. (2023). We apply LongFinBERT to detect accounting misstatements using annual financial statements. Our findings are threefold. First, textual features improve accounting misstatement detection. Second, the textual features of LongFinBERT outperform those of benchmark text models. Third, processing the entire document is important for obtaining good textual features. LongFinBERT is publicly available at https://huggingface.co/minhtriphan/LongFinBERT-base.
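
To illustrate why the standard self-attention mechanism becomes expensive for very long documents, the following minimal sketch counts the entries of a single attention score matrix, which grows quadratically with the number of tokens. It is not taken from LongFinBERT or from Ding et al. (2023); the function names, the chosen token counts, and the 4-byte (float32) entry size are assumptions made purely for illustration.

```python
def attention_score_entries(seq_len: int) -> int:
    """Entries in one head's attention score matrix (seq_len x seq_len)."""
    return seq_len * seq_len


def score_matrix_gib(seq_len: int, bytes_per_entry: int = 4) -> float:
    """Memory in GiB for one score matrix, assuming float32 entries."""
    return attention_score_entries(seq_len) * bytes_per_entry / 2**30


if __name__ == "__main__":
    # Illustrative document lengths in tokens (assumed, not from the abstract).
    for n in (512, 4096, 32768, 131072):
        print(f"{n:>7} tokens: {attention_score_entries(n):>14,} entries, "
              f"{score_matrix_gib(n):8.3f} GiB per head per layer")
```

Doubling the document length quadruples the size of the score matrix, which is why full self-attention over an entire annual report quickly becomes impractical and a scalable attention variant is attractive.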