# FinBERT: A Large Language Model for Extracting Information from Financial Text

**Source**: https://bm.hkust.edu.hk/bizinsight/2026/01/finbert-large-language-model-extracting-information-financial-text
**Parent**: https://bm.hkust.edu.hk/bizinsight/fintech-and-ai-business



16 Jan 2026

[HUANG, Allen](https://bm.hkust.edu.hk/faculty/huang-allen)

Head, Department of Accounting, Professor

WANG, Hui

[YANG, Yi](https://bm.hkust.edu.hk/faculty/yang-yi)

Lee Heng Fellow, Associate Professor

[Read Full Paper](https://onlinelibrary.wiley.com/doi/10.1111/1911-3846.12832)

We develop FinBERT, a state-of-the-art large language model adapted to the finance domain. We show that FinBERT incorporates finance knowledge and can better summarize contextual information in financial texts. Using a sample of researcher-labeled sentences from analyst reports, we document that FinBERT substantially outperforms the Loughran and McDonald dictionary and other machine learning algorithms, including naïve Bayes, support vector machine, random forest, convolutional neural network, and long short-term memory, in sentiment classification. Our results show that FinBERT excels in identifying the positive or negative sentiment of sentences that other algorithms mislabel as neutral, likely because it uses contextual information in financial text. We find that FinBERT's advantage over other algorithms, and over Google's original Bidirectional Encoder Representations from Transformers (BERT) model, is especially salient when the training sample size is small and in texts containing financial words not frequently used in general texts. FinBERT also outperforms other models in identifying discussions related to environmental, social, and governance issues. Lastly, we show that other approaches underestimate the textual informativeness of earnings conference calls by at least 18% compared to FinBERT. Our results have implications for academic researchers, investment professionals, and financial market regulators.
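To illustrate why a contextual model can beat a word-count baseline, the sketch below implements a minimal dictionary-based sentiment classifier of the kind the paper compares against. The word lists here are hypothetical stand-ins, not the actual Loughran and McDonald dictionary, and the function is an illustration rather than the authors' implementation.

```python
# Illustrative dictionary-based sentiment baseline: label a sentence by
# counting positive vs. negative words. Word lists are hypothetical
# stand-ins, NOT the actual Loughran and McDonald dictionary.

POSITIVE = {"gain", "improve", "beat", "strong", "growth"}
NEGATIVE = {"loss", "decline", "miss", "weak", "impairment"}

def dictionary_sentiment(sentence: str) -> str:
    """Classify a sentence as positive/negative/neutral by word counts."""
    words = [w.strip(".,;:") for w in sentence.lower().split()]
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    if pos > neg:
        return "positive"
    if neg > pos:
        return "negative"
    return "neutral"

# Bag-of-words counting ignores context: the second sentence is still
# labeled "positive" because "beat" appears, even though the firm failed
# to beat estimates. Context-aware models such as FinBERT can catch this.
print(dictionary_sentiment("Revenue growth was strong."))          # positive
print(dictionary_sentiment("The firm failed to beat estimates."))  # positive
```

The failure on the negated sentence is exactly the kind of case the paper attributes to FinBERT's use of contextual information: dictionary methods label many genuinely positive or negative sentences as neutral, or misread them, because they score words in isolation.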