This dataset, titled "Financial-QA-10k", contains 7,000 question-answer pairs derived from company financial reports, specifically 10-K filings. The questions cover a wide range of topics in financial analysis, company operations, and strategy, making the dataset useful to researchers, data scientists, and finance professionals. Each entry includes the question, the corresponding answer, the context from which the answer is derived, the company's stock ticker, and the filing the pair was drawn from. The dataset is intended to support the development and evaluation of natural language processing models in the financial domain.
About the Dataset
Dataset Structure:
- Rows: 7000
- Columns: 5
  - question: The financial or operational question asked.
  - answer: The answer to the question.
  - context: The passage extracted from the 10-K filing that supports the answer.
  - ticker: The stock ticker symbol of the company.
  - filing: The identifier of the 10-K filing from which the question and answer are derived, combining the filing year and form type (for example, 2023_10K).
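The column layout above can be checked when loading the data. The following is a minimal sketch using only the Python standard library; it assumes the dataset is distributed as a CSV file with a header row, which this description does not state. The names `EXPECTED_COLUMNS`, `SAMPLE_CSV`, and `load_financial_qa` are hypothetical, and the in-memory string stands in for the real file.

```python
import csv
import io

# The five columns documented above.
EXPECTED_COLUMNS = ["question", "answer", "context", "ticker", "filing"]

# In-memory stand-in for the real file (assumption: CSV with a header row).
SAMPLE_CSV = (
    "question,answer,context,ticker,filing\n"
    '"What area did NVIDIA initially focus on before expanding into other markets?",'
    '"NVIDIA initially focused on PC graphics.",'
    '"Since our original focus on PC graphics, we have expanded into various markets.",'
    "NVDA,2023_10K\n"
)


def load_financial_qa(stream):
    """Read rows from a CSV stream and verify the documented columns exist."""
    reader = csv.DictReader(stream)
    header = reader.fieldnames or []
    missing = [name for name in EXPECTED_COLUMNS if name not in header]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    # Keep only the documented columns, in the documented order.
    return [{name: row[name] for name in EXPECTED_COLUMNS} for row in reader]


rows = load_financial_qa(io.StringIO(SAMPLE_CSV))
print(len(rows), rows[0]["ticker"])
```

To read an actual file, pass an open file object instead of the `io.StringIO` wrapper, for example `open(path, newline="", encoding="utf-8")`.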
Sample Data:
Question: What area did NVIDIA initially focus on before expanding into other markets?
Answer: NVIDIA initially focused on PC graphics.
Context: Since our original focus on PC graphics, we have expanded into various markets.
Ticker: NVDA
Filing: 2023_10K
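The filing value in the sample, 2023_10K, appears to combine a year and a form type. A small sketch of splitting it is shown below; it assumes every value follows the `<year>_<form>` pattern seen in this one sample, which may not hold for the whole dataset. The function name `parse_filing` is hypothetical.

```python
def parse_filing(filing):
    """Split a filing value such as '2023_10K' into (year, form type).

    Assumption: values follow the '<year>_<form>' pattern of the sample row.
    """
    year_text, _, form = filing.partition("_")
    if not (year_text.isdigit() and form):
        raise ValueError(f"unexpected filing format: {filing!r}")
    return int(year_text), form


year, form = parse_filing("2023_10K")
print(year, form)
```

Raising on unexpected values, rather than guessing, makes it easy to spot rows that do not match the assumed pattern.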
Potential Uses:
- Natural Language Processing (NLP): Develop and test NLP models for question answering, context understanding, and information retrieval.
- Financial Analysis: Extract and analyze specific financial and operational insights from large volumes of textual data.
- Educational Purposes: Serve as a training and testing resource for students and researchers in finance and data science.
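As one illustration of the question-answering use case, a model's predicted answer can be scored against the dataset's answer column. The sketch below uses token-level F1, a metric commonly used for question-answering evaluation; it is not a scoring method prescribed by this dataset, and the names `normalize` and `token_f1` are hypothetical.

```python
import re
from collections import Counter


def normalize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def token_f1(prediction, reference):
    """Token-overlap F1 between a predicted answer and a reference answer."""
    pred, ref = normalize(prediction), normalize(reference)
    if not pred or not ref:
        # Both empty counts as a match; one empty counts as a miss.
        return float(pred == ref)
    # Multiset intersection counts each shared token at most as often
    # as it appears in both answers.
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


reference = "NVIDIA initially focused on PC graphics."
print(token_f1("PC graphics", reference))  # 0.5
```

A short but correct prediction such as "PC graphics" gets full precision and partial recall, so a metric like this rewards partial matches that exact string comparison would score as zero.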