Enhanced US-GAAP Financial Statement Data Set

Refined US-GAAP Data with Outlier Correction, Amendments, and Missing Figures

@kaggle.vadimvanak_step_2

About this Dataset

This dataset builds on the SEC's "Financial Statement Data Sets" with several improvements to the accuracy and usability of US-GAAP financial data from SEC filings of U.S. exchange-listed companies. Drawing on submissions from January 2009 onward, it aims to give analysts cleaner, more consistent data by addressing common problems found in the original data.

Key Enhancements:

  1. Outlier Detection and Correction: Outliers in the original dataset have been systematically identified and corrected, providing more reliable financial figures.
  2. Amendment Adjustments: Where SEC rules allow amendment filings to include only the changed (delta) figures, the full figures from the original submission have been carried over, so an amended filing can be analyzed without looking up the original.
  3. Missing Figure Estimation: Using calculation arcs from the US-GAAP taxonomy, missing financial figures have been computed where possible, ensuring greater completeness.
  4. Data Structuring: Financial figures that previously appeared as separate rows have been consolidated into single rows with new columns, offering a cleaner structure.
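
Enhancements 2 and 3 can be illustrated with a minimal Python sketch. It is not the dataset's actual extraction code: the function names, the dictionary layout, and the single calculation arc (Assets = AssetsCurrent + AssetsNoncurrent) are assumptions chosen for illustration, and the figures are made up.

```python
from typing import Dict, List, Optional, Tuple

# One filing's figures: tag name -> value (None means "not reported").
Figures = Dict[str, Optional[float]]

# Hypothetical calculation arcs: parent tag -> [(child tag, weight), ...].
# A real pipeline would read these from the US-GAAP taxonomy's calculation linkbase.
CALC_ARCS: Dict[str, List[Tuple[str, float]]] = {
    "Assets": [("AssetsCurrent", 1.0), ("AssetsNoncurrent", 1.0)],
}


def carry_over_amendment(original: Figures, amendment: Figures) -> Figures:
    """Start from the original filing and overlay only what the amendment restates."""
    merged = dict(original)
    for tag, value in amendment.items():
        if value is not None:
            merged[tag] = value
    return merged


def fill_from_arcs(figures: Figures,
                   arcs: Dict[str, List[Tuple[str, float]]] = CALC_ARCS) -> Figures:
    """Compute a missing figure when exactly one tag in an arc is unknown."""
    filled = dict(figures)
    for parent, children in arcs.items():
        tags = [parent] + [child for child, _ in children]
        missing = [t for t in tags if filled.get(t) is None]
        if len(missing) != 1:
            continue  # either complete already or under-determined
        if missing[0] == parent:
            # Parent is the weighted sum of its children.
            filled[parent] = sum(w * filled[c] for c, w in children)
        else:
            # Solve the same equation for the one missing child.
            weight = dict(children)[missing[0]]
            known = sum(w * filled[c] for c, w in children if c != missing[0])
            filled[missing[0]] = (filled[parent] - known) / weight
    return filled


# Made-up example: the amendment restates only AssetsCurrent.
original = {"Assets": 1000.0, "AssetsCurrent": 400.0, "AssetsNoncurrent": None}
amendment = {"AssetsCurrent": 450.0}

merged = carry_over_amendment(original, amendment)
completed = fill_from_arcs(merged)
print(completed)
```

In this sketch the amended filing keeps the original Assets figure, and the unreported AssetsNoncurrent is derived from the arc. Arcs with more than one unknown are left untouched rather than guessed.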

Scope:

  • Data Scope: The dataset is restricted to figures reported under US-GAAP standards, with the exception of EntityCommonStockSharesOutstanding and EntityPublicFloat.
  • Currency and Units: The dataset exclusively includes figures reported in USD or shares, ensuring uniformity and comparability. It excludes ratios and non-financial metrics to maintain focus on financial data.
  • Company Selection: The dataset is limited to companies with U.S. exchange tickers, providing a concentrated analysis of publicly traded firms within the United States.
  • Submission Types: The dataset only incorporates data from 10-Q, 10-K, 10-Q/A, and 10-K/A filings, ensuring consistency in the type of financial reports analyzed.
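
The scope rules above amount to a row filter. The following Python sketch shows one way to express them; the record fields (`taxonomy`, `tag`, `unit`, `form`, `ticker`) and the sample rows are hypothetical and do not reflect the dataset's actual column names.

```python
ALLOWED_FORMS = {"10-Q", "10-K", "10-Q/A", "10-K/A"}
ALLOWED_UNITS = {"USD", "shares"}
# The two non-US-GAAP tags the dataset keeps as exceptions.
NON_GAAP_EXCEPTIONS = {"EntityCommonStockSharesOutstanding", "EntityPublicFloat"}


def in_scope(record: dict) -> bool:
    """Return True if a figure satisfies all four scope rules."""
    is_gaap = record["taxonomy"] == "us-gaap" or record["tag"] in NON_GAAP_EXCEPTIONS
    return (
        is_gaap
        and record["unit"] in ALLOWED_UNITS
        and record["form"] in ALLOWED_FORMS
        and bool(record.get("ticker"))  # company must have a U.S. exchange ticker
    )


# Made-up sample rows, one per rule.
records = [
    {"taxonomy": "us-gaap", "tag": "Assets", "unit": "USD", "form": "10-K", "ticker": "ABC"},
    {"taxonomy": "dei", "tag": "EntityPublicFloat", "unit": "USD", "form": "10-K", "ticker": "ABC"},
    {"taxonomy": "us-gaap", "tag": "Assets", "unit": "EUR", "form": "10-K", "ticker": "ABC"},
    {"taxonomy": "us-gaap", "tag": "Assets", "unit": "USD", "form": "8-K", "ticker": "ABC"},
    {"taxonomy": "us-gaap", "tag": "Assets", "unit": "USD", "form": "10-Q", "ticker": None},
]

kept = [r for r in records if in_scope(r)]
print(len(kept))
```

Only the first two sample rows pass: the third fails the currency rule, the fourth the submission-type rule, and the fifth has no ticker.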

Dataset Features:

  • Refined Financial Data: More accurate and consistent figures, achieved by addressing reporting issues, correcting outliers, and consolidating data.
  • Enhanced Usability: By handling amendment submissions and leveraging GAAP taxonomies, the dataset offers a more analysis-friendly structure.
  • Improved Completeness: Where original submissions had gaps in reporting, this dataset fills those gaps using calculated figures based on accounting principles.

The source code for data extraction is available here
