The only open-source toolkit that can download SEC EDGAR financial reports and extract textual data from specific item sections into nice & clean structured JSON files. Presented at WWW 2025 @ Sydney, Australia (https://dl.acm.org/doi/10.1145/3701716.3715289)

business data-mining edgar edgar-crawler finance natural-language-processing nlp python sec web-crawler
1 open issue needs help. Last updated: Jul 25, 2025

Open Issues Need Help

AI Summary: Enhance the EDGAR-CRAWLER tool to better extract financial statements from 10-K reports. Specifically, the tool needs to identify and extract financial statements located outside of the standard Item 8 or Item 16 sections, often appearing after an "INDEX TO * STATEMENTS" header near the end of the document. A new "financial_statement" item should be added to the JSON output to contain this extracted information.

Complexity: 4/5
good first issue
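
The approach described in the summary could be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not EDGAR-CRAWLER's actual code: the function name `add_financial_statement`, the `INDEX_HEADER` pattern, the sample text, and the `item_8`/`item_16` keys are assumptions made for the example. Only the "financial_statement" key and the "INDEX TO * STATEMENTS" header come from the issue itself.

```python
import json
import re

# Assumed pattern: a line such as "INDEX TO FINANCIAL STATEMENTS" or
# "INDEX TO CONSOLIDATED FINANCIAL STATEMENTS" standing alone on its line.
INDEX_HEADER = re.compile(
    r"^[ \t]*INDEX[ \t]+TO[ \t]+(?:[A-Z]+[ \t]+)*STATEMENTS[ \t]*$",
    re.MULTILINE,
)


def add_financial_statement(text: str, items: dict) -> dict:
    """Return a copy of `items` with a "financial_statement" entry.

    The entry holds everything from the last matching header to the end of
    the document, or an empty string when no header is found.
    """
    result = dict(items)
    matches = list(INDEX_HEADER.finditer(text))
    # The issue says the header usually sits near the end, so take the last match.
    result["financial_statement"] = text[matches[-1].start():].strip() if matches else ""
    return result


# Hypothetical 10-K excerpt, invented for this example.
sample = (
    "ITEM 8. FINANCIAL STATEMENTS AND SUPPLEMENTARY DATA\n"
    "See the index at the end of this report.\n"
    "ITEM 16. FORM 10-K SUMMARY\n"
    "None.\n"
    "INDEX TO CONSOLIDATED FINANCIAL STATEMENTS\n"
    "Consolidated Balance Sheets\n"
)

extracted = add_financial_statement(
    sample,
    {"item_8": "See the index at the end of this report.", "item_16": "None."},
)
print(json.dumps(extracted, indent=2))
```

Real filings vary in capitalization and layout, so a production version would likely need a looser pattern and a rule for where the statements end.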

Python