Open Issues Need Help
AI Summary: This issue aims to develop a parser for AMR (Atividades Mais Relevantes) data, specifically focusing on extracting current expenses and service acquisition information. The goal is to identify patterns in relevant tables, extract economic classifications and associated descriptions, and ensure the extracted data is consistent and legible.
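As a rough illustration of the extraction and consistency step, the sketch below pulls (classification, description) pairs out of table rows. It assumes rows arrive as lists of cells (the shape a table extractor typically returns, possibly with `None` cells) and that economic classification codes are two-digit groups joined by dots. Both the code layout and the names `CODE_PATTERN`, `clean_cell`, and `extract_classifications` are assumptions for illustration, not taken from the issue.

```python
import re

# Assumed code layout: two-digit groups joined by dots, e.g. "02.02.20".
CODE_PATTERN = re.compile(r"^\d{2}(?:\.\d{2}){1,3}$")


def clean_cell(cell):
    """Collapse line breaks and repeated spaces; treat None as empty."""
    return " ".join((cell or "").split())


def extract_classifications(rows):
    """Return (code, description) pairs from rows shaped as lists of cells."""
    records = []
    for row in rows:
        if len(row) < 2:
            continue  # not enough columns to hold a code and a description
        code = clean_cell(row[0])
        description = clean_cell(row[1])
        # Keep only rows with a well-formed code and a non-empty description.
        if CODE_PATTERN.match(code) and description:
            records.append((code, description))
    return records
```

Header rows, subtotal rows, and rows with missing codes are dropped by the pattern check, which is one simple way to keep the output consistent. Real AMR tables may need a looser or stricter pattern.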
AI Summary: This issue aims to set up a Python environment for extracting data from financial PDFs, supporting both native and scanned documents. Key tasks include installing and testing `pdfplumber` and `tabula-py`, comparing their table extraction quality for Portuguese content, and developing a script to differentiate between native and scanned PDFs.
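One possible way to differentiate native from scanned PDFs is to check whether each page has an extractable text layer. The sketch below is a heuristic under stated assumptions, not the issue's actual script: the 20-character threshold and the function names are hypothetical, and a scanned PDF that already carries an OCR text layer would be labelled native. The only `pdfplumber` calls used are `pdfplumber.open` and `page.extract_text()`.

```python
def classify_pages(page_texts, min_chars=20):
    """Label each page 'native' if it has enough extractable text, else 'scanned'."""
    labels = []
    for text in page_texts:
        stripped = (text or "").strip()  # extract_text() may yield None or ""
        labels.append("native" if len(stripped) >= min_chars else "scanned")
    return labels


def classify_document(page_texts, min_chars=20):
    """Summarise page labels as 'native', 'scanned', 'mixed', or 'empty'."""
    labels = set(classify_pages(page_texts, min_chars))
    if not labels:
        return "empty"
    if labels == {"native"}:
        return "native"
    if labels == {"scanned"}:
        return "scanned"
    return "mixed"


def classify_pdf(path, min_chars=20):
    """Read a PDF with pdfplumber and classify it by its text layer."""
    # Third-party dependency, imported lazily so the helpers above run without it.
    import pdfplumber

    with pdfplumber.open(path) as pdf:
        texts = [page.extract_text() for page in pdf.pages]
    return classify_document(texts, min_chars)
```

Documents labelled `scanned` or `mixed` would then be candidates for an OCR step before table extraction, while `native` ones can go straight to `pdfplumber` or `tabula-py`.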
AI Summary: This issue aims to develop a parser to extract structured data from investment plans (PPI) found in PDF documents. The parser should identify projects, their codes, designations, and budgetary values for specific years, exporting the results into CSV and JSON formats. Key tasks include creating regular expressions to capture project lines and extracting various data points.
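The regular-expression approach could look roughly like the sketch below. The line layout is an assumption made for illustration (a `YYYY/N` project code, a free-text designation, then two amounts in Portuguese number format such as `150.000,00`). Real PPI documents may order or format these columns differently, and the names `PROJECT_LINE`, `parse_projects`, `to_csv`, and `to_json` are hypothetical.

```python
import csv
import io
import json
import re

# Assumed line layout: "<code> <designation> <amount year 1> <amount year 2>"
# e.g. "2024/15 Requalificação da Escola Básica 150.000,00 75.000,00"
PROJECT_LINE = re.compile(
    r"^(?P<code>\d{4}/\d+)\s+"
    r"(?P<designation>.+?)\s+"
    r"(?P<value_1>\d{1,3}(?:\.\d{3})*,\d{2})\s+"
    r"(?P<value_2>\d{1,3}(?:\.\d{3})*,\d{2})$"
)


def parse_amount(text):
    """Convert a Portuguese-formatted amount such as '150.000,00' to a float."""
    return float(text.replace(".", "").replace(",", "."))


def parse_projects(lines, years=(2024, 2025)):
    """Return one dict per line that matches the assumed project layout."""
    projects = []
    for line in lines:
        match = PROJECT_LINE.match(line.strip())
        if match is None:
            continue  # headers, totals, and other non-project lines
        projects.append({
            "code": match["code"],
            "designation": match["designation"],
            str(years[0]): parse_amount(match["value_1"]),
            str(years[1]): parse_amount(match["value_2"]),
        })
    return projects


def to_csv(projects):
    """Serialise the parsed projects as CSV text with a header row."""
    if not projects:
        return ""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(projects[0].keys()))
    writer.writeheader()
    writer.writerows(projects)
    return buffer.getvalue()


def to_json(projects):
    """Serialise the parsed projects as JSON, keeping accented characters."""
    return json.dumps(projects, ensure_ascii=False, indent=2)
```

Named groups keep the mapping from regex to output fields explicit, so adding a column (for example a third budget year) means adding one group and one dictionary key.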