Open Issues Need Help
AI Summary: This issue aims to develop a parser for AMR (Atividades Mais Relevantes, "Most Relevant Activities") documents that extracts financial information, specifically current expenditure and acquisition-of-services details. The parser needs to identify table patterns, extract economic classification codes with their descriptions, and validate the extracted codes. The primary goal is to support operational cost analysis with high-quality data, free from truncation or OCR noise.
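A minimal sketch of the code-extraction and validation step, working on text lines already pulled from the PDF. It assumes each table row reads "<code> <description>" with codes written as dot-separated two-digit groups (e.g. "02.02.20"), that current expenditure uses chapters 01 to 06, and that "02.02" groups acquisition of services. These are assumptions to check against the real documents, and `AmrRow`, `parse_amr_line` and `parse_amr_lines` are hypothetical names.

```python
import re
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple

# Hypothetical row layout: "<code> <description>", where the code is made of
# dot-separated two-digit groups, e.g. "02.02.20 Outros trabalhos especializados".
LINE_RE = re.compile(r"^\s*(?P<code>\d{2}(?:\.\d{2}){1,3})\s+(?P<desc>\S.*?)\s*$")

# Assumption: current expenditure uses chapters 01-06 of the economic
# classification and "02.02" groups acquisition of services. Verify these
# against the classification used in the actual AMR documents.
CURRENT_CHAPTERS = {"01", "02", "03", "04", "05", "06"}
SERVICES_PREFIX = "02.02"


@dataclass
class AmrRow:
    code: str
    description: str
    is_services: bool
    suspect: bool  # possible truncation or OCR noise; review manually


def parse_amr_line(line: str) -> Optional[AmrRow]:
    """Return a row for a current-expenditure line, or None if it does not match."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    code, desc = m.group("code"), m.group("desc")
    if code.split(".")[0] not in CURRENT_CHAPTERS:
        return None  # well-formed code, but not current expenditure
    is_services = code == SERVICES_PREFIX or code.startswith(SERVICES_PREFIX + ".")
    # Heuristic quality flags: a trailing hyphen suggests a wrapped or truncated
    # description; U+FFFD is the replacement character left by bad decoding.
    suspect = desc.endswith("-") or len(desc) < 3 or "\ufffd" in desc
    return AmrRow(code, desc, is_services, suspect)


def parse_amr_lines(lines: Iterable[str]) -> Tuple[List[AmrRow], List[str]]:
    """Split non-blank lines into parsed rows and rejected lines for review."""
    rows: List[AmrRow] = []
    rejected: List[str] = []
    for line in lines:
        if not line.strip():
            continue
        row = parse_amr_line(line)
        if row is not None:
            rows.append(row)
        else:
            rejected.append(line)
    return rows, rejected
```

Keeping the rejected lines, rather than silently dropping them, makes it easy to see which table patterns the regex still misses.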
AI Summary: This issue aims to set up a Python environment for extracting data from financial PDFs, covering both native (text-based) and scanned documents. It involves installing and testing `pdfplumber` and `tabula-py`, comparing how well each extracts Portuguese-language tables, and writing a script that detects whether a PDF is native or scanned in order to recommend the better-suited library.
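One possible detection heuristic, sketched under assumptions: a scanned page usually has no text layer, so `pdfplumber`'s `page.extract_text()` returns little or nothing for it. The threshold of 20 non-whitespace characters per page and the recommendation strings are hypothetical; neither `pdfplumber` nor `tabula-py` performs OCR, so image-only pages need an OCR pass first. `classify_pdf` and `recommend` are illustrative names, and `pdfplumber` is imported only when a file is actually read.

```python
from typing import Iterable, List

# Assumed threshold: pages with fewer non-whitespace characters than this in
# their text layer are treated as image-only (scanned). Tune on real files.
MIN_CHARS_PER_PAGE = 20


def page_texts_from_pdf(path: str) -> List[str]:
    """Read each page's text layer with pdfplumber (third-party, imported lazily)."""
    import pdfplumber

    with pdfplumber.open(path) as pdf:
        # extract_text() can return None or "" when a page has no text layer
        return [page.extract_text() or "" for page in pdf.pages]


def classify_pdf(page_texts: Iterable[str], min_chars: int = MIN_CHARS_PER_PAGE) -> str:
    """Label a document 'native', 'scanned' or 'mixed' from its per-page text."""
    has_text = [len("".join(text.split())) >= min_chars for text in page_texts]
    if not has_text:
        raise ValueError("document has no pages")
    if all(has_text):
        return "native"
    if not any(has_text):
        return "scanned"
    return "mixed"


# Hypothetical recommendations; the pdfplumber vs tabula-py comparison in the
# issue should decide which library is preferred for native documents.
RECOMMENDATION = {
    "native": "pdfplumber or tabula-py (text layer available)",
    "scanned": "run OCR first; neither library reads image-only pages",
    "mixed": "OCR the image-only pages, extract the rest directly",
}


def recommend(page_texts: Iterable[str]) -> str:
    """Map the document classification to a (hypothetical) extraction strategy."""
    return RECOMMENDATION[classify_pdf(page_texts)]
```

Typical use would be `recommend(page_texts_from_pdf("orcamento.pdf"))`, where the file name is only an example. Classifying per page rather than per document catches mixed files, such as a native budget with scanned annexes.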
AI Summary: This GitHub issue describes the task of creating a parser to extract structured data from PPI (Plano Plurianual de Investimentos) PDFs. The objective is to identify project codes, designations, and their corresponding budget values for 2025 and future years. The extracted information will then be exported into CSV and JSON formats, with acceptance criteria focusing on high regex coverage and accurate numerical parsing.
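A minimal sketch of the regex parsing, numeric conversion and export steps. It assumes a hypothetical PPI row layout of "<code> <designation> <2025 value> <future years value>", with project codes like "2025/12" and amounts in Portuguese notation ("." for thousands, "," for decimals). The real code format and column order must be confirmed on actual PPI files; `parse_ppi`, `coverage`, `to_csv` and `to_json` are illustrative names. Amounts are kept as `Decimal` and written to JSON as strings to avoid float rounding.

```python
import csv
import io
import json
import re
from decimal import Decimal
from typing import Dict, Iterable, List, Tuple

# Hypothetical row layout, e.g.
# "2025/12 Requalificação da Escola Básica 150.000,00 300.000,00"
AMOUNT = r"\d{1,3}(?:\.\d{3})*,\d{2}"  # Portuguese format: "." thousands, "," decimals
PPI_RE = re.compile(
    rf"^\s*(?P<code>\d{{4}}/\d+)\s+(?P<designation>.+?)\s+"
    rf"(?P<v2025>{AMOUNT})\s+(?P<future>{AMOUNT})\s*$"
)
FIELDS = ["code", "designation", "budget_2025", "budget_future"]


def parse_amount(text: str) -> Decimal:
    """Convert '1.234.567,89' to Decimal('1234567.89')."""
    return Decimal(text.replace(".", "").replace(",", "."))


def parse_ppi(lines: Iterable[str]) -> Tuple[List[Dict], List[str]]:
    """Return parsed rows plus the non-blank lines the regex did not match."""
    rows: List[Dict] = []
    unmatched: List[str] = []
    for line in lines:
        if not line.strip():
            continue
        m = PPI_RE.match(line)
        if m is None:
            unmatched.append(line)
            continue
        rows.append({
            "code": m.group("code"),
            "designation": m.group("designation"),
            "budget_2025": parse_amount(m.group("v2025")),
            "budget_future": parse_amount(m.group("future")),
        })
    return rows, unmatched


def coverage(rows: List[Dict], unmatched: List[str]) -> float:
    """Share of non-blank input lines matched by the regex."""
    total = len(rows) + len(unmatched)
    return len(rows) / total if total else 0.0


def to_csv(rows: List[Dict]) -> str:
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)  # Decimal values are written via str()
    return buf.getvalue()


def to_json(rows: List[Dict]) -> str:
    # default=str serialises Decimal as an exact string, e.g. "150000.00"
    return json.dumps(rows, ensure_ascii=False, default=str)
```

Header and total lines count as unmatched here, so in practice the coverage ratio would be computed only over lines that are expected to be project rows.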