Open Issues Need Help
This repository contains all outcomes created in the 2025 Scientific Literature Knowledge Extraction Tool project hosted on the Eleuther AI Discord.
AI Summary: The extraction script fails on long arXiv papers whose content exceeds the AI model's context window, and the API responds with `BadRequestError`. As a result, valuable papers are skipped and the logs fill with noise, with no clear strategy for handling oversized inputs. The issue proposes discussing two options: explicitly skipping these documents with clearer logging, or adding a chunking mechanism that processes them in smaller sections.
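The chunking option could look roughly like the sketch below. It is not the project's code: `chunk_text`, `max_chars`, and `overlap` are hypothetical names, and character counts stand in for token counts, which is only an approximation of the model's real context limit.

```python
def chunk_text(text: str, max_chars: int = 8000, overlap: int = 200) -> list[str]:
    """Split text into overlapping character windows.

    Assumption: max_chars is a rough proxy for the model's token limit;
    a real implementation would count tokens with the model's tokenizer.
    """
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")
    if not 0 <= overlap < max_chars:
        raise ValueError("overlap must be in [0, max_chars)")

    chunks: list[str] = []
    step = max_chars - overlap  # how far each window advances
    for start in range(0, len(text), step):
        chunks.append(text[start:start + max_chars])
        # Stop once the current window reaches the end of the text,
        # so no trailing chunk consists only of overlap.
        if start + max_chars >= len(text):
            break
    return chunks
```

Each chunk would then be sent to the model separately and the per-chunk results merged. The overlap is there so that a sentence cut at a boundary still appears whole in one of the two neighbouring chunks.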
AI Summary: The `Extractor` class currently sends requests to the OpenAI API sequentially, blocking on each one, which is inefficient for large datasets. The issue proposes batching or parallelizing these requests so that several run concurrently, with the stated goals of faster processing, better use of API quotas, and lower cost.
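One way to parallelize is bounded concurrency with `asyncio`, sketched below. This is an illustration under assumptions, not the `Extractor` implementation: `extract_all`, `extract_one`, and `max_concurrency` are hypothetical names, and `extract_one` stands in for whatever coroutine actually calls the API.

```python
from __future__ import annotations

import asyncio
from typing import Awaitable, Callable


async def extract_all(
    papers: list[str],
    extract_one: Callable[[str], Awaitable[str]],
    max_concurrency: int = 5,
) -> list:
    """Run extract_one over all papers with at most max_concurrency in flight.

    Results come back in input order. A failed paper yields its exception
    object in that position instead of aborting the whole run.
    """
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded(paper: str) -> str:
        # The semaphore caps concurrent requests to stay under rate limits.
        async with semaphore:
            return await extract_one(paper)

    return await asyncio.gather(
        *(bounded(p) for p in papers), return_exceptions=True
    )
```

Called as `asyncio.run(extract_all(papers, my_extract))`, where `my_extract` is the caller's own coroutine. The cap on concurrency matters because unbounded parallel requests tend to trade the blocking problem for rate-limit errors.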