This repository contains all outcomes created in the 2025 Scientific Literature Knowledge Extraction Tool project hosted on the EleutherAI Discord.

3 Open Issues Need Help. Last updated: Sep 12, 2025

Labels: bug, help wanted

Python

AI Summary: The extraction script fails on long arXiv papers whose content exceeds the AI model's context window, which produces `BadRequestError` messages. As a result, valuable papers are skipped and the logs fill with noise, and there is no clear strategy for handling these oversized inputs. The issue proposes discussing solutions such as explicitly skipping these documents with clearer logging, or implementing a chunking mechanism that processes them in smaller sections.

Complexity: 3/5
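
The chunking option mentioned in the summary above could look roughly like the sketch below. It is not taken from the repository: the function name `chunk_words` and its parameters are hypothetical, and it uses a word count as a rough stand-in for the model's token count (an assumption; a real tokenizer would be more accurate). Consecutive chunks overlap slightly so that a sentence cut at a boundary still appears whole in one of them.

```python
def chunk_words(text: str, max_words: int = 3000, overlap: int = 200) -> list[str]:
    """Split text into overlapping chunks of at most max_words words.

    Hypothetical helper: word count is only a proxy for token count.
    """
    if max_words <= 0:
        raise ValueError("max_words must be positive")
    if not 0 <= overlap < max_words:
        raise ValueError("overlap must be in [0, max_words)")

    words = text.split()
    step = max_words - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        # Stop once the current window already reaches the end of the text.
        if start + max_words >= len(words):
            break
    return chunks
```

Each chunk would then be sent as a separate request and the per-chunk results merged afterwards; how to merge them depends on the extraction schema, which the issue does not specify.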
Labels: bug, enhancement, help wanted, good first issue, question

Python

AI Summary: The `Extractor` class currently sends sequential, blocking requests to the OpenAI API, which is inefficient for large datasets. This issue proposes batching or parallelizing the requests so that several run concurrently, in order to speed up processing, make better use of API quotas, and reduce costs.

Complexity: 4/5
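
One way to allow multiple concurrent requests, as the summary above suggests, is bounded concurrency with `asyncio`. The sketch below is illustrative only and does not reflect the actual `Extractor` code: `extract_all` and `fake_extract` are hypothetical names, and `fake_extract` merely simulates a network call instead of contacting the OpenAI API. A semaphore caps the number of requests in flight, which is one simple way to stay within rate limits.

```python
import asyncio
from typing import Awaitable, Callable


async def fake_extract(paper: str) -> str:
    """Hypothetical stand-in for a single API request."""
    await asyncio.sleep(0.01)  # simulate network latency
    return paper.upper()


async def extract_all(
    papers: list[str],
    worker: Callable[[str], Awaitable[str]],
    max_concurrency: int = 5,
) -> list[str]:
    """Run worker over all papers with at most max_concurrency in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded(paper: str) -> str:
        async with semaphore:  # wait here if too many requests are running
            return await worker(paper)

    # gather returns results in the same order as the inputs.
    return await asyncio.gather(*(bounded(p) for p in papers))
```

Calling `asyncio.run(extract_all(papers, fake_extract, max_concurrency=2))` processes the list with two requests at a time. In a real pipeline the worker would wrap an asynchronous API client and would also need retry and error handling, which this sketch omits.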
Labels: enhancement, help wanted, question

Python