Debug, evaluate, and monitor your LLM applications, RAG systems, and agentic workflows with comprehensive tracing, automated evaluations, and production-ready dashboards.

Language: Python
Topics: hacktoberfest, hacktoberfest2025, langchain, llama-index, llm, llm-evaluation, llm-observability, llmops, open-source, openai, playground, prompt-engineering

15 open issues need help. Last updated: Nov 9, 2025.

Open Issues Need Help

AI Summary: The issue proposes adding a new LLM-as-a-judge evaluation metric called "LLM Sycophancy" (SycEval), based on a provided research paper. This new metric needs to be integrated into both the Python SDK and the frontend UI for online evaluation, along with corresponding documentation updates, following the pattern of existing metrics like "Hallucination".

Complexity: 4/5
Labels: good first issue, 💎 Bounty $25, Python SDK, Frontend

Labels: Bug, good first issue, Frontend

Labels: Feature Request, good first issue, hacktoberfest

Labels: Feature Request, good first issue, hacktoberfest

Labels: Feature Request, good first issue, hacktoberfest

Labels: good first issue, hacktoberfest

Labels: good first issue, improvement, hacktoberfest

Labels: Feature Request, good first issue, hacktoberfest

Labels: good first issue, hacktoberfest

Labels: Feature Request, good first issue, hacktoberfest

Labels: Feature Request, good first issue, hacktoberfest

Labels: good first issue, 💎 Bounty $50, Python SDK, Frontend

AI Summary: Integrate Gretel datasets into Comet Opik. This involves building a Python SDK-based solution (ideally a Jupyter Notebook) that imports Gretel datasets and uploads them as Opik datasets, handling authentication and any necessary data conversion. The result should be demonstrable in the Opik UI.

Complexity: 4/5
Labels: Feature Request, good first issue, python, 💎 Bounty $50

AI Summary: Develop a Comet Opik integration to import datasets from Hugging Face Datasets. This involves creating a Python SDK function and a Jupyter Notebook example (cookbook) demonstrating the import and upload of a Hugging Face dataset to Comet Opik's dataset management system. The solution should handle dataset conversion and authentication as needed.

Complexity: 4/5
Labels: Feature Request, good first issue, python, 💎 Bounty $50

AI Summary: Implement a new 'Trajectory Accuracy' evaluation metric for Opik, an open-source LLM evaluation platform. This metric should assess the accuracy of action sequences in ReAct-style agents, ideally using an LLM-as-a-judge approach. The implementation should include additions to the frontend UI (Online Evaluation tab), Python SDK, and documentation, along with a demonstration video.

Complexity: 4/5
Labels: documentation, Feature Request, good first issue, python, 💎 Bounty $50, Frontend
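The Trajectory Accuracy issue asks for an LLM-as-a-judge metric. For comparison only, a deterministic baseline can score how closely an agent's action sequence matches a reference sequence. The sketch below substitutes a longest-common-subsequence (LCS) ratio for the LLM judge. The function names `lcs_length` and `trajectory_lcs_score` and the plain-string action format are hypothetical and are not part of the Opik SDK.

```python
from typing import Sequence


def lcs_length(a: Sequence[str], b: Sequence[str]) -> int:
    """Length of the longest common subsequence of two action sequences."""
    # dp[j] holds the LCS length of a[:i] and b[:j] for the current row i.
    dp = [0] * (len(b) + 1)
    for x in a:
        prev = 0  # LCS length of a[:i-1] and b[:j-1] (diagonal cell)
        for j, y in enumerate(b, start=1):
            cur = dp[j]  # LCS length of a[:i-1] and b[:j] before overwrite
            if x == y:
                dp[j] = prev + 1
            else:
                dp[j] = max(dp[j], dp[j - 1])
            prev = cur
    return dp[len(b)]


def trajectory_lcs_score(actual: Sequence[str], expected: Sequence[str]) -> float:
    """Hypothetical baseline: LCS-based similarity in [0, 1].

    Computes 2 * LCS / (len(actual) + len(expected)), so extra,
    missing, or reordered actions all lower the score.
    """
    if not actual and not expected:
        return 1.0  # two empty trajectories trivially match
    return 2 * lcs_length(actual, expected) / (len(actual) + len(expected))


if __name__ == "__main__":
    expected = ["search", "read_page", "answer"]
    actual = ["search", "search", "read_page", "answer"]  # one redundant step
    print(round(trajectory_lcs_score(actual, expected), 3))  # 0.857
```

An exact-match baseline like this cannot credit semantically equivalent actions or judge whether a deviation was reasonable. The LLM-as-a-judge approach the issue proposes is meant to cover those cases.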
