Open Issues Need Help
prometheus server always crashing (about 2 months ago)
AI Summary: The Prometheus server in a Kubernetes cluster running the Grafana LGTM stack (including Loki, Tempo, Mimir, and Pyroscope) crashes repeatedly, and its liveness and readiness probes fail with connection-refused and timeout errors. Resolving the issue means troubleshooting the Prometheus deployment, likely by examining the Prometheus configuration, resource allocation, network connectivity, and potential conflicts with other services in the cluster.
Complexity: 4/5
good first issue
Observability: Grafana, Loki, Alloy, Promtail, Tempo, Prometheus, Pyroscope, LGTM Stack, logs, metrics and tracing
#alloy #grafana #kubernetes-monitoring #kubernetes-tracing #lgtm-stack #logging #loki #mimir #monitoring #observability #prometheus #promtail #tempo #tracing-applications #tracing-collector