Open Issues Need Help
AI Summary: This issue outlines the creation of a PySpark ingestor to move data from RSS/HTTP sources into a `bronze.articles_raw` Delta table. It requires a Python script to fetch data and write newline-delimited JSON to a landing zone, followed by a Spark Structured Streaming job to read, parse, enforce schema, and write to the Delta table with checkpointing. A local `make spark-demo` target will facilitate testing by dropping a fixture and starting the job, with success verified by new data appearing in the bronze table.
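A rough sketch of the two stages this summary describes might look like the following. It is not the project's implementation: the field names (`title`, `url`, `published_at`), the sample feed, and the helper names `rss_to_records`, `write_ndjson`, and `start_bronze_stream` are all assumptions made for illustration. The first two helpers use only the standard library; the streaming function assumes `pyspark` and the Delta Lake package are available and imports them lazily.

```python
import json
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path

# Hypothetical fixture; a real run would fetch the feed over HTTP instead.
SAMPLE_RSS = """<rss version="2.0"><channel><title>Demo</title>
<item><title>First</title><link>https://example.com/1</link>
<pubDate>Mon, 01 Jan 2024 00:00:00 GMT</pubDate></item>
<item><title>Second</title><link>https://example.com/2</link></item>
</channel></rss>"""


def rss_to_records(rss_xml):
    """Turn RSS <item> elements into flat dicts (assumed field names)."""
    root = ET.fromstring(rss_xml)
    return [
        {
            "title": item.findtext("title"),
            "url": item.findtext("link"),
            "published_at": item.findtext("pubDate"),  # None if absent
        }
        for item in root.iter("item")
    ]


def write_ndjson(records, landing_dir, file_name):
    """Write one JSON object per line into the landing zone directory."""
    path = Path(landing_dir) / file_name
    with path.open("w", encoding="utf-8") as handle:
        for record in records:
            handle.write(json.dumps(record) + "\n")
    return path


def start_bronze_stream(spark, landing_dir, checkpoint_dir,
                        table="bronze.articles_raw"):
    """Stream NDJSON files into a Delta table (needs pyspark + Delta Lake)."""
    from pyspark.sql.types import StringType, StructField, StructType

    # An explicit schema is required for streaming file sources and keeps
    # unexpected fields out of the bronze table.
    schema = StructType([
        StructField("title", StringType()),
        StructField("url", StringType()),
        StructField("published_at", StringType()),
    ])
    stream = spark.readStream.schema(schema).json(str(landing_dir))
    return (
        stream.writeStream.format("delta")
        .option("checkpointLocation", str(checkpoint_dir))
        .outputMode("append")
        .toTable(table)
    )


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as landing:
        out = write_ndjson(rss_to_records(SAMPLE_RSS), landing, "fixture.json")
        print(out.read_text(encoding="utf-8"))
```

Keeping the fetcher free of Spark dependencies means the landing-zone step can be exercised on its own, which is roughly what a fixture-dropping `make spark-demo` target would need before the streaming job starts.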
NeuroNews is an advanced ETL pipeline designed to scrape, analyze, and visualize politics and technology news using AI-powered NLP, sentiment analysis, and knowledge graph-based insights. Built on AWS cloud infrastructure, it enables real-time event detection, entity linking, and customizable dashboards for deeper news intelligence.
AI Summary: This issue aims to establish a local development environment using Docker Compose, integrating Spark (master, worker, history server), MinIO for object storage, and Delta Lake for data tables. It also includes monitoring tools like Prometheus and Grafana, with specific configuration files and Make targets for easy management.
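To make the Spark, MinIO, and Delta Lake wiring concrete, the sketch below collects the session settings a job in such an environment would typically need. The endpoint host `minio:9000`, the `minioadmin` credentials, and the helper names `local_spark_conf` and `build_session` are assumptions for illustration, not values taken from the issue; the configuration keys themselves are the standard Hadoop S3A and Delta Lake settings.

```python
def local_spark_conf(endpoint="http://minio:9000",
                     access_key="minioadmin",
                     secret_key="minioadmin"):
    """Return Spark settings for Delta tables stored in a local MinIO.

    The default endpoint and credentials are placeholders for a
    Docker Compose setup and should be replaced with real values.
    """
    use_ssl = endpoint.startswith("https://")
    return {
        # Enable Delta Lake SQL support and its catalog implementation.
        "spark.sql.extensions": "io.delta.sql.DeltaSparkSessionExtension",
        "spark.sql.catalog.spark_catalog":
            "org.apache.spark.sql.delta.catalog.DeltaCatalog",
        # Point the S3A filesystem at MinIO instead of AWS S3.
        "spark.hadoop.fs.s3a.endpoint": endpoint,
        "spark.hadoop.fs.s3a.access.key": access_key,
        "spark.hadoop.fs.s3a.secret.key": secret_key,
        # MinIO serves buckets under the path, not as subdomains.
        "spark.hadoop.fs.s3a.path.style.access": "true",
        "spark.hadoop.fs.s3a.connection.ssl.enabled":
            "true" if use_ssl else "false",
    }


def build_session(app_name, conf):
    """Create a SparkSession from a settings dict (requires pyspark)."""
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.appName(app_name)
    for key, value in conf.items():
        builder = builder.config(key, value)
    return builder.getOrCreate()


if __name__ == "__main__":
    for key, value in sorted(local_spark_conf().items()):
        print(f"{key}={value}")
```

Holding the settings in a plain dict keeps them easy to inspect and reuse, for example when the same values also need to appear in a `spark-defaults.conf` mounted into the containers.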