Open Issues Need Help
AI Summary: Enhance the web crawler's robots.txt handling to correctly interpret wildcard directives such as `Disallow: /*?sort=` and `Allow:` entries, so the crawler properly adheres to a site's robots.txt rules (a minimal matching sketch follows the issue details below).
Complexity: 4/5
help wanted
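Supporting these directives typically means translating each robots.txt path pattern into a regex (where `*` matches any character sequence and a trailing `$` anchors the match) and then applying longest-match precedence, with `Allow` winning ties, as specified in RFC 9309. The sketch below illustrates that approach under those assumptions; it is not taken from this project's codebase, and the helper names `_pattern_to_regex` and `is_allowed` are hypothetical.

```python
import re

def _pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a compiled regex.

    Per RFC 9309, '*' matches any sequence of characters and a
    trailing '$' anchors the pattern to the end of the URL.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape regex metacharacters, then restore '*' as a wildcard.
    regex = re.escape(pattern).replace(r"\*", ".*")
    return re.compile(regex + ("$" if anchored else ""))

def is_allowed(rules, path: str) -> bool:
    """Decide whether `path` (including any query string) may be crawled.

    `rules` is a list of ("allow" | "disallow", pattern) tuples taken
    from the matched user-agent group. The longest matching pattern
    wins; on a tie, Allow takes precedence (RFC 9309 semantics).
    """
    best_len = -1
    best_verdict = True  # no matching rule means the path is allowed
    for kind, pattern in rules:
        if not pattern:  # an empty Disallow line means "allow everything"
            continue
        if _pattern_to_regex(pattern).match(path):
            allowed = (kind == "allow")
            if len(pattern) > best_len or (len(pattern) == best_len and allowed):
                best_len, best_verdict = len(pattern), allowed
    return best_verdict

# Example using the wildcard rule from the issue summary:
rules = [("disallow", "/*?sort="), ("allow", "/products")]
print(is_allowed(rules, "/search?sort=price"))  # False: blocked by /*?sort=
print(is_allowed(rules, "/products"))           # True: explicitly allowed
```

Custom matching along these lines is needed because the standard-library `urllib.robotparser` implements the original 1994 draft and does not interpret `*` wildcards in path patterns.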
Async web crawler written in Python. Modular, lightweight, and SQLite/MySQL-ready.
Python
#async #async-crawler #asyncio #beautifulsoup4 #crawler #data-mining #mysql #open-source #python #scraping #search-engine #seo-bot #sqlite #web-crawler