Whilst there is a ton of useful data on the EPA Ireland website, it’s not exactly easy to track what’s going on.
This tool, built by https://twitter.com/conoro, scrapes the thousands of individual RSS feeds on the site and generates output that should be helpful to anyone who wants to monitor submissions there.
The code is all up on GitHub here. In summary, it does the following:
- Once a day around 1.30am GMT, it scrapes all the feeds on the EPA site
- It saves all the data into a SQLite database and uploads that to Amazon S3
- It updates a single small RSS feed with all the submissions from the previous day
- It generates a new CSV file with the same data as the RSS feed and saves that to GitHub
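The daily steps above can be sketched roughly as below. This is a minimal illustration, not the project's actual code: the table name, column names, and CSV layout are all assumptions made for the example.

```python
# Sketch of the daily pipeline: store scraped feed items in SQLite,
# then export the previous day's submissions to a CSV file.
# The "submissions" schema here is illustrative, not epa-rss's real schema.
import csv
import sqlite3
from datetime import date, timedelta

def init_db(path=":memory:"):
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS submissions (
               link TEXT PRIMARY KEY,   -- feed item URL, used for de-duplication
               title TEXT,
               published TEXT           -- ISO date, e.g. 2022-09-22
           )"""
    )
    return conn

def save_items(conn, items):
    # INSERT OR IGNORE skips items already stored by an earlier run
    conn.executemany(
        "INSERT OR IGNORE INTO submissions (link, title, published) VALUES (?, ?, ?)",
        [(i["link"], i["title"], i["published"]) for i in items],
    )
    conn.commit()

def export_yesterday_csv(conn, csv_path):
    # Select only items published on the previous day and write them out
    yesterday = (date.today() - timedelta(days=1)).isoformat()
    rows = conn.execute(
        "SELECT link, title, published FROM submissions WHERE published = ?",
        (yesterday,),
    ).fetchall()
    with open(csv_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["link", "title", "published"])
        writer.writerows(rows)
    return len(rows)
```

The S3 upload and daily RSS generation steps are omitted here; they would sit after the export, pushing the SQLite file and the previous day's items to their respective destinations.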
Subscribing to the RSS Feed
Use this URL in Feedly or similar: https://raw.githubusercontent.com/conoro/epa-rss/main/output/daily.xml
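If you would rather consume the feed programmatically than through a reader, a standard-library sketch follows. It assumes the feed uses the usual RSS 2.0 item shape (`title`/`link`); the feed's exact fields may differ.

```python
# Sketch: fetch and parse the daily feed with only the standard library.
# Assumes a conventional RSS 2.0 structure; field names may differ in practice.
import urllib.request
import xml.etree.ElementTree as ET

FEED_URL = "https://raw.githubusercontent.com/conoro/epa-rss/main/output/daily.xml"

def parse_feed(xml_text):
    # Collect title and link from every <item> element in the feed
    root = ET.fromstring(xml_text)
    return [
        {"title": item.findtext("title"), "link": item.findtext("link")}
        for item in root.iter("item")
    ]

# Usage (requires network access):
# with urllib.request.urlopen(FEED_URL) as resp:
#     items = parse_feed(resp.read().decode("utf-8"))
```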
Viewing the daily CSV files
They are all here in the repo starting on Sep 22nd 2022: https://github.com/conoro/epa-rss/tree/main/output/csv/daily
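Once you have downloaded one of the daily files, loading it in Python is straightforward. `csv.DictReader` takes the column names from the header row, so this works without knowing the exact schema in advance; the file path below is a placeholder.

```python
# Sketch: load a downloaded daily CSV file as a list of dicts.
# DictReader reads the column names from the CSV's own header row,
# so no assumptions about the schema are needed here.
import csv

def load_daily_csv(path):
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))
```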
Getting notified by email (experimental)
If you’d like to receive an email with a link to the latest CSV each day:
- Create a GitHub Account
- Click the drop-down menu beside “Watch” in the top right of this project’s page.
- Select “Custom” and tick the box beside “Issues”. Then click Apply.
- You should start receiving the emails from tomorrow.
The latest full set of scraped data is available as a SQLite DB that you can download here. Use something like SQLiteStudio to browse and query it.
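If you prefer scripting over a GUI like SQLiteStudio, the standard `sqlite3` module can open the downloaded file directly. Since the database's schema isn't documented here, a sensible first step is to list its tables from `sqlite_master`:

```python
# Sketch: inspect the downloaded SQLite DB from Python.
# Listing the tables first avoids guessing at the schema.
import sqlite3

def list_tables(db_path):
    conn = sqlite3.connect(db_path)
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    conn.close()
    return [r[0] for r in rows]
```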
Examining the data in the SQLite Database using Datasette Lite
You can use a very cool project by Simon Willison called Datasette Lite to browse and query all the latest data in your browser by going here. I highly recommend playing around with it, as you can query by keywords and date ranges.
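The kind of keyword and date-range query Datasette Lite makes easy can also be run directly against the database. The table and column names below (`submissions`, `title`, `published`) are assumptions for illustration; substitute the real ones after inspecting the schema.

```python
# Sketch: keyword + date-range query, of the kind Datasette Lite supports.
# "submissions", "title" and "published" are assumed names, not the real schema.
import sqlite3

def search(conn, keyword, start, end):
    return conn.execute(
        """SELECT title, published FROM submissions
           WHERE title LIKE ? AND published BETWEEN ? AND ?
           ORDER BY published""",
        (f"%{keyword}%", start, end),
    ).fetchall()
```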