Context

Based on the scraped data, I’m planning on building an API that’ll sync with the inventory management system (IMS). Things are still fuzzy in my head. Hopefully writing this down helps. As of now, these are the stages I have in mind, followed by the different parts of the application.
Stage 1: READ ONLY (No API) IMS ——> Scraper ——> Postgres ——> Utility Scripts
Stage 2: READ ONLY IMS ——> Scraper ——> Postgres ——> API Server ——> [Front Ends]
Stage 3: READ/WRITE IMS <——> Scraper <——> Postgres <——> API Server <——> [Front Ends]
The IMS is not under my control. It’s a complex system that includes accounting, inventory management, CRM, etc. At the moment, I only want to retrieve the inventory so I can display it to our customers in (near) real time.
The scraper is going to keep an up-to-date copy of the IMS’s data in a Postgres instance. Scraping it once or twice a day should be enough for a read-only API. For Stage 3 (read/write), the scraper will also need to function as a … not sure of the term … “reverse scraper”(?) that can manipulate the IMS’s GUI to enter data back into it.
For the Postgres database, I currently only have a rough schema for the inventory. I’m guessing this will change as more data sets are scraped from the IMS. I’m using SQLAlchemy in the scraper to push data into and pull data out of the database.
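To make the "rough schema" idea concrete, here is a minimal sketch of what the SQLAlchemy side could look like. The table name, the columns (`sku`, `name`, `quantity`, `scraped_at`) and the helper functions are all assumptions for illustration, not the actual schema. It uses an in-memory SQLite engine so it runs anywhere; a Postgres URL would go in its place.

```python
from datetime import datetime, timezone

from sqlalchemy import Column, DateTime, Integer, String, create_engine
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class InventoryItem(Base):
    """Hypothetical inventory table; the real columns depend on what the IMS exposes."""

    __tablename__ = "inventory_items"

    sku = Column(String, primary_key=True)
    name = Column(String, nullable=False)
    quantity = Column(Integer, nullable=False, default=0)
    scraped_at = Column(DateTime, nullable=False)


def save_scraped_rows(engine, rows):
    """Insert new items and overwrite existing ones, keyed on sku."""
    now = datetime.now(timezone.utc)
    with Session(engine) as session:
        for row in rows:
            # merge() looks the row up by primary key and updates or inserts it.
            session.merge(InventoryItem(scraped_at=now, **row))
        session.commit()


def load_inventory(engine):
    """Return the inventory as plain dicts, ordered by sku."""
    with Session(engine) as session:
        items = session.query(InventoryItem).order_by(InventoryItem.sku)
        return [
            {"sku": i.sku, "name": i.name, "quantity": i.quantity}
            for i in items
        ]


# Assumption: SQLite stands in for Postgres here so the sketch is self-contained.
engine = create_engine("sqlite:///:memory:")
Base.metadata.create_all(engine)
```

Calling `save_scraped_rows` once per scrape keeps the copy current without wiping the table each time. Items that disappear from the IMS would still need separate handling, which this sketch does not attempt.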
I’m using the scraped data in a static site (generated by Hugo). So the utility scripts are going to grab the freshly updated data and write it into the ‘data’ folder of the Hugo site once or twice a day, most likely in JSON format. Poor man’s real-time inventory.
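A utility script along these lines could handle the export step. This is only a sketch: the function name `export_inventory`, the file name `inventory.json`, and the shape of the rows are assumptions. It writes to a temporary file first and then renames it, so a Hugo build never sees a half-written file.

```python
import json
from pathlib import Path


def export_inventory(rows, site_root, filename="inventory.json"):
    """Write rows (a list of dicts) as JSON into <site_root>/data/<filename>."""
    data_dir = Path(site_root) / "data"
    data_dir.mkdir(parents=True, exist_ok=True)

    target = data_dir / filename
    tmp = target.with_name(target.name + ".tmp")

    # Write to a temp file, then swap it into place in one step.
    tmp.write_text(json.dumps(rows, indent=2, sort_keys=True), encoding="utf-8")
    tmp.replace(target)
    return target
```

With a file named `inventory.json` in the data folder, Hugo templates can read it as `.Site.Data.inventory`. A cron job that runs the scraper, this export, and then `hugo` would cover the once-or-twice-a-day rebuild.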
The API server is going to sit on top of the Postgres database and serve data to the different front ends. I don’t think it’ll add much value unless I have multiple front ends consuming the data.
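For a sense of scale, a read-only API for Stage 2 could start out as small as the sketch below. It uses only Python's standard library; the routes (`/inventory`, `/inventory/<sku>`), the `route` and `serve` helpers, and the hard-coded `INVENTORY` list are all placeholders I made up. A real version would query Postgres and would probably use a proper web framework.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Placeholder data; in practice this would come from the Postgres copy.
INVENTORY = [{"sku": "A1", "name": "Widget", "quantity": 3}]


def route(path, inventory):
    """Map a request path to (status code, JSON-serialisable payload)."""
    if path == "/inventory":
        return 200, inventory
    if path.startswith("/inventory/"):
        sku = path[len("/inventory/"):]
        for item in inventory:
            if item["sku"] == sku:
                return 200, item
    return 404, {"error": "not found"}


class InventoryHandler(BaseHTTPRequestHandler):
    """Read-only handler: only GET is implemented."""

    def do_GET(self):
        status, payload = route(self.path, INVENTORY)
        body = json.dumps(payload).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8000):
    """Start the server on localhost; blocks until interrupted."""
    HTTPServer(("127.0.0.1", port), InventoryHandler).serve_forever()
```

Keeping the routing in a plain function makes it easy to test without starting a server, and the same function could later sit behind whatever framework ends up being used.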
A static site is going to be my first front end. Hugo accepts data in JSON format for generating pages, so most likely that’s the form the data will take. The static site is going to be regenerated once or twice a day to show up-to-date inventory.