June 06, 2017
Dug deeper into pushing a pandas dataframe into a postgres database via sqlalchemy.
The main problem I’m running into is that a postgres table created from a pandas dataframe using the to_sql method does not specify a primary key.
As of now, there is no way to specify a primary key through that method.
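A minimal sketch of the problem, using an in-memory SQLite database for illustration (the table and column names are hypothetical, but the behavior is the same against postgres): inspecting the table pandas creates shows no primary key constraint.

```python
import pandas as pd
from sqlalchemy import create_engine, inspect

# In-memory SQLite stands in for postgres here.
engine = create_engine("sqlite://")

df = pd.DataFrame({"sku": ["a", "b"], "qty": [1, 2]})
df.to_sql("inventory", engine, index=True)  # writes the index as a column named 'index'

# The generated table has no primary key constraint.
pk = inspect(engine).get_pk_constraint("inventory")
print(pk["constrained_columns"])  # []
```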
SQLAlchemy’s ‘automap’ Base object (rather than the more common declarative_base) can ‘reflect’ an existing table and turn it into a SQLAlchemy mapped class.
However, it can’t do that if the table doesn’t have a primary key.
So, after pandas creates the table in the postgres db, I’ll need to add the primary key via raw SQL and then reflect the db. Using my own helper methods:
```python
from sqlalchemy.ext.automap import automap_base

# Create a db and scraper object.
scraper = sps.Scraper()
db = sps.Database()

# Grab the report in a pandas dataframe.
report = scraper.inventory_report()

# Create the Base object.
Base = automap_base()

# Execute sql statements to add a primary key.
with db.session_scope() as s:
    # The 'index' column is created by pandas.
    s.execute("ALTER TABLE <table_name> RENAME COLUMN index TO id;")
    # Set the 'id' column as the primary key.
    s.execute("ALTER TABLE <table_name> ADD PRIMARY KEY (id);")

# Now I can reflect the tables successfully.
# db.engine is a sqlalchemy engine object.
Base.prepare(db.engine, reflect=True)

# Access tables from the Base object.
TableName = Base.classes.<table_name>

# Now I can use the usual methods on it.
with db.session_scope() as sesh:
    result = sesh.query(TableName).first()
    print(result.id)
```
* Stuff all of this neatly into methods/classes in my existing codebase.
* Figure out whether the primary key is relevant or not, i.e. should I use it to insert only unique rows, or just replace the entire table with each new scrape.
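The two options in that last item can be sketched as follows, again against in-memory SQLite with hypothetical table/column names. Option 1 drops and recreates the table on each scrape; option 2 uses the key column to append only rows not already present.

```python
import pandas as pd
from sqlalchemy import create_engine

engine = create_engine("sqlite://")

# Rows from a previous scrape, already in the table.
old = pd.DataFrame({"id": [1, 2], "qty": [5, 7]})
old.to_sql("inventory", engine, index=False)

# A fresh scrape that partially overlaps the old one.
new = pd.DataFrame({"id": [2, 3], "qty": [7, 9]})

# Option 1: replace the entire table with each new scrape.
# new.to_sql("inventory", engine, index=False, if_exists="replace")

# Option 2: append only the rows whose id isn't already stored.
existing = pd.read_sql("SELECT id FROM inventory", engine)["id"]
new[~new["id"].isin(existing)].to_sql(
    "inventory", engine, index=False, if_exists="append")

print(pd.read_sql("SELECT id FROM inventory", engine)["id"].tolist())  # [1, 2, 3]
```

On postgres proper, option 2 could instead be done server-side with `INSERT ... ON CONFLICT (id) DO NOTHING`, which avoids pulling the existing ids into pandas.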