We were building an abstraction on top of a bunch of databases using SQLAlchemy. We wanted to enable the user to be able to query databases by giving just the name of table and column.
Now, this seems to be easily doable by SQLAlchemy Query function but while implementing we came across a parsing problem. When a user inputs the name of table and column, it is read as a string but what we want is a Column object of SQLAlchemy. This can be done in a few ways and we will describe the approach we used with an example.
Let the following be a database with a table hmdbkegg containing columns hmdb_id and kegg_id.
IdHmdb_idKegg_id1HMDB00122C000311HMDB00190C001861HMDB00263C00074
This can be constructed by using a class extending Base class of SQLAlchemy as follows:
Here id, kegg_id, and hmdb_id are Column objects and HMDB is a class with table name ‘hmdbkegg’.
Now, to query ‘kegg_id’ from given ‘hmdb_id’ we try using column names directly which are of type string, we wrote a simple function (which raises obvious errors when running)
def query(target_column, source_column, value):
rows = session.query(target_column).filter_by(source_column = value)
return rows
Here,
rows = session.query(HMDB.kegg_id).filter(HMDB.hmdb_id == “HMDB00122”)
is an SQLAlchemy query function which should be executed.
Now,
rows = query(“kegg_id”, “hmdb_id”, ”HMDB00001”)
Returns error, Entity ‘<class ‘tables.HMDB’>’ has no property ‘column_name
And this
rows = query(kegg_id, hmdb_id, ”HMDB00001”)
Returns error, name ‘hmdb_id’ is not defined
Using .filter method
def query(target_column, source_column, value):
rows = session.query(target_column).filter(source_column == 'value')
return rows
Now,
rows = query(“HMDB.kegg_id” , ”HMDB.hmdb_id” , ”HMDB00001”)
Returns ‘None’
A correct solution is to create a dictionary containing a mapping for Column objects and Column names.
And use this to access column object based on column name provided. For every new table/column, a new entry has to be added to the dictionary. Maybe we can do better?
If this dictionary is created by itself, that might help. These columns are defined in the class so exploring its attributes should help.
In the final solution, we defined a function which takes the class name and column name(type String) as input and returns the Column (SQLAlchemy column object). Inside this function, an SQLAlchemy Column Collection is created consisting of all Column objects for mapped class. The Column Collection is like a dictionary, with keys as column names. This function returns the Column object corresponding to column_name.
This gives classname from given table name, here ‘hmdbkegg’ as input table name and returns “HMDB” as classname
Finally,
This gives the required column object
Now, let’s use this to query some values
rows = query(‘hmdbkegg’, “kegg_id”, “hmdb_id”, ”HMDB00122”)
Terminal Output:
Works properly!
Try Polly platform for huge biological data analysis. Book a demo today!
Get the latest insights on Biomolecular data and ML