BioSQL.BioSeqDatabase module

Connect with a BioSQL database and load Biopython-like objects from it.

This provides interfaces for loading biological objects from a relational database, and is compatible with the BioSQL standards.

BioSQL.BioSeqDatabase.open_database(driver='MySQLdb', **kwargs)

Load an existing BioSQL-style database.

This function is the easiest way to retrieve a connection to a database, doing something like:

from BioSQL import BioSeqDatabase
server = BioSeqDatabase.open_database(user="root", db="minidb")

Arguments:

driver - The name of the database driver to use for connecting. The driver should implement the Python DB API. By default, the MySQLdb driver is used.
user - the username to connect to the database with.
password, passwd - the password to connect with.
host - the hostname of the database.
database or db - the name of the database.
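The driver argument names a Python DB-API module, and some arguments have two accepted spellings. The sketch below shows one way a helper could import a driver by name with importlib and fold those aliases together. It uses only the standard library's sqlite3 so it runs anywhere; open_dbapi_connection is a hypothetical name, not part of BioSQL, and real drivers disagree on keyword names, so a production helper would need a per-driver mapping.

```python
import importlib


def open_dbapi_connection(driver="sqlite3", **kwargs):
    """Hypothetical helper: import a DB-API driver by name and connect.

    'db' is accepted as an alias for 'database' and 'passwd' as an alias
    for 'password'. Which keywords a given driver accepts varies.
    """
    module = importlib.import_module(driver)  # e.g. "sqlite3"
    kw = dict(kwargs)
    if "db" in kw:
        kw["database"] = kw.pop("db")
    if "passwd" in kw:
        kw["password"] = kw.pop("passwd")
    return module, module.connect(**kw)


# sqlite3 ships with Python and only needs a database path.
module, conn = open_dbapi_connection("sqlite3", db=":memory:")
print(module.__name__, conn.execute("SELECT 1").fetchone())
conn.close()
```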

class BioSQL.BioSeqDatabase.DBServer(conn, module, module_name=None)

Bases: object

Represents a BioSQL database containing namespaces (sub-databases).

This acts like a Python dictionary, giving access to each namespace (defined by a row in the biodatabase table) as a BioSeqDatabase object.
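The dictionary behaviour described above can be sketched with collections.abc.Mapping: implementing __getitem__, __iter__ and __len__ is enough to get __contains__, keys(), values(), items() and get() for free. This is a hypothetical illustration, not BioSQL's code: it uses sqlite3 (which writes parameters as ? rather than %s), a biodatabase table trimmed to two columns, and it returns raw ids where DBServer returns BioSeqDatabase objects.

```python
import sqlite3
from collections.abc import Mapping


class NamespaceMapping(Mapping):
    """Hypothetical read-only mapping of namespace name -> biodatabase_id."""

    def __init__(self, conn):
        self.conn = conn

    def __getitem__(self, name):
        row = self.conn.execute(
            "SELECT biodatabase_id FROM biodatabase WHERE name = ?", (name,)
        ).fetchone()
        if row is None:
            raise KeyError(name)  # Mapping.__contains__ relies on KeyError
        return row[0]

    def __iter__(self):
        rows = self.conn.execute("SELECT name FROM biodatabase ORDER BY name")
        return (name for (name,) in rows.fetchall())

    def __len__(self):
        return self.conn.execute("SELECT COUNT(*) FROM biodatabase").fetchone()[0]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE biodatabase (biodatabase_id INTEGER PRIMARY KEY, name TEXT UNIQUE)"
)
conn.executemany(
    "INSERT INTO biodatabase (name) VALUES (?)", [("orchids",), ("swissprot",)]
)
server = NamespaceMapping(conn)
print(len(server), list(server), "orchids" in server)  # 2 ['orchids', 'swissprot'] True
```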

__init__(conn, module, module_name=None)

Create a DBServer object.

Arguments:

conn - A database connection object
module - The module used to create the database connection
module_name - Optionally, the name of the module. Default: module.__name__

Normally you would not want to create a DBServer object yourself. Instead use the open_database function, which returns an instance of DBServer.

__repr__(): Return a short description of the class name and database connection.

__getitem__(name)

Return a BioSeqDatabase object.

Arguments:

name - The name of the BioSeqDatabase

__len__(): Return number of namespaces (sub-databases) in this database.

__contains__(value): Check if a namespace (sub-database) in this database.

__iter__(): Iterate over namespaces (sub-databases) in the database.

keys(): Iterate over namespaces (sub-databases) in the database.

values(): Iterate over BioSeqDatabase objects in the database.

items(): Iterate over (namespace, BioSeqDatabase) in the database.

__delitem__(name): Remove a namespace and all its entries.

new_database(db_name, authority=None, description=None): Add a new database to the server and return it.
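Adding a namespace amounts to inserting a row into biodatabase, and removing one must also remove its rows in bioentry. A minimal sketch of both, assuming an ON DELETE CASCADE foreign key does the clean-up; whether BioSQL relies on cascading or deletes child rows itself is not stated here. The tables are cut down, and new_namespace and delete_namespace are hypothetical helpers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE biodatabase (
    biodatabase_id INTEGER PRIMARY KEY,
    name TEXT UNIQUE NOT NULL,
    authority TEXT,
    description TEXT
);
CREATE TABLE bioentry (
    bioentry_id INTEGER PRIMARY KEY,
    biodatabase_id INTEGER NOT NULL
        REFERENCES biodatabase(biodatabase_id) ON DELETE CASCADE,
    name TEXT
);
""")


def new_namespace(conn, name, authority=None, description=None):
    """Hypothetical stand-in for new_database(): add a row, return its id."""
    cur = conn.execute(
        "INSERT INTO biodatabase (name, authority, description) VALUES (?, ?, ?)",
        (name, authority, description),
    )
    return cur.lastrowid


def delete_namespace(conn, name):
    """Hypothetical stand-in for del server[name]; entries cascade away."""
    conn.execute("DELETE FROM biodatabase WHERE name = ?", (name,))


dbid = new_namespace(conn, "orchids", description="Example namespace")
conn.execute("INSERT INTO bioentry (biodatabase_id, name) VALUES (?, ?)", (dbid, "ENTRY1"))
delete_namespace(conn, "orchids")
print(conn.execute("SELECT COUNT(*) FROM bioentry").fetchone()[0])  # 0
```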

load_database_sql(sql_file)

Load a database schema into the given database.

This is used to create tables, etc., when a database is first created. sql_file should specify the complete path to a file containing the SQL statements for building the tables.
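Loading a schema means reading a file of SQL statements and running them in order. The sketch below writes a toy two-table schema to a temporary file and runs it with sqlite3's executescript, which is specific to Python's sqlite3 module; other drivers would need the file split into individual statements. load_schema_file is a hypothetical helper, and the real BioSQL schema is far larger.

```python
import os
import sqlite3
import tempfile

SCHEMA = """
-- A two-table toy schema, not the real BioSQL one.
CREATE TABLE biodatabase (biodatabase_id INTEGER PRIMARY KEY, name TEXT UNIQUE);
CREATE TABLE bioentry (bioentry_id INTEGER PRIMARY KEY, biodatabase_id INTEGER, name TEXT);
"""


def load_schema_file(conn, sql_file):
    """Hypothetical helper: run every statement in an SQL file."""
    with open(sql_file, encoding="utf-8") as handle:
        conn.executescript(handle.read())  # sqlite3-only convenience


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "schema.sql")  # a complete path, as required above
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(SCHEMA)
    conn = sqlite3.connect(":memory:")
    load_schema_file(conn, path)

tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
print(tables)  # ['biodatabase', 'bioentry']
```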

commit(): Commit the current transaction to the database.

rollback(): Roll-back the current transaction.

close(): Close the connection. No further activity possible.

class BioSQL.BioSeqDatabase.Adaptor(conn, dbutils, wrap_cursor=False)

Bases: object

High level wrapper for a database connection and cursor.

Most database calls in BioSQL are done indirectly through this adaptor class. It provides helper methods for fetching data and executing SQL.

__init__(conn, dbutils, wrap_cursor=False)

Create an Adaptor object.

Arguments:

conn - A database connection
dbutils - A BioSQL.DBUtils object
wrap_cursor - Optional, whether to wrap the cursor object

last_id(table): Return the last row id for the selected table.

autocommit(y=True): Set the autocommit mode. True values enable; False value disable.

commit(): Commit the current transaction.

rollback(): Roll-back the current transaction.

close(): Close the connection. No further activity possible.

fetch_dbid_by_dbname(dbname): Return the internal id for the sub-database using its name.

fetch_seqid_by_display_id(dbid, name)

Return the internal id for a sequence using its display id.

Arguments:

dbid - the internal id for the sub-database
name - the name of the sequence. Corresponds to the name column of the bioentry table of the SQL schema

fetch_seqid_by_accession(dbid, name)

Return the internal id for a sequence using its accession.

Arguments:

dbid - the internal id for the sub-database
name - the accession of the sequence. Corresponds to the accession column of the bioentry table of the SQL schema

fetch_seqids_by_accession(dbid, name)

Return a list of internal ids using an accession.

Arguments:

dbid - the internal id for the sub-database
name - the accession of the sequence. Corresponds to the accession column of the bioentry table of the SQL schema
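The difference between fetch_seqid_by_accession and fetch_seqids_by_accession matters because one accession can appear more than once in a sub-database. A hedged sketch of the two shapes of lookup follows; the helper names, toy table and accession values are hypothetical, and raising IndexError when the single-id form does not find exactly one match is an assumption, since the text above does not say how that case is reported.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE bioentry (bioentry_id INTEGER PRIMARY KEY, "
    "biodatabase_id INTEGER, accession TEXT, version INTEGER)"
)
conn.executemany(
    "INSERT INTO bioentry (biodatabase_id, accession, version) VALUES (?, ?, ?)",
    [(1, "ACC1", 1), (1, "ACC1", 2), (1, "ACC2", 1)],
)


def seqids_by_accession(conn, dbid, accession):
    """Hypothetical: every bioentry_id with this accession (maybe empty)."""
    rows = conn.execute(
        "SELECT bioentry_id FROM bioentry "
        "WHERE biodatabase_id = ? AND accession = ? ORDER BY bioentry_id",
        (dbid, accession),
    ).fetchall()
    return [row[0] for row in rows]


def seqid_by_accession(conn, dbid, accession):
    """Hypothetical: exactly one match, otherwise an error."""
    ids = seqids_by_accession(conn, dbid, accession)
    if len(ids) != 1:
        raise IndexError(f"expected 1 entry for {accession!r}, found {len(ids)}")
    return ids[0]


print(seqids_by_accession(conn, 1, "ACC1"))  # [1, 2]
print(seqid_by_accession(conn, 1, "ACC2"))  # 3
```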

fetch_seqid_by_version(dbid, name)

Return the internal id for a sequence using its accession and version.

Arguments:

dbid - the internal id for the sub-database
name - the accession of the sequence containing a version number. Must correspond to <accession>.<version>
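The bioentry table stores accession and version in separate columns, so a name like X77802.1 has to be split before it can be matched. A minimal sketch, assuming the version part is a plain integer; split_accession_version and its use of ValueError are hypothetical, not BioSQL's own code.

```python
def split_accession_version(name):
    """Hypothetical helper: split '<accession>.<version>' into (str, int)."""
    accession, sep, version = name.rpartition(".")  # split at the last dot
    if not sep or not accession or not version.isdigit():
        raise ValueError(f"expected <accession>.<version>, got {name!r}")
    return accession, int(version)


print(split_accession_version("X77802.1"))  # ('X77802', 1)
```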

fetch_seqid_by_identifier(dbid, identifier)

Return the internal id for a sequence using its identifier.

Arguments:

dbid - the internal id for the sub-database
identifier - the identifier of the sequence. Corresponds to the identifier column of the bioentry table in the SQL schema.

list_biodatabase_names(): Return a list of all of the sub-databases.

list_bioentry_ids(dbid)

Return a list of internal ids for all of the sequences in a sub-database.

Arguments:

dbid - The internal id for a sub-database

list_bioentry_display_ids(dbid)

Return a list of all sequence names in a sub-database.

Arguments:

dbid - The internal id for a sub-database

list_any_ids(sql, args)

Return ids given a SQL statement to select for them.

This assumes that the given SQL is a SELECT statement returning a single column. The DB driver returns the result as a 2D list of rows; this method takes the first value from each row and returns those values as a flat list.
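DB-API fetchall() returns one tuple per row even when each row has a single column, so ids come back as [(1,), (2,)]. Flattening keeps the first element of each row. A minimal sketch with sqlite3 (which writes parameters as ?, whereas drivers such as MySQLdb use %s); list_ids is a hypothetical helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bioentry (bioentry_id INTEGER PRIMARY KEY, biodatabase_id INTEGER)")
conn.executemany("INSERT INTO bioentry (biodatabase_id) VALUES (?)", [(1,), (1,), (2,)])


def list_ids(conn, sql, args=()):
    """Hypothetical: run a one-column SELECT and return a flat list."""
    rows = conn.execute(sql, args).fetchall()  # e.g. [(1,), (2,)]
    return [row[0] for row in rows]


sql = "SELECT bioentry_id FROM bioentry WHERE biodatabase_id = ? ORDER BY bioentry_id"
print(list_ids(conn, sql, (1,)))  # [1, 2]
```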

execute_one(sql, args=None): Execute sql that returns 1 record, and return the record.

execute(sql, args=None): Just execute an sql command.

executemany(sql, args): Execute many sql commands.
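executemany follows the DB-API pattern of one parameterised statement run once per tuple of arguments, which is convenient for bulk inserts. A minimal sketch with sqlite3 and a simplified, hypothetical two-column version of a taxon_name table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE taxon_name (taxon_id INTEGER, name TEXT)")

# One SQL statement, run once per parameter tuple.
rows = [(9606, "Homo sapiens"), (10090, "Mus musculus")]
conn.executemany("INSERT INTO taxon_name (taxon_id, name) VALUES (?, ?)", rows)
print(conn.execute("SELECT COUNT(*) FROM taxon_name").fetchone()[0])  # 2
```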

get_subseq_as_string(seqid, start, end)

Return a substring of a sequence.

Arguments:

seqid - The internal id for the sequence
start - The start position of the sequence; 0-indexed
end - The end position of the sequence
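Because start is 0-indexed while SQL's SUBSTR counts from 1 and takes a length, the coordinates have to be translated. The text does not say whether end is inclusive; this sketch assumes Python slice semantics (end exclusive), so start becomes start + 1 and the length is end - start. The biosequence table here is cut down and subseq is a hypothetical helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE biosequence (bioentry_id INTEGER PRIMARY KEY, seq TEXT)")
conn.execute("INSERT INTO biosequence VALUES (1, 'ACGTACGTAC')")


def subseq(conn, seqid, start, end):
    """Hypothetical: the Python slice seq[start:end], computed in SQL."""
    row = conn.execute(
        "SELECT SUBSTR(seq, ?, ?) FROM biosequence WHERE bioentry_id = ?",
        (start + 1, end - start, seqid),  # 1-based start, then a length
    ).fetchone()
    return row[0]


print(subseq(conn, 1, 2, 6))  # GTAC, same as "ACGTACGTAC"[2:6]
```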

execute_and_fetch_col0(sql, args=None): Return a list of values from the first column of each row.

execute_and_fetchall(sql, args=None): Return a list of tuples of all rows.

class BioSQL.BioSeqDatabase.MysqlConnectorAdaptor(conn, dbutils, wrap_cursor=False)

Bases: BioSQL.BioSeqDatabase.Adaptor

A BioSQL Adaptor class with fixes for the MySQL interface.

BioSQL was failing because the mysql-connector-python database connector returns bytearray objects. This adaptor class scrubs bytearray and byte string return values, converting them to string objects instead. It was made in response to backwards-incompatible changes introduced in release 2.0.0 of mysql-connector-python.
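The scrubbing described above can be sketched as a per-value conversion applied to every column of every returned row, leaving non-binary values untouched. This is a hypothetical illustration: it assumes UTF-8, and the text does not state which encoding the real adaptor uses.

```python
def scrub_value(value, encoding="utf-8"):
    """Hypothetical: turn bytes/bytearray into str, leave other values alone."""
    if isinstance(value, (bytes, bytearray)):
        return bytes(value).decode(encoding)
    return value


def scrub_row(row):
    """Apply scrub_value to every column of a result row."""
    return tuple(scrub_value(v) for v in row)


raw_rows = [(bytearray(b"ROA1_HUMAN"), 42, b"X77802"), (None, 7, "already text")]
print([scrub_row(r) for r in raw_rows])
# [('ROA1_HUMAN', 42, 'X77802'), (None, 7, 'already text')]
```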

execute_one(sql, args=None): Execute sql that returns 1 record, and return the record.

execute_and_fetch_col0(sql, args=None): Return a list of values from the first column in the row.

execute_and_fetchall(sql, args=None): Return a list of tuples of all rows.

class BioSQL.BioSeqDatabase.BioSeqDatabase(adaptor, name)

Bases: object

Represents a namespace (sub-database) within the BioSQL database.

That is, one row in the biodatabase table, and all rows in the bioentry table associated with it.

__init__(adaptor, name)

Create a BioSeqDatabase object.

Arguments:

adaptor - A BioSQL.BioSeqDatabase.Adaptor object
name - The name of the sub-database (namespace)

__repr__(): Return a short summary of the BioSeqDatabase.

get_Seq_by_id(name)

Get a DBSeqRecord object by its name.

Example: seq_rec = db.get_Seq_by_id('ROA1_HUMAN')

The name of this method is misleading since it returns a DBSeqRecord rather than a Seq object; it was presumably chosen to mirror BioPerl.

get_Seq_by_acc(name)

Get a DBSeqRecord object by accession number.

Example: seq_rec = db.get_Seq_by_acc('X77802')

The name of this method is misleading since it returns a DBSeqRecord rather than a Seq object; it was presumably chosen to mirror BioPerl.

get_Seq_by_ver(name)

Get a DBSeqRecord object by version number.

Example: seq_rec = db.get_Seq_by_ver('X77802.1')

The name of this method is misleading since it returns a DBSeqRecord rather than a Seq object; it was presumably chosen to mirror BioPerl.

get_Seqs_by_acc(name)

Get a list of DBSeqRecord objects by accession number.

Example: seq_recs = db.get_Seqs_by_acc('X77802')

The name of this method is misleading since it returns a list of DBSeqRecord objects rather than a list of Seq objects; it was presumably chosen to mirror BioPerl.

__getitem__(key)

Return a DBSeqRecord for one of the sequences in the sub-database.

Arguments:

key - The internal id for the sequence

__delitem__(key): Remove an entry and all its annotation.

__len__(): Return number of records in this namespace (sub database).

__contains__(value): Check if a primary (internal) id is this namespace (sub database).

__iter__(): Iterate over ids (which may not be meaningful outside this database).

keys(): Iterate over ids (which may not be meaningful outside this database).

values(): Iterate over DBSeqRecord objects in the namespace (sub database).

items(): Iterate over (id, DBSeqRecord) for the namespace (sub database).

lookup(**kwargs)

Return a DBSeqRecord using an acceptable identifier.

Arguments:

kwargs - A single key-value pair where the key is one of primary_id, gi, display_id, name, accession, version
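Since lookup() takes exactly one keyword from a fixed set, the first step is checking the arguments. A minimal sketch of that validation; pick_lookup_key and the use of TypeError are hypothetical, and routing the chosen key to a matching query (for example one of the Adaptor's fetch_seqid_by_* methods) is not shown and is an assumption about how the real method proceeds.

```python
ALLOWED_KEYS = {"primary_id", "gi", "display_id", "name", "accession", "version"}


def pick_lookup_key(**kwargs):
    """Hypothetical: check that exactly one allowed identifier was given."""
    if len(kwargs) != 1:
        raise TypeError(f"expected exactly one identifier, got {sorted(kwargs)}")
    ((key, value),) = kwargs.items()  # unpack the single pair
    if key not in ALLOWED_KEYS:
        raise TypeError(f"unsupported identifier {key!r}")
    return key, value


print(pick_lookup_key(accession="X77802"))  # ('accession', 'X77802')
```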

load(record_iterator, fetch_NCBI_taxonomy=False)

Load a set of SeqRecords into the BioSQL database.

record_iterator is either a list of SeqRecord objects, or an Iterator object that returns SeqRecord objects (such as the output from the Bio.SeqIO.parse() function), which will be used to populate the database.

fetch_NCBI_taxonomy is a boolean flag allowing or preventing connection to the taxonomic database on the NCBI server (via Bio.Entrez) to fetch a detailed taxonomy for each SeqRecord.

Example:

from Bio import SeqIO
count = db.load(SeqIO.parse(filename, format))
server.commit()  # nothing is saved until the transaction is committed

Returns the number of records loaded.
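Because load() accepts either a list or a one-shot iterator, the count has to be accumulated while the records are consumed rather than taken with len(). A minimal sketch of that pattern; ToyRecord, the toy_record table and load_records are all hypothetical stand-ins for SeqRecord, the BioSQL tables and the real method.

```python
import sqlite3
from typing import Iterable, NamedTuple


class ToyRecord(NamedTuple):
    """Hypothetical stand-in for a SeqRecord: just a name and a sequence."""
    name: str
    seq: str


def load_records(conn, records: Iterable[ToyRecord]) -> int:
    """Insert records from a list or a one-shot iterator; return the count."""
    count = 0
    for record in records:  # works for generators, e.g. a parser's output
        conn.execute(
            "INSERT INTO toy_record (name, seq) VALUES (?, ?)", (record.name, record.seq)
        )
        count += 1
    return count


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE toy_record (id INTEGER PRIMARY KEY, name TEXT, seq TEXT)")
records = (ToyRecord(f"rec{i}", "ACGT" * i) for i in range(1, 4))  # a generator
n = load_records(conn, records)
conn.commit()  # nothing is permanent until the transaction is committed
print(n)  # 3
```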