Bio.UniProt.GOA module
Parsers for the GAF, GPA and GPI formats from UniProt-GOA.
UniProt-GOA README and GAF format description: ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/UNIPROT/README
GAF formats: http://www.geneontology.org/GO.format.annotation.shtml
gp_association (GPA format) README: ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/UNIPROT/gp_association_readme
gp_information (GPI format) README: ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/UNIPROT/gp_information_readme

Bio.UniProt.GOA.gpi_iterator(handle)
Read GPI format files.
This function should be called to read a gp_information.goa_uniprot file. At the moment there is only one format, but this may change, so this function is a placeholder for a future wrapper.

Bio.UniProt.GOA.gpa_iterator(handle)
Read GPA format files.
This function should be called to read a gp_association.goa_uniprot file. It reads the first record and returns a GPA 1.1 or a GPA 1.0 iterator as needed.
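The version check described here can be pictured with a small stand-alone sketch. It is not the module's code: the helper name `detect_gpa_version` is hypothetical, and the sketch assumes the file starts with a `!gpa-version:` header line, as GPA files conventionally do.

```python
import io


def detect_gpa_version(handle):
    """Read the first line of handle and return the GPA version string."""
    first_line = handle.readline().strip()
    prefix = "!gpa-version:"
    if not first_line.startswith(prefix):
        raise ValueError(f"Missing GPA version header: {first_line!r}")
    version = first_line[len(prefix):].strip()
    # Only the two versions named in the description are accepted.
    if version not in ("1.0", "1.1"):
        raise ValueError(f"Unknown GPA version {version!r}")
    return version


# Hypothetical in-memory file; only the header line matters here.
sample = io.StringIO("!gpa-version: 1.1\n")
version = detect_gpa_version(sample)
print(version)
```

A dispatcher built on this idea would pick a version-specific parser from the returned string and hand it the same handle, which is now positioned just after the header.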

Bio.UniProt.GOA.gafbyproteiniterator(handle)
Iterate over records in a gene association file.
Each iteration yields a list of all consecutive records with the same DB_Object_ID. This function should be called to read a gene_association.goa_uniprot file. It reads the first record and returns a GAF 2.0 or a GAF 1.0 iterator as needed. As of 2016-04-09, GAF 2.1 is also recognized and a bug in iterator assignment has been fixed; for the time being, GAF 2.1 files are read with the GAF 2.0 iterator.
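Grouping "consecutive records with the same DB_Object_ID" behaves like `itertools.groupby`. The sketch below is a stand-alone illustration of that grouping rule, not the module's implementation; `group_by_protein` and the abbreviated records are hypothetical.

```python
from itertools import groupby


def group_by_protein(records):
    """Yield lists of consecutive records that share a DB_Object_ID."""
    for _, group in groupby(records, key=lambda rec: rec["DB_Object_ID"]):
        yield list(group)


# Hypothetical, heavily abbreviated records (real GAF records have 17 fields).
records = [
    {"DB_Object_ID": "P00001", "GO_ID": "GO:0000001"},
    {"DB_Object_ID": "P00001", "GO_ID": "GO:0000002"},
    {"DB_Object_ID": "P00002", "GO_ID": "GO:0000003"},
    {"DB_Object_ID": "P00001", "GO_ID": "GO:0000004"},
]

groups = list(group_by_protein(records))
for group in groups:
    print(group[0]["DB_Object_ID"], len(group))
```

Because only consecutive records are merged, the last P00001 record forms its own group. A file that is not sorted by DB_Object_ID will therefore produce several lists for the same protein.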

Bio.UniProt.GOA.gafiterator(handle)
Iterate over a GAF 1.0 or 2.0 file.
This function should be called to read a gene_association.goa_uniprot file. It reads the first record and returns a GAF 2.0 or a GAF 1.0 iterator as needed.
Example: open, read, iterate and filter results.
The original data file has been trimmed to ~600 rows.
Original source: ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/YEAST/goa_yeast.gaf.gz
>>> from Bio.UniProt.GOA import gafiterator, record_has
>>> Evidence = {'Evidence': set(['ND'])}
>>> Synonym = {'Synonym': set(['YA19A_YEAST', 'YAL019W-A'])}
>>> Taxon_ID = {'Taxon_ID': set(['taxon:559292'])}
>>> with open('UniProt/goa_yeast.gaf', 'r') as handle:
...     for rec in gafiterator(handle):
...         if record_has(rec, Taxon_ID) and record_has(rec, Evidence) and record_has(rec, Synonym):
...             for key in ('DB_Object_Name', 'Evidence', 'Synonym', 'Taxon_ID'):
...                 print(rec[key])
...
Putative uncharacterized protein YAL019W-A
ND
['YA19A_YEAST', 'YAL019W-A']
['taxon:559292']
Putative uncharacterized protein YAL019W-A
ND
['YA19A_YEAST', 'YAL019W-A']
['taxon:559292']
Putative uncharacterized protein YAL019W-A
ND
['YA19A_YEAST', 'YAL019W-A']
['taxon:559292']
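The example output shows that each record behaves like a dictionary in which some fields (Synonym, Taxon_ID) are lists and others (Evidence) are plain strings. The following stand-alone sketch shows how one tab-separated GAF 2.0 line could be turned into such a dictionary. It is an illustration only: `parse_gaf_line`, the choice of which columns to split on `|`, and the sample line are assumptions, not the module's code or real data.

```python
GAF20_FIELDS = [
    "DB", "DB_Object_ID", "DB_Object_Symbol", "Qualifier", "GO_ID",
    "DB:Reference", "Evidence", "With", "Aspect", "DB_Object_Name",
    "Synonym", "DB_Object_Type", "Taxon_ID", "Date", "Assigned_By",
    "Annotation_Extension", "Gene_Product_Form_ID",
]

# Assumption: these columns may hold several values separated by "|".
MULTI_VALUED = ("Qualifier", "DB:Reference", "With", "Synonym", "Taxon_ID")


def parse_gaf_line(line):
    """Turn one tab-separated GAF 2.0 line into a field-name -> value dict."""
    values = line.rstrip("\n").split("\t")
    record = dict(zip(GAF20_FIELDS, values))
    for field in MULTI_VALUED:
        record[field] = record[field].split("|")
    return record


# Hypothetical annotation line with all 17 columns (some left empty).
line = "\t".join([
    "UniProtKB", "P00001", "ABC1", "", "GO:0000001", "PMID:0000000",
    "ND", "", "P", "Hypothetical protein", "ABC1_TEST|YXX000W",
    "protein", "taxon:559292", "20160409", "UniProt", "", "",
]) + "\n"

record = parse_gaf_line(line)
print(record["Evidence"])
print(record["Synonym"])
print(record["Taxon_ID"])
```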

Bio.UniProt.GOA.writerec(outrec, handle, fields=['DB', 'DB_Object_ID', 'DB_Object_Symbol', 'Qualifier', 'GO_ID', 'DB:Reference', 'Evidence', 'With', 'Aspect', 'DB_Object_Name', 'Synonym', 'DB_Object_Type', 'Taxon_ID', 'Date', 'Assigned_By', 'Annotation_Extension', 'Gene_Product_Form_ID'])
Write a single UniProt-GOA record to an output stream.
The caller should know the format version. Default: gaf-2.0. If header has a value, it is assumed that this is the first record, and a header is written.
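Writing a record is essentially the reverse of parsing: fields are emitted in the order given by `fields`, joined by tabs. The stand-alone sketch below illustrates that idea. `format_record` is a hypothetical helper, joining list values with `|` is an assumption, and the field list is shortened for readability. Since the signature shown above has no header argument, the sketch writes the version line itself.

```python
import io


def format_record(record, fields):
    """Return one tab-separated output line for record, in fields order."""
    columns = []
    for field in fields:
        value = record[field]
        if isinstance(value, list):
            # Assumption: multi-valued fields are joined with "|".
            value = "|".join(value)
        columns.append(value)
    return "\t".join(columns) + "\n"


# Shortened, hypothetical field list and record.
fields = ["DB", "DB_Object_ID", "Synonym", "Taxon_ID"]
record = {
    "DB": "UniProtKB",
    "DB_Object_ID": "P00001",
    "Synonym": ["ABC1_TEST", "YXX000W"],
    "Taxon_ID": ["taxon:559292"],
}

out = io.StringIO()
out.write("!gaf-version: 2.0\n")  # version header written by the caller
out.write(format_record(record, fields))
print(out.getvalue(), end="")
```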

Bio.UniProt.GOA.writebyproteinrec(outprotrec, handle, fields=['DB', 'DB_Object_ID', 'DB_Object_Symbol', 'Qualifier', 'GO_ID', 'DB:Reference', 'Evidence', 'With', 'Aspect', 'DB_Object_Name', 'Synonym', 'DB_Object_Type', 'Taxon_ID', 'Date', 'Assigned_By', 'Annotation_Extension', 'Gene_Product_Form_ID'])
Write a list of GAF records to an output stream.
The caller should know the format version. Default: gaf-2.0. If header has a value, it is assumed that this is the first record, and a header is written. Typically the list is one read by gafbyproteiniterator, which contains all consecutive lines with the same DB_Object_ID.

Bio.UniProt.GOA.record_has(inrec, fieldvals)
Accept a record, and a dictionary of field values.
The format is {'field_name': set([val1, val2])}. If any field in the record has a matching value, the function returns True; otherwise it returns False.
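Because a match on any one field is enough, passing several fields in one dictionary acts as a logical OR. That is why the gafiterator example combines separate calls with `and` when every condition must hold. The stand-alone sketch below mirrors the described semantics. `matches_any` is a hypothetical stand-in, not the module's code, and the record is reduced to the three fields from that example.

```python
def matches_any(record, fieldvals):
    """Return True if any listed field of record holds one of its wanted values."""
    for field, wanted in fieldvals.items():
        value = record[field]
        # A field may hold a single string or a list of strings.
        found = {value} if isinstance(value, str) else set(value)
        if found & wanted:
            return True
    return False


record = {
    "Evidence": "ND",
    "Synonym": ["YA19A_YEAST", "YAL019W-A"],
    "Taxon_ID": ["taxon:559292"],
}

# One dictionary with two fields: True if either field matches.
either = matches_any(record, {"Evidence": {"IEA"}, "Taxon_ID": {"taxon:559292"}})

# Two calls joined with "and": True only if both fields match.
both = (
    matches_any(record, {"Evidence": {"IEA"}})
    and matches_any(record, {"Taxon_ID": {"taxon:559292"}})
)

print(either, both)
```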