Bio.SeqIO.TabIO module

Bio.SeqIO support for the “tab” (simple tab separated) file format.

You are expected to use this module via the Bio.SeqIO functions.

The “tab” format is an ad-hoc plain text file format where each sequence is on one (long) line. Each line contains the identifier/description, followed by a tab, followed by the sequence. For example, consider the following short FASTA format file:

>ID123456 possible binding site?
CATCNAGATGACACTACGACTACGACTCAGACTAC
>ID123457 random sequence
ACACTACGACTACGACTCAGACTACAAN

Apart from the descriptions, this can be represented in the simple two column tab separated format as follows:

ID123456(tab)CATCNAGATGACACTACGACTACGACTCAGACTAC
ID123457(tab)ACACTACGACTACGACTCAGACTACAAN

When reading this file, “ID123456” or “ID123457” will be taken as the record’s .id and .name properties. There is no other information to record.

Similarly, when writing to this format, Biopython will ONLY record the record’s .id and .seq (and not the description or any other information) as in the example above.
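To make the mapping between the two layouts concrete, here is a minimal sketch in plain Python that turns the FASTA example above into the two column tab form. It is an illustration only, not Biopython's implementation: the helper name fasta_to_tab is hypothetical, and taking the first word of the title line as the identifier is an assumption based on the example shown.

```python
def fasta_to_tab(fasta_text):
    """Convert FASTA text to two column tab separated text (hypothetical helper)."""
    rows = []
    identifier = None
    chunks = []
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new title line closes the previous record, if there was one.
            if identifier is not None:
                rows.append((identifier, "".join(chunks)))
            # Assumption: the first word is the identifier; the description is dropped.
            words = line[1:].split(None, 1)
            identifier = words[0] if words else ""
            chunks = []
        else:
            # Sequence lines are joined so each record ends up on one line.
            chunks.append(line)
    if identifier is not None:
        rows.append((identifier, "".join(chunks)))
    return "".join(f"{ident}\t{seq}\n" for ident, seq in rows)


example_fasta = """>ID123456 possible binding site?
CATCNAGATGACACTACGACTACGACTCAGACTAC
>ID123457 random sequence
ACACTACGACTACGACTCAGACTACAAN
"""
tab_text = fasta_to_tab(example_fasta)
print(tab_text, end="")
```

In practice you would let Bio.SeqIO do this conversion; the sketch only shows which parts of each record survive.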

Bio.SeqIO.TabIO.TabIterator(handle, alphabet=SingleLetterAlphabet())

Iterate over tab separated lines as SeqRecord objects.

Each line of the file should contain exactly one tab, dividing the line into an identifier and the full sequence.

Arguments:
  • handle - input file

  • alphabet - optional alphabet

The first field is taken as the record’s .id and .name (regardless of any spaces within the text) and the second field is the sequence.

Any blank lines are ignored.
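The parsing rules above can be sketched in plain Python as follows. This is a simplified stand-in rather than the TabIterator source: the helper name iter_tab_records is hypothetical, it yields plain (identifier, sequence) tuples instead of SeqRecord objects, and raising ValueError for a line without exactly one tab is an assumption about how malformed input might reasonably be handled.

```python
def iter_tab_records(lines):
    """Yield (identifier, sequence) pairs from tab separated lines (hypothetical helper)."""
    for line_number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # blank lines are ignored
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) != 2:
            # Assumption: anything other than exactly one tab is treated as an error.
            raise ValueError(
                f"Line {line_number} has {len(fields) - 1} tabs, expected exactly one"
            )
        identifier, sequence = fields
        # The whole first field is the identifier, even if it contains spaces.
        yield identifier, sequence


example_lines = [
    "ID123456\tCATCNAGATGACACTACGACTACGACTCAGACTAC\n",
    "\n",
    "an id with spaces\tACACTACGACTACGACTCAGACTACAAN\n",
]
records = list(iter_tab_records(example_lines))
for identifier, sequence in records:
    print(f"{identifier} length {len(sequence)}")
```

Splitting on the tab character alone, rather than on any whitespace, is what lets an identifier keep its internal spaces.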

Examples

>>> from Bio.SeqIO.TabIO import TabIterator
>>> with open("GenBank/NC_005816.tsv") as handle:
...     for record in TabIterator(handle):
...         print("%s length %i" % (record.id, len(record)))
gi|45478712|ref|NP_995567.1| length 340
gi|45478713|ref|NP_995568.1| length 260
gi|45478714|ref|NP_995569.1| length 64
gi|45478715|ref|NP_995570.1| length 123
gi|45478716|ref|NP_995571.1| length 145
gi|45478717|ref|NP_995572.1| length 357
gi|45478718|ref|NP_995573.1| length 138
gi|45478719|ref|NP_995574.1| length 312
gi|45478720|ref|NP_995575.1| length 99
gi|45478721|ref|NP_995576.1| length 90

class Bio.SeqIO.TabIO.TabWriter(handle)

Bases: Bio.SeqIO.Interfaces.SequentialSequenceWriter

Class to write simple tab separated format files (OBSOLETE).

Each line consists of “id(tab)sequence” only.

No description, name or other annotation is recorded.

This class is now obsolete. Please use the function as_tab instead, or the top level Bio.SeqIO.write() function with format="tab".

write_record(self, record)

Write a single tab line to the file.

Bio.SeqIO.TabIO.as_tab(record)

Return record as tab separated (id(tab)seq) string.