Bio.ExPASy.cellosaurus module

Parser for the cellosaurus.txt file from ExPASy.

See https://web.expasy.org/cellosaurus/

Tested with Cellosaurus release 18 (July 2016).

Functions:
  • read - Reads a file containing one cell line entry
  • parse - Reads a file containing multiple cell line entries

Classes:
  • Record - Holds cell line data.

Examples

This example downloads the Cellosaurus database and parses it. Note that urlopen returns a stream of bytes, while the parser expects a text stream, so we wrap the byte stream in TextIOWrapper to decode it as UTF-8. This step is not needed if you download the cellosaurus.txt file in advance and open it in text mode (see the comment in the code below).

>>> from urllib.request import urlopen
>>> from io import TextIOWrapper
>>> from Bio.ExPASy import cellosaurus
>>> url = "ftp://ftp.expasy.org/databases/cellosaurus/cellosaurus.txt"
>>> bytestream = urlopen(url)
>>> textstream = TextIOWrapper(bytestream, "UTF-8")
>>> # alternatively, use
>>> # textstream = open("cellosaurus.txt")
>>> # if you downloaded the cellosaurus.txt file in advance.
>>> records = cellosaurus.parse(textstream)
>>> for record in records:
...     if 'Homo sapiens' in record['OX'][0]:
...         print(record['ID'])  
...
#15310-LN
#W7079
(L)PC6
0.5alpha
...
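
The loop above inspects only the first OX line of each record, although OX may occur more than once in an entry. A minimal sketch of a filter that checks every OX line follows. The helper name human_cell_line_ids is hypothetical (it is not part of Bio.ExPASy.cellosaurus), and the sample values are made up; plain dicts stand in for Record objects, which is reasonable only because Record is a dict subclass.

```python
def human_cell_line_ids(records):
    """Return the IDs of records that mention 'Homo sapiens' on any OX line."""
    return [
        record["ID"]
        for record in records
        if any("Homo sapiens" in species for species in record["OX"])
    ]


# Made-up sample data; plain dicts stand in for Record objects.
sample = [
    {"ID": "Example-1", "OX": ["NCBI_TaxID=9606; ! Homo sapiens"]},
    {"ID": "Example-2", "OX": ["NCBI_TaxID=10090; ! Mus musculus"]},
    {
        "ID": "Example-3",
        "OX": [
            "NCBI_TaxID=10090; ! Mus musculus",
            "NCBI_TaxID=9606; ! Homo sapiens",
        ],
    },
]

print(human_cell_line_ids(sample))  # ['Example-1', 'Example-3']
```

With the real database, the same helper could presumably be called as human_cell_line_ids(cellosaurus.parse(textstream)).
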
Bio.ExPASy.cellosaurus.parse(handle)

Parse cell line records.

This function is for parsing cell line files containing multiple records.

Arguments:
  • handle - handle to the file.

Bio.ExPASy.cellosaurus.read(handle)

Read one cell line record.

This function is for parsing cell line files containing exactly one record.

Arguments:
  • handle - handle to the file.
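
The description above does not say what happens when a file holds no record or more than one. Where that matters, a strict check can be layered on top of parse. The following is a minimal sketch under that assumption; read_exactly_one is a hypothetical helper, not part of Bio.ExPASy.cellosaurus, and it accepts any iterable of records.

```python
def read_exactly_one(records):
    """Return the only record in an iterable; raise ValueError otherwise."""
    iterator = iter(records)
    try:
        record = next(iterator)
    except StopIteration:
        raise ValueError("no cell line record found") from None
    try:
        next(iterator)
    except StopIteration:
        # Nothing follows the first record, so it is the only one.
        return record
    raise ValueError("more than one cell line record found")


# Plain dicts stand in for Record objects in this illustration.
print(read_exactly_one([{"ID": "Example-1"}]))  # {'ID': 'Example-1'}
```

With a real file, the argument would be the result of cellosaurus.parse(handle).
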

class Bio.ExPASy.cellosaurus.Record

Bases: dict

Holds information from an ExPASy Cellosaurus record as a Python dictionary.

Each record contains the following keys:

---------  ------------------------------  ----------------------
Line code  Content                         Occurrence in an entry
---------  ------------------------------  ----------------------
ID         Identifier (cell line name)     Once; starts an entry
AC         Accession (CVCL_xxxx)           Once
AS         Secondary accession number(s)   Optional; once
SY         Synonyms                        Optional; once
DR         Cross-references                Optional; once or more
RX         Reference identifiers           Optional; once or more
WW         Web pages                       Optional; once or more
CC         Comments                        Optional; once or more
ST         STR profile data                Optional; twice or more
DI         Diseases                        Optional; once or more
OX         Species of origin               Once or more
HI         Hierarchy                       Optional; once or more
OI         Originate from same individual  Optional; once or more
SX         Sex of cell                     Optional; once
AG         Age of donor at sampling        Optional; once
CA         Category                        Once
DT         Date (entry history)            Once
//         Terminator                      Once; ends an entry
---------  ------------------------------  ----------------------
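
To illustrate how line codes relate to dictionary keys, here is a simplified stand-in parser. It is a sketch, not Biopython's implementation. It assumes each line consists of a two-character line code, three spaces, and then the content, and it chooses to store codes that occur at most once as strings and all others as lists. The function name sketch_parse and the sample entries are made up for illustration.

```python
from io import StringIO

# Line codes the table lists as occurring at most once per entry.
SINGLE_VALUE_KEYS = {"ID", "AC", "AS", "SY", "SX", "AG", "CA", "DT"}


def sketch_parse(handle):
    """Yield one plain dict per entry; an entry ends at a '//' line."""
    entry = {}
    for line in handle:
        # Assumed layout: code in columns 1-2, content from column 6 onward.
        code, content = line[:2], line[5:].rstrip()
        if code == "//":
            if entry:
                yield entry
            entry = {}
        elif code in SINGLE_VALUE_KEYS:
            entry[code] = content
        elif code.isalpha() and code.isupper():
            # Repeatable codes accumulate their lines in a list.
            entry.setdefault(code, []).append(content)


# Made-up entries in the assumed layout.
text = (
    "ID   Example-1\n"
    "AC   CVCL_xxxx\n"
    "OX   NCBI_TaxID=9606; ! Homo sapiens\n"
    "CA   Cancer cell line\n"
    "//\n"
    "ID   Example-2\n"
    "AC   CVCL_yyyy\n"
    "OX   NCBI_TaxID=10090; ! Mus musculus\n"
    "CA   Hybridoma\n"
    "//\n"
)

entries = list(sketch_parse(StringIO(text)))
for entry in entries:
    print(entry["ID"], entry["OX"])
```

The "//" terminator only marks the end of an entry in this sketch; it is not stored as a key.
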

__init__()

Initialize the class.

__repr__()

Return the canonical string representation of the Record object.

__str__()

Return a readable string representation of the Record object.