Bio.PDB.PDBList module
Access the PDB over the internet (e.g. to download structures).
- class Bio.PDB.PDBList.PDBList(server='https://files.wwpdb.org', pdb=None, obsolete_pdb=None, verbose=True)
Bases: object
Quick access to the structure lists on the PDB or its mirrors.
This class provides quick access to the structure lists on the PDB server or its mirrors. The structure lists contain four-letter PDB codes, indicating that structures are new, have been modified or are obsolete. The lists are released on a weekly basis.
It also provides a function to retrieve PDB files from the server. To use it properly, prepare a directory /pdb or the like, where PDB files are stored.
All available file formats (PDB, PDBx/mmCif, PDBML, mmtf) are supported. Please note that large structures (containing more than 62 chains and/or more than 99999 ATOM lines) are no longer stored as a single PDB file and, by default (when the PDB format is selected), are not downloaded.
Large structures can be downloaded in other formats, including PDBx/mmCif or as a .tar file (a collection of PDB-like formatted files for a given structure).
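The restriction on large structures suggests a simple fallback pattern: ask for the PDB format first and switch to PDBx/mmCif when no file arrives. The sketch below is not part of Biopython. `fetch_with_fallback` is a hypothetical helper, and it assumes that a failed download either leaves no file at the path returned by `retrieve_pdb_file` or returns nothing at all, which may differ between Biopython versions.

```python
import os


def fetch_with_fallback(pdb_list, pdb_code, pdir, formats=("pdb", "mmCif")):
    """Try each file format in turn and return (path, format) for the first hit.

    pdb_list is expected to be a Bio.PDB.PDBList.PDBList instance.
    Returns (None, None) if no format produced a file on disk.
    """
    for file_format in formats:
        path = pdb_list.retrieve_pdb_file(
            pdb_code, pdir=pdir, file_format=file_format
        )
        # Assumption: an unsuccessful download yields no file at `path`.
        if path and os.path.exists(path):
            return path, file_format
    return None, None


# Typical use (needs Biopython and network access):
#   from Bio.PDB.PDBList import PDBList
#   path, fmt = fetch_with_fallback(PDBList(verbose=False), "3J92", "pdb_files")
```

Passing the PDBList instance as an argument keeps the helper easy to exercise with a stand-in object when no network is available.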
If you want to use this module from behind a proxy, add the proxy variable to your environment, e.g. in Unix: export HTTP_PROXY='http://realproxy.charite.de:888' (this can also be added to ~/.bashrc).
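The same setting can be made from Python before any download starts. This is a minimal sketch: the proxy address is the placeholder from the text above, and it assumes the downloads go through the standard library's urllib, which reads proxy settings from the environment. Setting HTTPS_PROXY as well is an assumption that your proxy also handles https:// URLs such as the default server.

```python
import os
import urllib.request

# Placeholder proxy address taken from the example above; replace with your own.
PROXY_URL = "http://realproxy.charite.de:888"

# urllib looks up <scheme>_proxy variables in the process environment.
os.environ["HTTP_PROXY"] = PROXY_URL
os.environ["HTTPS_PROXY"] = PROXY_URL  # assumption: the proxy also serves https

# Inspect the proxy mapping urllib derives from the environment.
proxies = urllib.request.getproxies()
print(proxies)
```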
- PDB_REF = '\n The Protein Data Bank: a computer-based archival file for macromolecular structures.\n F.C.Bernstein, T.F.Koetzle, G.J.B.Williams, E.F.Meyer Jr, M.D.Brice, J.R.Rodgers, O.Kennard, T.Shimanouchi, M.Tasumi\n J. Mol. Biol. 112 pp. 535-542 (1977)\n http://www.pdb.org/.\n '
- __init__(server='https://files.wwpdb.org', pdb=None, obsolete_pdb=None, verbose=True)
Initialize the class with the default server or a custom one.
Argument pdb is the local path to use, defaulting to the current directory at the moment of initialisation.
- static get_status_list(url)
Retrieve a list of pdb codes in the weekly pdb status file from given URL.
Used by get_recent_changes. The list files parsed by this method now have a very simple layout: one PDB name per line.
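To illustrate the one-name-per-line layout, here is a small sketch of how such a file could be read. It is not the library's implementation; `parse_status_list` is a hypothetical helper and the sample codes are made up.

```python
def parse_status_list(text):
    """Return the PDB codes from a status file with one name per line."""
    codes = []
    for line in text.splitlines():
        code = line.strip()
        if code:  # skip blank lines
            codes.append(code)
    return codes


# Made-up sample content, not real status data.
sample = "1abc\n2xyz\n\n3j92\n"
print(parse_status_list(sample))  # ['1abc', '2xyz', '3j92']
```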
- get_recent_changes()
Return three lists of the newest weekly files (added,mod,obsolete).
Reads the directories with changed entries from the PDB server and returns a tuple of three URLs to the files of new, modified and obsolete entries from the most recent list. The directory with the largest numerical name is used. Returns None if something goes wrong.
Contents of the data/status dir (20031013 would be used):
drwxrwxr-x 2 1002 sysadmin 512 Oct 6 18:28 20031006
drwxrwxr-x 2 1002 sysadmin 512 Oct 14 02:14 20031013
-rw-r--r-- 1 1002 sysadmin 1327 Mar 12 2001 README
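The rule "the directory with the largest numerical name is used" can be sketched as follows. This is an illustration only, not the code Biopython runs; `latest_status_dir` is a hypothetical name, and the input is the listing shown above reduced to its entry names.

```python
def latest_status_dir(names):
    """Return the all-digit name with the largest numerical value, or None."""
    numeric = [name for name in names if name.isdigit()]
    if not numeric:
        return None
    return max(numeric, key=int)


# Entry names from the example listing; README is ignored because it is not numeric.
entries = ["20031006", "20031013", "README"]
print(latest_status_dir(entries))  # 20031013
```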
- get_all_entries()
Retrieve the big file containing all the PDB entries and some annotation.
Returns a list of PDB codes in the index file.
- get_all_obsolete()
Return a list of all obsolete entries ever in the PDB.
Returns a list of all obsolete pdb codes that have ever been in the PDB.
Gets and parses the list of obsolete entries from the PDB server; only the first pdb_code column is used. The file looks like this:
LIST OF OBSOLETE COORDINATE ENTRIES AND SUCCESSORS
OBSLTE 31-JUL-94 116L 216L
...
OBSLTE 29-JAN-96 1HFT 2HFT
OBSLTE 21-SEP-06 1HFV 2J5X
OBSLTE 21-NOV-03 1HG6
OBSLTE 18-JUL-84 1HHB 2HHB 3HHB
OBSLTE 08-NOV-96 1HID 2HID
OBSLTE 01-APR-97 1HIU 2HIU
OBSLTE 14-JAN-04 1HKE 1UUZ
...
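A minimal sketch of extracting the first pdb_code column from such a file is shown below. Splitting on whitespace is a simplification chosen for this example and is not necessarily how the library reads the columns; `parse_obsolete` is a hypothetical helper.

```python
def parse_obsolete(text):
    """Return the obsoleted PDB codes (first code column of OBSLTE lines)."""
    codes = []
    for line in text.splitlines():
        fields = line.split()
        # fields: OBSLTE, date, obsolete code, then zero or more successors
        if len(fields) >= 3 and fields[0] == "OBSLTE":
            codes.append(fields[2])
    return codes


sample = """\
LIST OF OBSOLETE COORDINATE ENTRIES AND SUCCESSORS
OBSLTE 31-JUL-94 116L 216L
OBSLTE 21-NOV-03 1HG6
OBSLTE 18-JUL-84 1HHB 2HHB 3HHB
"""
print(parse_obsolete(sample))  # ['116L', '1HG6', '1HHB']
```

Entries without a successor (1HG6) and entries with several successors (1HHB) are handled the same way, because only the third field is kept.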
- retrieve_pdb_file(pdb_code, obsolete=False, pdir=None, file_format=None, overwrite=False)
Fetch PDB structure file from PDB server, and store it locally.
The PDB structure’s file name is returned as a single string. If obsolete == True, the file will be saved in a special file tree.
NOTE: The default download format has changed from PDB to PDBx/mmCif.
- Parameters:
pdb_code (string) – 4-symbol structure ID from PDB (e.g. 3J92).
file_format (string) –
File format. Available options:
"mmCif" (default, PDBx/mmCif file),
"pdb" (format PDB),
"xml" (PDBML/XML format),
"mmtf" (highly compressed),
"bundle" (PDB formatted archive for large structure)
overwrite (bool) – if set to True, existing structure files will be overwritten. Default: False
obsolete (bool) – Only meaningful for obsolete structures. If True, the obsolete structure is downloaded to the ‘obsolete’ folder; otherwise the download is not performed. This option does not work for the mmtf format, because obsolete structures are not stored as mmtf, and it has no effect when pdir is specified. Note: make sure the structure you are about to download really is obsolete. Trying to download a non-obsolete structure into the obsolete folder will not work and results in a “structure doesn’t exists” error. Default: False
pdir (string) – put the file in this directory (default: create a PDB-style directory tree)
- Returns:
filename
- Return type:
string
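As an illustration of the parameters above, the following sketch wraps retrieve_pdb_file and rejects format names that are not in the documented list before any request is made. `fetch_structure` and `SUPPORTED_FORMATS` are hypothetical names introduced for this example; the wrapper only forwards arguments that appear in the signature above.

```python
# Format names as listed in the parameter description above.
SUPPORTED_FORMATS = ("mmCif", "pdb", "xml", "mmtf", "bundle")


def fetch_structure(pdb_list, pdb_code, file_format="mmCif", pdir=None,
                    overwrite=False):
    """Download one entry and return the file name reported by PDBList.

    pdb_list is expected to be a Bio.PDB.PDBList.PDBList instance.
    """
    if file_format not in SUPPORTED_FORMATS:
        raise ValueError(
            f"unknown file_format {file_format!r}; expected one of {SUPPORTED_FORMATS}"
        )
    return pdb_list.retrieve_pdb_file(
        pdb_code, pdir=pdir, file_format=file_format, overwrite=overwrite
    )


# Typical use (needs Biopython and network access):
#   from Bio.PDB.PDBList import PDBList
#   filename = fetch_structure(PDBList(verbose=False), "3J92", pdir="pdb_files")
```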
- update_pdb(file_format=None, with_assemblies=False)
Update your local copy of the PDB files.
This is probably the ‘most wanted’ function of this module. It gets the weekly lists of new and modified PDB entries and automatically downloads the corresponding PDB files. You can run it as a weekly cron job.
- download_pdb_files(pdb_codes: list[str], obsolete: bool = False, pdir: str | None = None, file_format: str | None = None, overwrite: bool = False, max_num_threads: int | None = None)
Fetch set of PDB structure files from the PDB server and store them locally.
- Parameters:
pdb_codes – A list of 4-symbol PDB structure IDs
obsolete – Only meaningful for obsolete structures. If True, download the obsolete structure to the ‘obsolete’ folder. Otherwise, the download won’t be performed. This option doesn’t work for the mmtf format, as obsolete structures are not available as mmtf. (default: False)
pdir – Put the file in this directory. By default, create a PDB-style directory tree.
file_format –
File format. Available options:
"mmCif" (default, PDBx/mmCif file),
"pdb" (format PDB),
"xml" (PDBML/XML format),
"mmtf" (highly compressed),
"bundle" (PDB formatted archive for large structure).
overwrite – If set to True, existing structure files will be overwritten. (default: False)
max_num_threads – The maximum number of threads to use when downloading files
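A batch download might be wrapped as in the sketch below, which removes repeated codes before handing the list to download_pdb_files. `fetch_many` is a hypothetical helper, and the thread count of 4 is an arbitrary value chosen for the example, not a recommended setting.

```python
def fetch_many(pdb_list, pdb_codes, pdir, max_num_threads=4):
    """Download several entries as PDBx/mmCif after removing duplicate codes.

    pdb_list is expected to be a Bio.PDB.PDBList.PDBList instance.
    Returns the de-duplicated list of codes that was requested.
    """
    # dict.fromkeys keeps the first occurrence of each code, in order.
    unique_codes = list(dict.fromkeys(code.strip() for code in pdb_codes))
    pdb_list.download_pdb_files(
        unique_codes,
        pdir=pdir,
        file_format="mmCif",
        max_num_threads=max_num_threads,
    )
    return unique_codes


# Typical use (needs Biopython and network access):
#   from Bio.PDB.PDBList import PDBList
#   fetch_many(PDBList(verbose=False), ["3J92", "1HHB", "3J92"], "pdb_files")
```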
- get_all_assemblies(file_format: str = '') list[tuple[str, str]]
Retrieve the list of PDB entries with an associated bio assembly.
The requested list will be cached to avoid multiple calls to the server.
- Parameters:
file_format (str) – A legacy parameter that is left to avoid breaking changes
- Returns:
the assemblies
- Return type:
list
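The returned list of string pairs can be regrouped per entry, as in the sketch below. It assumes that each tuple has the form (PDB code, assembly number); the signature above only states list[tuple[str, str]], so check this against your Biopython version. `group_assemblies` is a hypothetical helper and the sample data is made up.

```python
from collections import defaultdict


def group_assemblies(assemblies):
    """Map each PDB code to the list of its assembly numbers.

    Assumption: every item is a (pdb_code, assembly_num) pair of strings.
    """
    grouped = defaultdict(list)
    for pdb_code, assembly_num in assemblies:
        grouped[pdb_code].append(assembly_num)
    return dict(grouped)


# Made-up sample in the assumed (pdb_code, assembly_num) layout.
sample = [("1hhb", "1"), ("1hhb", "2"), ("3j92", "1")]
print(group_assemblies(sample))
```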
- retrieve_assembly_file(pdb_code, assembly_num, pdir=None, file_format=None, overwrite=False)
Fetch one or more assembly structures associated with a PDB entry.
Unless noted below, parameters are described in retrieve_pdb_file.
- Parameters:
assembly_num (str) – assembly number to download.
- Returns:
file name of the downloaded assembly file.
- Return type:
str
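To fetch several assemblies of one entry, retrieve_assembly_file can be called once per assembly number, as sketched here. `fetch_assemblies` is a hypothetical helper; it converts each number to a string because assembly_num is documented as str.

```python
def fetch_assemblies(pdb_list, pdb_code, assembly_nums, pdir):
    """Download the given assemblies of one entry and return the file names.

    pdb_list is expected to be a Bio.PDB.PDBList.PDBList instance.
    """
    filenames = []
    for num in assembly_nums:
        filename = pdb_list.retrieve_assembly_file(
            pdb_code, str(num), pdir=pdir, file_format="mmCif"
        )
        filenames.append(filename)
    return filenames


# Typical use (needs Biopython and network access):
#   from Bio.PDB.PDBList import PDBList
#   fetch_assemblies(PDBList(verbose=False), "1HHB", [1, 2], "assemblies")
```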
- download_all_assemblies(listfile: str | None = None, file_format: str | None = None, max_num_threads: int | None = None)
Retrieve all biological assemblies not in the local PDB copy.
- Parameters:
listfile – File name to which all assembly codes will be written
file_format – Format in which to download the entries. Available options are “mmCif” or “pdb”. Defaults to “mmCif”.
max_num_threads – The maximum number of threads to use while downloading the assemblies
- download_entire_pdb(listfile: str | None = None, file_format: str | None = None, max_num_threads: int | None = None)
Retrieve all PDB entries not present in the local PDB copy.
NOTE: The default download format has changed from PDB to PDBx/mmCif.
- Parameters:
listfile – Filename to which all PDB codes will be written
file_format –
File format. Available options:
"mmCif" (default, PDBx/mmCif file),
"pdb" (format PDB),
"xml" (PDBML/XML format),
"mmtf" (highly compressed),
"bundle" (PDB formatted archive for large structure)
max_num_threads – The maximum number of threads to use while downloading PDB entries
- download_obsolete_entries(listfile: str | None = None, file_format: str | None = None, max_num_threads: int | None = None)
Retrieve all obsolete PDB entries not present in local obsolete PDB copy.
NOTE: The default download format has changed from PDB to PDBx/mmCif.
- Parameters:
listfile – Filename to which all PDB codes will be written
file_format –
File format. Available options:
"mmCif" (default, PDBx/mmCif file),
"pdb" (PDB format),
"xml" (PDBML/XML format).
max_num_threads – The maximum number of threads to use while downloading PDB entries
- get_seqres_file(savefile='pdb_seqres.txt')
Retrieve and save a (big) file containing all the sequences of PDB entries.