Bibliography

[Altschul1990]

Stephen F. Altschul, Warren Gish, Webb Miller, Eugene W. Myers, David J. Lipman: Basic Local Alignment Search Tool. Journal of Molecular Biology 215 (3): 403–410 (1990). https://doi.org/10.1016/S0022-2836(05)80360-2

[Bailey1994]

Timothy L. Bailey and Charles Elkan: Fitting a mixture model by expectation maximization to discover motifs in biopolymers. Proceedings of the Second International Conference on Intelligent Systems for Molecular Biology: 28–36. AAAI Press, Menlo Park, California (1994).

[Cavener1987]

Douglas R. Cavener: Comparison of the consensus sequence flanking translational start sites in Drosophila and vertebrates. Nucleic Acids Research 15 (4): 1353–1361 (1987). https://doi.org/10.1093/nar/15.4.1353

[Chapman2000]

Brad Chapman and Jeff Chang: Biopython: Python tools for computational biology. ACM SIGBIO Newsletter 20 (2): 15–19 (August 2000).

[Cock2009]

Peter J. A. Cock, Tiago Antao, Jeffrey T. Chang, Brad A. Chapman, Cymon J. Cox, Andrew Dalke, Iddo Friedberg, Thomas Hamelryck, Frank Kauff, Bartek Wilczyński, Michiel J. L. de Hoon: Biopython: freely available Python tools for computational molecular biology and bioinformatics. Bioinformatics 25 (11): 1422–1423 (2009). https://doi.org/10.1093/bioinformatics/btp163

[Cock2010]

Peter J. A. Cock, Christopher J. Fields, Naohisa Goto, Michael L. Heuer, Peter M. Rice: The Sanger FASTQ file format for sequences with quality scores, and the Solexa/Illumina FASTQ variants. Nucleic Acids Research 38 (6): 1767–1771 (2010). https://doi.org/10.1093/nar/gkp1137

[Cornish1985]

Athel Cornish-Bowden: Nomenclature for incompletely specified bases in nucleic acid sequences: Recommendations 1984. Nucleic Acids Research 13 (9): 3021–3030 (1985). https://doi.org/10.1093/nar/13.9.3021

[Darling2004]

Aaron E. Darling, Bob Mau, Frederick R. Blattner, Nicole T. Perna: Mauve: Multiple alignment of conserved genomic sequence with rearrangements. Genome Research 14 (7): 1394–1403 (2004). https://doi.org/10.1101/gr.2289704

[Dayhoff1978]

M.O. Dayhoff, R.M. Schwartz, and B.C. Orcutt: A Model of Evolutionary Change in Proteins. Atlas of Protein Sequence and Structure, Volume 5, Supplement 3, 1978: 345–352. The National Biomedical Research Foundation, 1979.

[DeHoon2004]

Michiel J. L. de Hoon, Seiya Imoto, John Nolan, Satoru Miyano: Open source clustering software. Bioinformatics 20 (9): 1453–1454 (2004). https://doi.org/10.1093/bioinformatics/bth078

[Durbin1998]

Richard Durbin, Sean R. Eddy, Anders Krogh, Graeme Mitchison: Biological sequence analysis: Probabilistic models of proteins and nucleic acids. Cambridge University Press, Cambridge, UK (1998).

[Eisen1998]

Michael B. Eisen, Paul T. Spellman, Patrick O. Brown, David Botstein: Cluster analysis and display of genome-wide expression patterns. Proceedings of the National Academy of Sciences USA 95 (25): 14863–14868 (1998). https://doi.org/10.1073/pnas.95.25.14863

[Goldman1994]

Nick Goldman and Ziheng Yang: A codon-based model of nucleotide substitution for protein-coding DNA sequences. Molecular Biology and Evolution 11 (5): 725–736 (1994). https://doi.org/10.1093/oxfordjournals.molbev.a040153

[Golub1971]

Gene H. Golub, Christian Reinsch: Singular value decomposition and least squares solutions. In Handbook for Automatic Computation, 2, (Linear Algebra) (J. H. Wilkinson and C. Reinsch, eds), 134–151. New York: Springer-Verlag (1971).

[Golub1989]

Gene H. Golub, Charles F. Van Loan: Matrix computations, 2nd edition (1989).

[Hamelryck2003A]

Thomas Hamelryck and Bernard Manderick: PDB parser and structure class implemented in Python. Bioinformatics 19 (17): 2308–2310 (2003). https://doi.org/10.1093/bioinformatics/btg299

[Hamelryck2003B]

Thomas Hamelryck: Efficient identification of side-chain patterns using a multidimensional index tree. Proteins 51 (1): 96–108 (2003). https://doi.org/10.1002/prot.10338

[Hamelryck2005]

Thomas Hamelryck: An amino acid has two sides: A new 2D measure provides a different view of solvent exposure. Proteins 59 (1): 29–48 (2005). https://doi.org/10.1002/prot.20379

[Henikoff1992]

Steven Henikoff, Jorja G. Henikoff: Amino acid substitution matrices from protein blocks. Proceedings of the National Academy of Sciences USA 89 (22): 10915–10919 (1992). https://doi.org/10.1073/pnas.89.22.10915

[Hihara2001]

Yukako Hihara, Ayako Kamei, Minoru Kanehisa, Aaron Kaplan and Masahiko Ikeuchi: DNA microarray analysis of cyanobacterial gene expression during acclimation to high light. Plant Cell 13 (4): 793–806 (2001). https://doi.org/10.1105/tpc.13.4.793

[Hughey1996]

Richard Hughey, Anders Krogh: Hidden Markov models for sequence analysis: extension and analysis of the basic method. Computer Applications in the Biosciences: CABIOS 12 (2): 95–107 (1996). https://doi.org/10.1093/bioinformatics/12.2.95

[Jupe2012]

Florian Jupe, Leighton Pritchard, Graham J. Etherington, Katrin MacKenzie, Peter J. A. Cock, Frank Wright, Sanjeev Kumar Sharma, Dan Bolser, Glenn J. Bryan, Jonathan D. G. Jones, Ingo Hein: Identification and localisation of the NB-LRR gene family within the potato genome. BMC Genomics 13: 75 (2012). https://doi.org/10.1186/1471-2164-13-75

[Kachitvichyanukul1988]

Voratas Kachitvichyanukul, Bruce W. Schmeiser: Binomial Random Variate Generation. Communications of the ACM 31 (2): 216–222 (1988). https://doi.org/10.1145/42372.42381

[Kent2002]

W. James Kent: BLAT – The BLAST-Like Alignment Tool. Genome Research 12: 656–664 (2002). https://doi.org/10.1101/gr.229202

[Kohonen1997]

Teuvo Kohonen: Self-organizing maps, 2nd Edition. Berlin; New York: Springer-Verlag (1997).

[Krogh1994]

Anders Krogh, Michael Brown, I. Saira Mian, Kimmen Sjölander, David Haussler: Hidden Markov Models in computational biology: Applications to protein modeling. Journal of Molecular Biology 235 (5): 1501–1531 (1994). https://doi.org/10.1006/jmbi.1994.1104

[Lecuyer1988]

Pierre L’Ecuyer: Efficient and Portable Combined Random Number Generators. Communications of the ACM 31 (6): 742–749, 774 (1988). https://doi.org/10.1145/62959.62969

[Li1985]

Wen-Hsiung Li, Chung-I Wu, Chi-Cheng Luo: A new method for estimating synonymous and nonsynonymous rates of nucleotide substitution considering the relative likelihood of nucleotide and codon changes. Molecular Biology and Evolution 2 (2): 150–174 (1985). https://doi.org/10.1093/oxfordjournals.molbev.a040343

[Li2009]

Heng Li, Bob Handsaker, Alec Wysoker, Tim Fennell, Jue Ruan, Nils Homer, Gabor Marth, Goncalo Abecasis, Richard Durbin: The Sequence Alignment/Map format and SAMtools. Bioinformatics 25 (16): 2078–2079 (2009). https://doi.org/10.1093/bioinformatics/btp352

[Maddison1997]

David R. Maddison, David L. Swofford, Wayne P. Maddison: Nexus: An Extensible File Format for Systematic Information. Systematic Biology 46 (4): 590–621 (1997). https://doi.org/10.1093/sysbio/46.4.590

[Majumdar2005]

Indraneel Majumdar, S. Sri Krishna, Nick V. Grishin: PALSSE: A program to delineate linear secondary structural elements from protein structures. BMC Bioinformatics 6: 202 (2005). https://doi.org/10.1186/1471-2105-6-202

[Matys2003]

V. Matys, E. Fricke, R. Geffers, E. Gößling, M. Haubrock, R. Hehl, K. Hornischer, D. Karas, A. E. Kel, O. V. Kel-Margoulis, D. U. Kloos, S. Land, B. Lewicki-Potapov, H. Michael, R. Münch, I. Reuter, S. Rotert, H. Saxel, M. Scheer, S. Thiele, E. Wingender: TRANSFAC: transcriptional regulation, from patterns to profiles. Nucleic Acids Research 31 (1): 374–378 (2003). https://doi.org/10.1093/nar/gkg108

[Nei1986]

Masatoshi Nei and Takashi Gojobori: Simple methods for estimating the numbers of synonymous and nonsynonymous nucleotide substitutions. Molecular Biology and Evolution 3 (5): 418–426 (1986). https://doi.org/10.1093/oxfordjournals.molbev.a040410

[Pearson1988]

William R. Pearson, David J. Lipman: Improved tools for biological sequence comparison. Proceedings of the National Academy of Sciences USA 85 (8): 2444–2448 (1988). https://doi.org/10.1073/pnas.85.8.2444

[Pritchard2006]

Leighton Pritchard, Jennifer A. White, Paul R. J. Birch, Ian K. Toth: GenomeDiagram: a python package for the visualization of large-scale genomic data. Bioinformatics 22 (5): 616–617 (2006). https://doi.org/10.1093/bioinformatics/btk021

[Proux2002]

Caroline Proux, Douwe van Sinderen, Juan Suarez, Pilar Garcia, Victor Ladero, Gerald F. Fitzgerald, Frank Desiere, Harald Brüssow: The dilemma of phage taxonomy illustrated by comparative genomics of Sfi21-Like Siphoviridae in lactic acid bacteria. Journal of Bacteriology 184 (21): 6026–6036 (2002). https://doi.org/10.1128/JB.184.21.6026-6036.2002

[Rice2000]

Peter Rice, Ian Longden, Alan Bleasby: EMBOSS: The European Molecular Biology Open Software Suite. Trends in Genetics 16 (6): 276–277 (2000). https://doi.org/10.1016/S0168-9525(00)02024-2

[Saldanha2004]

Alok Saldanha: Java Treeview—extensible visualization of microarray data. Bioinformatics 20 (17): 3246–3248 (2004). https://doi.org/10.1093/bioinformatics/bth349

[Schneider1986]

Thomas D. Schneider, Gary D. Stormo, Larry Gold: Information content of binding sites on nucleotide sequences. Journal of Molecular Biology 188 (3): 415–431 (1986). https://doi.org/10.1016/0022-2836(86)90165-8

[Schneider2005]

Adrian Schneider, Gina M. Cannarozzi, and Gaston H. Gonnet: Empirical codon substitution matrix. BMC Bioinformatics 6: 134 (2005). https://doi.org/10.1186/1471-2105-6-134

[Sibson1973]

Robin Sibson: SLINK: An optimally efficient algorithm for the single-link cluster method. The Computer Journal 16 (1): 30–34 (1973). https://doi.org/10.1093/comjnl/16.1.30

[Slater2005]

Guy St C. Slater, Ewan Birney: Automated generation of heuristics for biological sequence comparison. BMC Bioinformatics 6: 31 (2005). https://doi.org/10.1186/1471-2105-6-31

[Snedecor1989]

George W. Snedecor, William G. Cochran: Statistical methods. Ames, Iowa: Iowa State University Press (1989).

[Steinegger2019]

Martin Steinegger, Markus Meier, Milot Mirdita, Harald Vöhringer, Stephan J. Haunsberger, Johannes Söding: HH-suite3 for fast remote homology detection and deep protein annotation. BMC Bioinformatics 20: 473 (2019). https://doi.org/10.1186/s12859-019-3019-7

[Talevich2012]

Eric Talevich, Brandon M. Invergo, Peter J. A. Cock, Brad A. Chapman: Bio.Phylo: A unified toolkit for processing, analyzing and visualizing phylogenetic trees in Biopython. BMC Bioinformatics 13: 209 (2012). https://doi.org/10.1186/1471-2105-13-209

[Tamayo1999]

Pablo Tamayo, Donna Slonim, Jill Mesirov, Qing Zhu, Sutisak Kitareewan, Ethan Dmitrovsky, Eric S. Lander, Todd R. Golub: Interpreting patterns of gene expression with self-organizing maps: Methods and application to hematopoietic differentiation. Proceedings of the National Academy of Sciences USA 96 (6): 2907–2912 (1999). https://doi.org/10.1073/pnas.96.6.2907

[Toth2006]

Ian K. Toth, Leighton Pritchard, Paul R. J. Birch: Comparative genomics reveals what makes an enterobacterial plant pathogen. Annual Review of Phytopathology 44: 305–336 (2006). https://doi.org/10.1146/annurev.phyto.44.070505.143444

[Vanderauwera2009]

Géraldine A. van der Auwera, Jaroslaw E. Król, Haruo Suzuki, Brian Foster, Rob van Houdt, Celeste J. Brown, Max Mergeay, Eva M. Top: Plasmids captured in C. metallidurans CH34: defining the PromA family of broad-host-range plasmids. Antonie van Leeuwenhoek 96 (2): 193–204 (2009). https://doi.org/10.1007/s10482-009-9316-9

[Waterman1987]

Michael S. Waterman, Mark Eggert: A new algorithm for best subsequence alignments with application to tRNA-rRNA comparisons. Journal of Molecular Biology 197 (4): 723–728 (1987). https://doi.org/10.1016/0022-2836(87)90478-5

[Yang2000]

Ziheng Yang and Rasmus Nielsen: Estimating synonymous and nonsynonymous substitution rates under realistic evolutionary models. Molecular Biology and Evolution 17 (1): 32–43 (2000). https://doi.org/10.1093/oxfordjournals.molbev.a026236

[Yeung2001]

Ka Yee Yeung, Walter L. Ruzzo: Principal Component Analysis for clustering gene expression data. Bioinformatics 17 (9): 763–774 (2001). https://doi.org/10.1093/bioinformatics/17.9.763