[Biopython-dev] New Bio.SeqIO code

Chris Lasher chris.lasher at gmail.com
Wed Nov 1 22:49:04 EST 2006


I'd like to pitch in a few comments here.

Peter wrote:
> One point against names like File2SequenceIterator is the pun on two
> versus to (i.e. convert) will not be so obvious to non-native English
> speakers.

I'd like to second that. It's cute, sure, but FileToSequenceIterator
isn't that much more difficult, and leaves no room for confusion.
(e.g., Where's the File1SequenceIterator?)

Michiel wrote:
> I like the idea of one argument that takes a file name or handle. I
> believe that that is how other Biopython functions work.

Yikes! Are you serious? Why not make it easier and require a file-like
object? I would definitely not be for it taking a plain string. This
seems implicit rather than explicit. "Takes a file... or a file-like
object... or a string containing a filename... or just a string
containing the file contents... or a brief description of the data
that's in your file... or a bunch of smiley emoticons, if you're in a
good mood..." File-like objects are testable and leave little room for
surprise. Anything else seems like it's asking for a headache.
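
To make the testability point concrete, here is a minimal sketch of a reader that accepts only a file-like object. The function name fasta_titles and the toy FASTA-title logic are hypothetical, made up for illustration; this is not existing or proposed Bio.SeqIO code.

```python
import io


def fasta_titles(handle):
    """Return the title line of each FASTA record in a file-like object.

    Hypothetical helper for illustration only; not part of Bio.SeqIO.
    """
    # Reject plain strings (and anything else that cannot be read from)
    # up front, rather than guessing whether a string is a filename or data.
    if not hasattr(handle, "read"):
        raise TypeError(
            "expected a file-like object, got %s" % type(handle).__name__
        )
    return [line[1:].strip() for line in handle if line.startswith(">")]


# An in-memory handle stands in for a real file, so no disk access is needed.
handle = io.StringIO(">alpha first\nACGT\n>beta second\nGGCC\n")
titles = fasta_titles(handle)
```

Because the only contract is "something you can read lines from", a test can pass in io.StringIO, and a caller who passes a filename by mistake gets a TypeError right away.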

Which brings me to the issue of "guessing" a file's format. Yikes,
again! I'd expect that kind of "magickery" from Perl, but once again,
explicit is better than implicit. I honestly think it's not too much
to expect users to know what file format they're asking Biopython
to deal with. Could you guys please explain the motivation behind this
to me? As I see it right now, the last thing I want is Biopython
incorrectly guessing my file format and, in particular, assuming that I
have given the file the proper extension for its format. The
unified sequence object is what's beautiful about SeqIO, but the
guesswork that you are discussing having SeqIO's classes do is scary
to me.
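
As a rough sketch of what "explicit" could look like, the caller below names the format and an unknown name fails loudly instead of triggering a guess. The names parse_sequences, PARSERS, and _parse_fasta are assumptions invented for this example, not the interface under discussion.

```python
import io


def _parse_fasta(handle):
    """Toy FASTA parser returning (id, sequence) tuples; illustration only."""
    records = []
    for line in handle:
        line = line.rstrip("\n")
        if line.startswith(">"):
            records.append([line[1:], ""])
        elif records:
            records[-1][1] += line
    return [tuple(record) for record in records]


# Hypothetical registry mapping explicit format names to parser functions.
PARSERS = {"fasta": _parse_fasta}


def parse_sequences(handle, fmt):
    """Parse handle using the format the caller names; never guess."""
    try:
        parser = PARSERS[fmt]
    except KeyError:
        raise ValueError("unknown sequence format: %r" % (fmt,))
    return parser(handle)


records = parse_sequences(io.StringIO(">seq1\nACGT\nTTAA\n"), "fasta")
```

Nothing here looks at a file extension or sniffs the contents; the format is whatever the caller says it is.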

And I think by now it's predictable that I'm a fan of Peter's
suggestion to have an exception raised upon the attempt to create a
dictionary with identical IDs; all other options are, again, too
implicit for my tastes.
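
A minimal sketch of the raise-on-duplicate behaviour, assuming records arrive as (id, sequence) pairs. The function name records_to_dict and that record shape are hypothetical, chosen only to keep the example self-contained.

```python
def records_to_dict(records):
    """Build an id -> sequence dict, refusing to overwrite a repeated ID.

    Hypothetical helper for illustration; records are (id, sequence) pairs.
    """
    result = {}
    for record_id, sequence in records:
        if record_id in result:
            # Fail loudly instead of silently keeping the first or last record.
            raise ValueError("duplicate record ID: %r" % (record_id,))
        result[record_id] = sequence
    return result


unique = records_to_dict([("seq1", "ACGT"), ("seq2", "GGCC")])
```

The alternatives (keep the first record, keep the last, or rename on the fly) all discard or alter data without telling the user, which is why the exception seems the safest default.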

Thanks very much for developing SeqIO and discussing it so much, guys.
I think this will be a fantastic asset to Biopython! Keep on rockin'
it!

Chris

