USING THE PHYLIP PROGRAMS and Esee VIA Esee2Fel
(notes from E. Cabot)
PHYLIP is a package of phylogeny programs written by Dr. Joseph Felsenstein
of the University of Washington. Phylip is not included with ESEE, but
is available for free from Dr. Felsenstein. The utility program
Esee2Fel, described in this file, reads Esee save files and produces
data files in a format that is acceptable to the Phylip programs that deal
with nucleic acid and protein data directly (e.g. DnaPars, DnaML, etc).
Please note, Esee2Fel does NOT put any options selectors into the
data files. You must do this yourself with an editor that is capable
of using standard MS-DOS textfiles.
What does Esee2Fel do?
In roughly the following order it:
-prompts you for a file name then inputs the data,
skipping sequences that are type T or A.
-determines whether it's using type P or type N sequences
based upon the first non-T or -A sequence encountered
in the file
- trims the sequence names to 10 characters
- trims the sequence lengths to 5000 characters
- checks for sequence length conflicts
The sequences should all be of uniform length for PHYLIP.
If the (now trimmed) sequences are not of the same length,
then the program generates a report of the lengths of the
first, shortest and longest sequences. You are then prompted
for the sequence length to use. You may specify any integer
ranging from 3 to the length of the longest sequence. If
there are sequences that are already less than the length that
you specify they will be padded with either N's or ?'s (depending
on whether you are working with DNA or protein).
MAKE SURE NOT TO EXCEED THE MAXIMUM ALLOWED NUMBER OF SITES
OF THE TARGET PHYLIP PROGRAM.
- checks for name conflicts; you are prompted until all of
the names are unique
-prompts you for a name for the output file; you have an
option to escape if the file already exists
-sends data to the output file in the format required
(by protpars, dnapars, dnaml, etc)
except that there are no options specified
and no additional option lines
[NOTE: you are prompted for any options other than
User tree and Weights]
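The clean-up steps listed above can be sketched in a few lines of Python. This is only an illustration of the described behaviour, not Esee2Fel's actual code: the function names are hypothetical, and it assumes that DNA is padded with N and protein with ? (the text gives the two pad characters but not which goes with which).

```python
# Hypothetical sketch of the clean-up steps described above;
# this is NOT the source code of Esee2Fel.
NAME_WIDTH = 10      # names are trimmed to 10 characters
MAX_LENGTH = 5000    # sequences are trimmed to 5000 characters


def length_report(records):
    """Return the lengths of the first, shortest and longest sequences."""
    lengths = [len(seq) for _, seq in records]
    return lengths[0], min(lengths), max(lengths)


def normalize(records, target_length, is_protein=False):
    """Trim names and sequences, then pad short sequences to target_length."""
    # Assumption: N pads DNA and ? pads protein.
    pad_char = "?" if is_protein else "N"
    cleaned = []
    for name, seq in records:
        name = name[:NAME_WIDTH]
        seq = seq[:MAX_LENGTH][:target_length]
        cleaned.append((name, seq.ljust(target_length, pad_char)))
    return cleaned


def duplicate_names(records):
    """Return the names that occur more than once (after trimming)."""
    seen, dupes = set(), set()
    for name, _ in records:
        if name in seen:
            dupes.add(name)
        seen.add(name)
    return sorted(dupes)
```

Note that trimming names to 10 characters can itself create conflicts (two long names sharing the same first 10 characters), which is presumably why the name check comes after the trimming step.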
DETAILS on PHYLIP files
The general format of PHYLIP sequence data files is:
#taxa #residues/sequence [options]
[option lines if required]
name1     ..........
name2     ..........
Where [options] are the options detailed in a given program's documentation,
and the periods stand for the amino acid or nucleic acid residues.
The name field must be 10 characters long. If the name takes only,
say, 5 characters, you must "pad" it out to 10 spaces using blanks.
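As a rough illustration of the layout just described, the following Python sketch builds the text of a data file with no options. The function name phylip_text is hypothetical, and the sketch assumes each sequence is written on a single line after its name.

```python
def phylip_text(records):
    """Build PHYLIP-style text from (name, sequence) pairs; no options."""
    lengths = {len(seq) for _, seq in records}
    if len(lengths) != 1:
        raise ValueError("all sequences must have the same length")
    # First line: number of taxa, then number of residues per sequence.
    lines = [f"{len(records)} {lengths.pop()}"]
    for name, seq in records:
        # The name field is exactly 10 characters, padded with blanks.
        lines.append(name[:10].ljust(10) + seq)
    return "\n".join(lines) + "\n"


print(phylip_text([("human", "ACGT"), ("chimp", "ACGA")]), end="")
```

Here "human" takes 5 characters, so 5 blanks follow it before the residues begin.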
ProtPars will allow you to use gaps and ambiguous residues. I don't
think that the DNA programs will.
USING ESEE to produce PHYLIP files manually.
Files in this format are VERY easy to construct with ESEE.
** HERE ARE THE STEPS USED TO CREATE THE DATA FILE
* - Start up ESEE and align your sequences.
* - Put a unique name at the beginning of every single sequence that
* is destined for the data file. Make sure to pad the name out to
* ten spaces with blanks. Don't worry about any other spaces in the
* sequences; they will be ignored by the Phylip programs.
* - Make a new sequence (usually sequence 1) to hold what will be the
* first line of the data file. (Alternatively you could put the option
* line(s) on afterwards using TED.com or another text editor).
* For example, Sequence #1 could consist of the following characters:
* 5 250
* if there were 5 sequences of 250 residues in length.
* If there are other option lines you should start each one on
* a separate screen line of ESEE, but they can be within the same
* "pseudo" sequence.
* - Now you have to output your sequences to the same file using ALT-F10
* or ALT-O.
* a.The first sequence to output is the one holding the option line(s).
* b.The other sequences are then output in any order you like.
* When you specify the filename use the same one you used for the
* option line. ESEE will report that the file exists and you
* are prompted to Append, Erase, or abort.
* Select APPEND by typing the letter A.
* c. Repeat step b. for all of the sequences.
* - Save your work with ALT-S, if you have doubts about the success
* of the above steps
* - Exit ESEE
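Because the file is assembled by hand, it may be worth checking it before running a Phylip program. The Python sketch below is one possible check, not part of ESEE or Phylip. The function name check_phylip is hypothetical, and it assumes a file with no option lines in which each sequence is complete before the next one starts (which is what the append procedure above produces). Blanks inside the sequences are ignored, and a sequence may continue over several lines.

```python
def check_phylip(text):
    """Return a list of problems found in a hand-built PHYLIP data file."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if not lines:
        return ["file is empty"]
    header = lines[0].split()
    ntaxa, nsites = int(header[0]), int(header[1])
    body = iter(lines[1:])
    problems, names = [], []
    for _ in range(ntaxa):
        line = next(body, None)
        if line is None:
            problems.append("fewer sequences than the first line promises")
            break
        # First 10 characters are the name; the rest are residues.
        name, residues = line[:10], "".join(line[10:].split())
        # Keep reading continuation lines until the sequence is complete.
        while len(residues) < nsites:
            more = next(body, None)
            if more is None:
                break
            residues += "".join(more.split())
        if len(residues) != nsites:
            problems.append(
                f"{name.strip()!r}: {len(residues)} residues, expected {nsites}"
            )
        if name in names:
            problems.append(f"duplicate name {name.strip()!r}")
        names.append(name)
    return problems
```

An empty list means the counts on the first line agree with the data that follows and all names are unique.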
-one small note:
ESEE's print-out window has an option called LINE LENGTH which
is normally used to control the amount of sequence displayed
per line of print-out. This parameter also affects the width
of the lines produced by the OUTPUT command that you need to use
above. I suggest using the default value of 60 for PHYLIP files
because then each screen line will correspond to a single line in
the file. If you want to use several option lines, it could become
somewhat confusing if you've changed the LINE LENGTH.