National Library of Medicine
APRIL 1992
National Center for Biotechnology Information

Pre-Release of NCBI CD-ROM - Entrez: Sequences

Evaluation CD-ROM

The National Center for Biotechnology Information (NCBI) is providing
a limited distribution of an end-user CD-ROM which contains an
integrated view of DNA and protein sequence data and their associated
MEDLINE entries. After an evaluation of the CD-ROM, retrieval
software and databases, a general distribution is planned for late

Entrez Retrieval System

The retrieval system, Entrez, was developed at the NCBI and provides
a fast and intuitive user interface for searching sequence and
bibliographic data. A key feature of the system is the concept of
'neighboring' whereby a user can locate all related references or
sequences by using pre-computed tables which measure the relatedness
of all articles and all sequences.
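
The neighboring idea can be illustrated with a minimal sketch. It
assumes the pre-computed table is a simple mapping from a record
identifier to a list of (neighbor, score) pairs. The identifiers, the
scores, and the function name `find_neighbors` are hypothetical and
are not taken from the Entrez software.

```python
# Hypothetical pre-computed neighbor table:
# record ID -> [(neighbor ID, relatedness score)].
# The IDs and scores are invented for illustration only.
NEIGHBOR_TABLE = {
    "article-1": [("article-2", 0.91), ("article-3", 0.42)],
    "article-2": [("article-1", 0.91)],
    "article-3": [("article-1", 0.42)],
}


def find_neighbors(record_id, table, min_score=0.0):
    """Return neighbor IDs of record_id, best score first, by table lookup only."""
    neighbors = table.get(record_id, [])
    ranked = sorted(neighbors, key=lambda pair: pair[1], reverse=True)
    return [neighbor for neighbor, score in ranked if score >= min_score]


print(find_neighbors("article-1", NEIGHBOR_TABLE))
```

Because relatedness is computed ahead of time, answering a query such
as "find all papers like this one" reduces to a lookup and a sort in
this sketch, with no comparison performed at query time.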

Thus, a user can ask "Find all papers that are like this one" or
"Find all sequences that are like this sequence." In addition to
neighboring, which relates records within a database by statistical
measures of similarity, 'hard links' have been created between
entries in different databases. For example, given a MEDLINE article,
there are links to entries in sequence databases which have
referenced that article. Links also exist between nucleotide
sequences and the proteins derived from them through conceptual
translation.


The Entrez: Sequences CD-ROM contains over 86,000 MEDLINE citations
with abstracts which have been indexed under the MeSH heading
"Molecular Sequence Data" or which have been cited in the sequence
databases. The CD-ROM contains over 53,000 protein sequences, drawn
from the Protein Identification Resource (PIR) and from conceptual
translation of GenBank, and over 35,000 nucleotide sequences from
GenBank. GenInfo
Backbone sequence data will be added in later releases. Sequences
have been grouped by similarity by comparing the sequence databases
against themselves using the BLAST local alignment algorithm.
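
The all-against-all comparison described above can be sketched as
follows. This is not BLAST: the `similarity` function is a toy
stand-in (overlap of shared three-letter words) chosen only to keep
the example short. The sequences, the threshold, and all function
names are assumptions made for illustration.

```python
from itertools import combinations


def kmers(sequence, k=3):
    """Return the set of overlapping k-letter words in a sequence."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def similarity(a, b, k=3):
    """Jaccard overlap of k-mer sets; a toy stand-in for a BLAST score."""
    words_a, words_b = kmers(a, k), kmers(b, k)
    if not words_a or not words_b:
        return 0.0
    return len(words_a & words_b) / len(words_a | words_b)


def build_neighbor_table(sequences, threshold=0.3):
    """Compare every sequence with every other one; keep pairs at or above threshold."""
    table = {name: [] for name in sequences}
    for name_a, name_b in combinations(sequences, 2):
        score = similarity(sequences[name_a], sequences[name_b])
        if score >= threshold:
            table[name_a].append((name_b, score))
            table[name_b].append((name_a, score))
    return table


# Invented example sequences, not taken from any database.
SEQUENCES = {
    "seq-a": "ATGGCCATTGTA",
    "seq-b": "ATGGCCATTGTC",
    "seq-c": "TTTTTTTTTTTT",
}
TABLE = build_neighbor_table(SEQUENCES)
```

Running the comparison once over the whole collection and storing the
result is what allows similar sequences to be retrieved later without
repeating the alignment.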


The Entrez: Sequences CD-ROM operates on Macintosh and
PC-compatible systems with Microsoft Windows. A CD-ROM of unprocessed
data files without retrieval software will be available at a later
date for sites which integrate sequence databases into custom systems
and for commercial redistributors. It is expected that both CD-ROMs
will be distributed at two-month intervals.

A mailing list is now being assembled for individuals who would be
interested in participating in the CD-ROM evaluation or who would
like to stay informed of the availability of subscriptions to NCBI


National Library of Medicine
Bldg. 38A, Room 8N-803
8600 Rockville Pike
Bethesda, MD 20894

