National Library of Medicine

Natural Language Systems Program

The Natural Language Systems (NLS) Program, conducted by the Computer Science
Branch of the Lister Hill National Center for Biomedical
Communications, involves research in natural language processing to
improve access to biomedical information stored in computerized form.
Research questions in natural language processing lie at the
intersection of computer science, linguistics, and information
science, and draw on the methods and approaches of all three fields.
The NLS research program is based on the testable hypothesis that
systems combining domain knowledge with sophisticated linguistic
analysis will improve the representation and retrieval of biomedical
information. Such systems must include both linguistic knowledge
(lexical information and rules of morphology, syntax, and semantics)
and domain knowledge (biomedical facts, relations among the facts,
and rules for processing them).

The primary focus of the NLS research program is the development of
SPECIALIST, an experimental system for parsing, analyzing, and
accessing biomedical text. The system is implemented in Quintus
Prolog and C and runs on Sun workstations.

Major NLS activities include developing the parser, building an
extensive lexicon, and creating automated tools and applications
that support the research process. The
SPECIALIST parsing system is designed to capture the regularities of
general English as well as the specialized nature of biomedical text.
The lexicon, which forms a central part of the system, includes
general English lexical items as well as items specific to the domain
of biomedicine. Each lexical entry encodes morphologic, syntactic,
and semantic information, which the parser uses as it builds
structured representations of sentences in biomedical text.

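To make the idea of a lexical entry concrete, here is a minimal Python sketch of one possible record. It is not the SPECIALIST format: the class LexicalEntry, its field names, and the helpers add_entry and lookup are hypothetical. The sketch only shows how one entry can bundle morphologic, syntactic, and semantic information, and how indexing every inflected form lets a parser map a surface word back to its candidate entries.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class LexicalEntry:
    """Hypothetical lexical record; field names are illustrative only."""
    base: str                              # base (citation) form
    category: str                          # syntactic category, e.g. "verb"
    inflections: tuple[str, ...]           # morphologic variants of the base form
    complements: tuple[str, ...] = ()      # syntactic frames the item accepts
    semantic_types: tuple[str, ...] = ()   # hypothetical semantic labels

# Maps each lowercased surface form to every entry it may belong to.
LEXICON: dict[str, list[LexicalEntry]] = {}

def add_entry(entry: LexicalEntry) -> None:
    # Index the base form and every inflection back to the same entry.
    for form in (entry.base, *entry.inflections):
        LEXICON.setdefault(form.lower(), []).append(entry)

def lookup(word: str) -> list[LexicalEntry]:
    # Case-insensitive lookup; unknown words yield an empty list.
    return LEXICON.get(word.lower(), [])

# Example entries (hypothetical values).
add_entry(LexicalEntry("diagnose", "verb",
                       ("diagnoses", "diagnosed", "diagnosing"), ("np",)))
add_entry(LexicalEntry("diagnosis", "noun", ("diagnoses",),
                       semantic_types=("procedure",)))
```

Looking up "diagnoses" returns both the verb entry and the noun entry. This is the kind of ambiguity the parser must resolve from sentence context.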
NLS investigators have also developed a set of applications for
interactive and flexible access to a variety of sources of biomedical
information. Included among these are a tool for browsing,
navigating, and extracting information from NLM's Medical Subject
Headings; a set of programs for online access to a large medical
dictionary; and an application providing access to a test collection
of user requests and MEDLINE® citation records. The test collection
consists of requests made by users of the MEDLINE system, the citation
records retrieved for these requests, and assessments of the
relevance of these citation records to the requests.
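Because the collection pairs each request with its retrieved citations and relevance assessments, it can support simple retrieval measurements. The Python sketch below assumes a hypothetical Request structure and computes precision, the fraction of retrieved citations judged relevant. Precision is a standard information retrieval measure used here for illustration. The source does not say which measures the NLS staff used.

```python
from dataclasses import dataclass

@dataclass
class Request:
    """Hypothetical record: one user request, its retrieved citations, and relevance judgments."""
    request_id: str
    text: str
    retrieved: list[str]          # identifiers of citation records returned for the request
    judgments: dict[str, bool]    # citation identifier -> judged relevant?

def precision(req: Request) -> float:
    """Fraction of retrieved citations judged relevant; unjudged citations count as not relevant."""
    if not req.retrieved:
        return 0.0
    relevant = sum(1 for cid in req.retrieved if req.judgments.get(cid, False))
    return relevant / len(req.retrieved)

# Example with made-up identifiers: two of four retrieved citations are relevant.
example = Request("R1", "hypothetical user request",
                  ["c1", "c2", "c3", "c4"],
                  {"c1": True, "c2": False, "c3": True, "c4": False})
```

For the example request, precision(example) is 0.5.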

The test collection serves a number of purposes in NLS project work.
The lexicon-building tool, for example, allows the user to look up a
lexical item in the collection and find all the sentences in which
that item appears. Seeing a lexical item in its linguistic context
provides important information about the use of that lexical item in
the domain of discourse. In addition, the database application
designed by program staff provides online access to the test
collection, allowing for experimentation in information retrieval
methods and techniques.
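The lexicon-building lookup described above is essentially a concordance search: find every sentence that contains a given item. The Python sketch below shows one simple way to do it. The function names and the sample texts are hypothetical. The sentence splitter is deliberately naive: it breaks after ".", "?", or "!" followed by whitespace, so abbreviations common in biomedical text (such as "e.g.") would need more careful handling.

```python
import re

def split_sentences(text: str) -> list[str]:
    # Naive splitter: break after ., ?, or ! when followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.?!])\s+", text) if s.strip()]

def concordance(item: str, documents: dict[str, str]) -> list[tuple[str, str]]:
    """Return (document id, sentence) for every sentence containing `item`
    as a whole word, ignoring case."""
    pattern = re.compile(r"\b" + re.escape(item) + r"\b", re.IGNORECASE)
    hits = []
    for doc_id, text in documents.items():
        for sentence in split_sentences(text):
            if pattern.search(sentence):
                hits.append((doc_id, sentence))
    return hits

# Hypothetical citation texts.
docs = {
    "cit1": "Aspirin reduced fever in most patients. No adverse events occurred.",
    "cit2": "Fever persisted in two cases.",
}
```

Calling concordance("fever", docs) returns the first sentence of cit1 and the sentence in cit2. Matching whole words means "fev" matches nothing.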

A necessary component of a natural language processing system is
knowledge of the relevant domain. This knowledge must be in such a
form that a computerized system can interpret it and reason with it.
Recent Unified Medical Language System® (UMLS®) initiatives, including
the development of the Metathesaurus® and the development of the
Semantic Network, have direct implications for this aspect of the
SPECIALIST system. The Metathesaurus is the central vocabulary
component of the UMLS. It contains information about specific
biomedical concepts, including their representation and contexts in a
variety of controlled vocabularies. The Semantic Network defines the
likely relationships among the types or categories of terms in the
Metathesaurus.

NLS efforts in the near future will include the development, testing,
and evaluation of the UMLS Semantic Network. This will involve the
development of application systems for linking this knowledge source
to the information stored in the Metathesaurus. Programs that allow
intelligent traversal of the Network will be designed and
implemented. Subsequently, the Network, together with the large
number of concepts to which semantic types have been assigned, will
be tested as a source of domain knowledge for the SPECIALIST system.
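As one illustration of what traversing such a network could involve, the Python sketch below walks up an "isa" hierarchy of semantic types. It then reports which relations may hold between two types. The sketch assumes, as a simplification, that a relation stated between two general types may also hold between their subtypes. The type names, relation names, and the single-parent hierarchy are hypothetical and are not taken from the UMLS Semantic Network.

```python
# Hypothetical network fragment: each type has at most one parent ("isa").
ISA = {
    "Drug": "Chemical",
    "Antibiotic": "Drug",
    "Disease": "Pathologic Process",
    "Infection": "Disease",
}

# Hypothetical relations asserted between general types.
RELATIONS = [
    ("Drug", "treats", "Disease"),
    ("Chemical", "interacts_with", "Chemical"),
]

def ancestors(sem_type: str) -> list[str]:
    """Return sem_type followed by each ancestor reached through 'isa' links
    (assumes the hierarchy has no cycles)."""
    chain = [sem_type]
    while chain[-1] in ISA:
        chain.append(ISA[chain[-1]])
    return chain

def possible_relations(source: str, target: str) -> list[str]:
    """Relations that may hold from source to target, inherited from any
    ancestor of each type."""
    src_up, tgt_up = set(ancestors(source)), set(ancestors(target))
    return sorted({rel for s, rel, t in RELATIONS if s in src_up and t in tgt_up})
```

Here possible_relations("Antibiotic", "Infection") yields ["treats"], inherited from the Drug-Disease relation. Relations are directional, so the reverse query yields an empty list.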

For more information on the Natural Language Systems Research Program, contact:

Alexa T. McCray, Ph.D.
Computer Science Branch
Lister Hill National Center
for Biomedical Communications
National Library of Medicine
Bethesda, Maryland 20894

