Category : Various Text files
Archive   : NLM-INFO.ZIP
Filename : MACHLRN.TXT

National Library of Medicine
Machine Learning Project

There has been an explosion of biomedical information, and a less
well-acknowledged explosion of the analytical tools and techniques
that can be applied to that information. How can researchers and
health care professionals make sure they are getting the best
information available and finding the right tools for analyzing it?

The Machine Learning Project at NLM's Lister Hill National Center for
Biomedical Communications (LHNCBC) was founded in 1989 to develop
methods by which computer programs can help researchers and health-
care professionals exploit rapidly expanding on-line sources of
knowledge. To take full advantage of the anticipated exponential
growth of biomedical data and of the increasingly evident
interrelationships among previously disparate information sources,
dramatic improvements in automated knowledge manipulation, analysis
and inference will be needed.

Machine Learning: the Vision

Expert systems have already advanced our technological capabilities
from the processing of information toward the manipulation of
knowledge. The goal of the Machine Learning Project is to create
computer programs that not only manipulate knowledge, but also can
acquire it themselves. Ideally, a researcher or health care
professional with a question should be able to have a machine
learning program identify where to find relevant information,
retrieve that information (possibly from multiple data sources), and
analyze and assemble the information into a complete, accurate and
comprehensible representation of the desired knowledge.

Machine learning research may also help transcend the traditional
human-computer interaction: a user issuing commands and a program
responding. In a world of rapidly advancing knowledge, programs will
have to do more than retrieve information when asked; they will have
to manage retrieval and inference over time. Once a user has
specified a question of interest, a machine learning program should
be able to track evolving knowledge sources continuously and
intelligently. When the program finds relevant new information, it should
automatically assemble, analyze and send that information to the
original questioner.
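
The tracking behavior described above can be sketched in a few lines of
Python. This is only an illustration, not the Project's design: the
StandingQuery class, the check_source function, the keyword-matching test
for relevance and the sample records are all assumptions made for the
example.

```python
from dataclasses import dataclass, field


@dataclass
class StandingQuery:
    """A question that stays active after it is first asked (hypothetical)."""
    user: str
    keywords: set
    seen: set = field(default_factory=set)  # ids of records already reported

    def matches(self, text):
        # Naive relevance test (an assumption): every keyword occurs in the text.
        return self.keywords <= set(text.lower().split())


def check_source(query, records):
    """Return ids of relevant records not yet reported, and mark them as seen.

    `records` maps record id -> text and stands in for one snapshot of an
    evolving knowledge source.
    """
    fresh = []
    for rec_id, text in sorted(records.items()):
        if rec_id not in query.seen and query.matches(text):
            query.seen.add(rec_id)
            fresh.append(rec_id)
    return fresh


if __name__ == "__main__":
    query = StandingQuery(user="researcher", keywords={"protein", "conformation"})
    snapshot_1 = {"r1": "Protein conformation after heating",
                  "r2": "Eukaryotic evolution"}
    snapshot_2 = dict(snapshot_1, r3="Conformation changes in a membrane protein")
    print(check_source(query, snapshot_1))  # ['r1']
    print(check_source(query, snapshot_2))  # ['r3'] (r1 was already reported)
```

Keeping the set of already reported records inside the query is what lets
the same question be re-run against each new snapshot without repeating
old results.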

This vision is the driving force behind the LHNCBC's Machine Learning
Project. Currently, machine learning technologies focus primarily on
inducing concept definitions from externally specified datasets. In
pursuing the vision, the project endeavors to advance significantly
the state of the art in machine learning by creating a
computationally tractable theory of how to use multiple sources of
knowledge and to deploy complex analytical tools in pursuit of
explicitly stated goals. This approach, called knowledge acquisition
planning, is still in an early stage of development. Although
achieving the vision of an automated knowledge management and
acquisition system is a long-term goal that will require fundamental
advances in basic computer science, the process of developing the
theory and implementing prototypes has already produced some useful
results.


The primary testbed for research in knowledge acquisition planning at
LHNCBC is a program that selects and manages the use of computerized
analytical tools and database searchers to achieve specific goals. This
experimental program, INVESTIGATOR, runs both on a special purpose
computer called a Lisp Machine and on a powerful general purpose
parallel computer. The program operates by selecting other programs,
such as statistical analysis packages and database search engines, which
can be applied to achieve its human-provided knowledge acquisition
goals.

Each program INVESTIGATOR deploys must have an internal representation
describing the preconditions for its execution: the data formats the
program requires, its expected outputs, how long it takes to run, and
so forth. Using this information, INVESTIGATOR's planning mechanism can
select appropriate tools and databases to accomplish a variety of tasks.
INVESTIGATOR can invoke other programs on remote computers and can plan
to make efficient use of a diverse, heterogeneous computing environment.
Further research into the use of both coarse- and fine-grained parallel
computation is a significant component of the Machine Learning Project.
Research is currently under way on the use of parallel distributed
processing networks and massively parallel computing environments, and
on the treatment of computer time, network access and other hardware
facilities as resources to be allocated as part of a planning task.
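
As a rough illustration of planning over tool descriptions, the Python
sketch below records each tool's required input format, expected output
and estimated running time, then searches for the chain of tools with the
lowest total time that turns the available data into the desired result.
It is not INVESTIGATOR's actual representation or planner: the Tool
fields, the tool names, the format labels and the use of a shortest-path
search with a single time budget are all assumptions made for the example.

```python
import heapq
from dataclasses import dataclass


@dataclass(frozen=True)
class Tool:
    """Hypothetical description of one deployable program."""
    name: str
    input_format: str    # precondition: data format the tool requires
    output_format: str   # what the tool is expected to produce
    minutes: float       # estimated running time, treated as a resource


def plan(tools, have, want, budget_minutes):
    """Find the fastest chain of tools from format `have` to format `want`.

    Runs Dijkstra's algorithm over data formats, with tools as edges and
    running times as weights. Returns the list of tool names, or None if no
    chain exists or the fastest chain exceeds the time budget.
    """
    frontier = [(0.0, have, [])]
    best = {have: 0.0}
    while frontier:
        cost, fmt, chain = heapq.heappop(frontier)
        if fmt == want:
            return chain if cost <= budget_minutes else None
        if cost > best.get(fmt, float("inf")):
            continue  # stale queue entry; a cheaper route was already found
        for tool in tools:
            if tool.input_format != fmt:
                continue  # precondition not satisfied by the data in hand
            new_cost = cost + tool.minutes
            if new_cost < best.get(tool.output_format, float("inf")):
                best[tool.output_format] = new_cost
                heapq.heappush(
                    frontier, (new_cost, tool.output_format, chain + [tool.name]))
    return None


if __name__ == "__main__":
    # Made-up tool catalogue for illustration only.
    tools = [
        Tool("sequence_search", "query", "sequences", 5),
        Tool("alignment", "sequences", "aligned_sequences", 20),
        Tool("variance_analysis", "aligned_sequences", "report", 2),
        Tool("slow_classifier", "sequences", "report", 60),
    ]
    print(plan(tools, "query", "report", budget_minutes=30))
    print(plan(tools, "query", "report", budget_minutes=10))
```

With this catalogue the first call selects the three-step chain, which
takes 27 minutes in total, and the second call returns None because no
chain fits a 10-minute budget. A fuller treatment would track several
resources at once (network access, specific machines) rather than a single
time figure.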

INVESTIGATOR has been programmed to use several important computerized
analytical tools and to plan to acquire knowledge in several different
domains. The analytical tools include inductive category formation,
heuristic Bayesian classification, marker passing intersection search,
analysis of variance, random sampling and back propagation trained
artificial neural networks. The databases INVESTIGATOR has accessed
include MEDLINE, GenInfo (the sequence database of the National Center
for Biotechnology Information), the Protein Information Resource,
Brookhaven National Laboratory's Protein Databank of crystallographic
structure information, and others. Results of INVESTIGATOR-managed
knowledge acquisition plans have addressed questions in diverse domains
including early eukaryotic evolution, classification of the structural
elements of proteins, and changes in protein conformation after point
mutations.

Additional Research Issues

This research in knowledge acquisition planning has the potential for
high payoff in improving the capabilities of machine learning systems
generally. Current machine learning technology is fundamentally limited
by the computational complexity of exploring the space of hypotheses
compatible with a set of data. The knowledge acquisition planning
approach provides important constraints on the space of hypotheses
searched by using the specific goals of the learning system to focus
computation on the available methods and data most likely to lead to
answers to its questions. This method of using the content of desired
knowledge to constrain the search space may result in significant
improvement in the performance of machine learning systems.
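
A small Python example can make the size of the effect concrete. It uses a
deliberately simple hypothesis language chosen for illustration, not taken
from the Project: conjunctions over boolean attributes, where each
attribute is required to be true, required to be false, or ignored. With n
attributes there are 3^n such hypotheses, so limiting the search to the k
attributes a goal marks as relevant reduces the candidates to 3^k. The
attribute names, the data and the assumption that the goal identifies the
relevant attributes are all hypothetical.

```python
from itertools import product


def covers(hypothesis, example):
    """True if the example satisfies every constraint in the hypothesis."""
    return all(value is None or example[attr] == value
               for attr, value in hypothesis.items())


def consistent_hypotheses(examples, attributes):
    """Enumerate conjunctive hypotheses over `attributes` that fit the data.

    Each attribute is constrained to True, False or None (ignored), so
    3 ** len(attributes) candidates are tested. `examples` is a list of
    (attribute dict, label) pairs.
    """
    found = []
    for values in product((True, False, None), repeat=len(attributes)):
        hypothesis = dict(zip(attributes, values))
        if all(covers(hypothesis, x) == label for x, label in examples):
            found.append(hypothesis)
    return found


if __name__ == "__main__":
    # Made-up data in which the label is "a and not b".
    examples = [
        ({"a": True, "b": False, "c": True, "d": False}, True),
        ({"a": True, "b": True, "c": True, "d": False}, False),
        ({"a": False, "b": False, "c": False, "d": True}, False),
        ({"a": True, "b": False, "c": False, "d": True}, True),
    ]
    all_attrs = ["a", "b", "c", "d"]
    goal_attrs = ["a", "b"]  # assumption: the goal marks these as relevant
    print(3 ** len(all_attrs), consistent_hypotheses(examples, all_attrs))
    print(3 ** len(goal_attrs), consistent_hypotheses(examples, goal_attrs))
```

Both searches arrive at the same concept, but the goal-restricted search
tests 9 candidates instead of 81. The gap grows exponentially with the
number of attributes the goal allows the learner to ignore.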

The general problem of selecting and coordinating diverse and complex
sources of knowledge touches on many open questions in cognitive
science. The only available models for designing a system that might
accomplish these tasks are human beings. A significant component of the
Project's research is therefore the analysis of human subjects as they
acquire knowledge. Machine Learning Project personnel work with
computer-sophisticated biomedical researchers to gather data on how
people manage knowledge acquisition tasks. Protocols of researchers
using computer tools and devising retrieval and analysis strategies have
been gathered and analyzed to provide insight into this complex
cognitive process. Results from these experiments have led to the
identification of connections between attentional phenomena in cognitive
and social psychology and computational complexity considerations in the
design of machine learning systems. Potential implications of this
research for the understanding of human cognitive phenomena are also
being pursued. The machine learning research program places a strong
emphasis on the use of cognitive models in the design of artificial
intelligence systems. Success in this research would create a meta-tool
for health care professionals trying to chart a course through the
increasingly complex world of automated biomedical knowledge.

For further information, contact:

Machine Learning Project
Computer Science Branch
National Library of Medicine/LHNCBC
8600 Rockville Pike
Bethesda, Maryland 20894
