This demo combines several technologies developed at CNRI to show how easily Knowbot Programs can be used to mediate and perform custom database searches. To skip straight to running the demo, see the instructions in MEDLINE Demo Startup Procedures.
To retrieve MEDLINE records, or URLs to MEDLINE records, from the handle server in this demo, you must be able to reach gather.cnri.reston.va.us with a UDP datagram (for the Handle Management System) and on port 8000/tcp (for a custom HTTP repository). If this is not possible, full records can still be retrieved from medlars.nlm.nih.gov, provided you can reach that host on port 23/tcp (telnet). If you cannot reach the MEDLARS machine from your location, you cannot run this demo.
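The TCP parts of these requirements can be checked before starting the demo. The following is a minimal sketch using only the Python standard library; the helper name tcp_reachable and the CHECKS list are illustrative and not part of the KOE distribution. The UDP path to the Handle Management System is not probed, because the text above does not give its port.

```python
import socket


def tcp_reachable(host, port, timeout=5.0):
    """Return True if a TCP connection to (host, port) can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name lookup failures.
        return False


# Hosts and TCP ports named in the requirements above.
CHECKS = [
    ("gather.cnri.reston.va.us", 8000),  # custom HTTP repository
    ("medlars.nlm.nih.gov", 23),         # MEDLARS telnet service
]

if __name__ == "__main__":
    for host, port in CHECKS:
        status = "reachable" if tcp_reachable(host, port) else "NOT reachable"
        print(f"{host}:{port}/tcp is {status}")
```

If only the MEDLARS host is reachable, the demo can still be run with Elhill as the retrieval source.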
In addition to the KOE software, installed as described in the Installation Guide, two more pieces of software are needed: the Grail browser from CNRI and the Python Telnet-Expect Module from the Digital Library Project at Stanford. Their installation is also described in the Installation Guide.
The CNRI Handle Management System is a global naming system. It is a distributed computer system that stores names, called handles, and provides the information needed to locate and access the named items. Handles are a form of Uniform Resource Name (URN). The system can also be used to store pointers to the location of items; these pointers can be URLs. Handle records can also contain end-point data, in which case the system does not provide a level of location indirection.
The Local Handle Server system allows globally registered authorities to establish handle records which are maintained on a machine within a local administrative domain instead of on the global handle server systems. Local Handle Servers also provide an ideal platform for organizations to experiment with the CNRI handle protocol.
To enable the loading of MEDLINE records directly into the handle system, a new feature of the CNRI handle system was employed: user-defined record types. A user-defined record type allows the handle system to recognize and specify data types not configured by default in the handle system software. In the case of NLM, we defined a data type named "NLM" that contains a single MEDLINE record. This is similar to specifying schema information in a database management system, with one important exception: the handle system does not specify or care about the contents of these new record types. It simply identifies them uniquely.
This demonstration uses a repository of 500,000 MEDLINE records provided by the National Library of Medicine. These records were loaded into two services: a prototype of a Kahn & Wilensky repository called the Primitive Repository, and the CNRI Handle System.
A primitive repository was constructed as a prototype for the future Kahn & Wilensky style repository, an example of which is being developed at CNRI under the Computer Science Technical Reports project. For data storage, a simple hash mechanism using GNU gdbm files was employed. Access to records was implemented using HTTP. An HTTP server was constructed that takes URLs with query strings of the form:
http://gather.cnri.reston.va.us:8000/?database=medline&command=lookup&key=c10101
and returns the MEDLINE record, formatted as HTML, that corresponds to a record with an Elhill UI field of c10101.
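A lookup URL of this form can be assembled from its three query parameters. The sketch below uses the standard urllib.parse module; the function name lookup_url is our own, and the parameter names are taken from the example URL above.

```python
from urllib.parse import urlencode

# Base address of the primitive repository, as given in the text.
REPOSITORY_BASE = "http://gather.cnri.reston.va.us:8000/"


def lookup_url(key, database="medline"):
    """Build a repository lookup URL for the record with the given key."""
    query = urlencode({"database": database, "command": "lookup", "key": key})
    return f"{REPOSITORY_BASE}?{query}"


print(lookup_url("c10101"))
```

Using urlencode rather than string concatenation keeps the query string well-formed if a key ever contains characters that need escaping.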
The CNRI Handle System provides identifiers for digital objects and other resources, such as MEDLINE records, in distributed computer systems. For the purpose of experimenting with the NLM MEDLINE data, we created handles for all 500,000 MEDLINE records. These handles were of the form:
nlm.hdl_test/10101
Access to these handles through the Grail Internet Browser was implemented via the hdl protocol scheme. Resolving the above handle in Grail would be done with the following URL:
hdl:nlm.hdl_test/10101
In this example, hdl is the scheme telling a browser how to treat what follows, nlm is the Handle Authority, and hdl_test is the sub-authority. What follows the slash is the opaque string, which must be unique within this authority/sub-authority combination. In the case of the handles corresponding to MEDLINE records, the opaque string is the Unique Identifier taken from the record data itself.
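The parts of such a URL can be separated mechanically. The sketch below is illustrative only: the Handle tuple and parse_handle function are names of our own, and the parsing rules cover just the form shown in this section, not the full handle syntax.

```python
from typing import NamedTuple


class Handle(NamedTuple):
    authority: str
    sub_authority: str
    opaque: str


def parse_handle(text):
    """Split 'hdl:authority.sub_authority/opaque' into its parts."""
    scheme, colon, rest = text.partition(":")
    if not colon or scheme.lower() != "hdl":
        raise ValueError(f"not an hdl: URL: {text!r}")
    naming, slash, opaque = rest.partition("/")
    if not slash or not opaque:
        raise ValueError(f"missing opaque string: {text!r}")
    # The first dot separates the authority from the sub-authority.
    authority, _, sub_authority = naming.partition(".")
    return Handle(authority, sub_authority, opaque)


print(parse_handle("hdl:nlm.hdl_test/10101"))
```

The opaque string is returned untouched, since only the naming authority assigns it any meaning.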
The MEDLINE record handles were added to the handle system with two data types: URL and NLM. The URL data type holds the URL that can be submitted to the primitive repository to retrieve the record from there. Resolving a handle to its URL data type uses the handle system as a pure name resolution system: the location-independent name of the record is resolved into a location-dependent pointer to the record. In contrast, the NLM data type contains the record itself, stored directly in the handle system, for the highest retrieval performance.
When resolving a handle into its data, the default behavior is to return all the data types. Certain applications may want to specify which data type is returned. To facilitate this selection, we've added an options field to the handle string. Handle URN options are formatted to mimic FTP URL option formatting: the URL string is followed by a semicolon and then a key=value pair. To specify NLM data from the handle server for MEDLINE records, the handle URN would be:
hdl://nlm.nlm_test/10101;type=nlm
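The effect of the options field can be sketched as follows. This is a toy model rather than the handle server's implementation: the function names are ours, the record is an in-memory dictionary with placeholder values, and the case-insensitive matching of the type name is an assumption.

```python
def split_options(handle_urn):
    """Split 'handle;key=value;...' into the bare handle and an options dict."""
    base, *pairs = handle_urn.split(";")
    options = {}
    for pair in pairs:
        key, equals, value = pair.partition("=")
        if not equals:
            raise ValueError(f"malformed option: {pair!r}")
        options[key.lower()] = value
    return base, options


def select_values(record, options):
    """Return every typed value, or only the requested type if one is given."""
    wanted = options.get("type")
    if wanted is None:
        return dict(record)  # default behavior: all data types
    return {t: v for t, v in record.items() if t.lower() == wanted.lower()}


# Placeholder handle record holding the two data types used in this demo.
record = {
    "URL": "(repository URL for the record)",
    "NLM": "(MEDLINE record text)",
}

base, options = split_options("hdl://nlm.nlm_test/10101;type=nlm")
print(base, options)
print(select_values(record, options))
```

Without an options field, select_values returns both entries, which mirrors the default of returning all data types.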
Grail is an extensible, freely available Internet browser written entirely in the Python language. Once Grail is installed on your machine, you can further validate readiness by opening the file $KOSROOT/demos/medline/medline_handles.html in Grail and trying some of the embedded handles. This page contains arbitrary MEDLINE handles formatted as HTML hyperlinks. Click on them to retrieve the data. The data can be pulled nearly transparently either from the primitive repository, using the handle system as a pure name resolution service, or directly from the handle system itself. The only detectable difference is that if a handle resolves to an HTTP URL, that URL string remains in the URL entry field in Grail, whereas if the record is a direct handle resolution, the handle string remains in the URL entry field. The following three figures illustrate this mechanism.
In the previous example, notice that the URL in the URL entry field is an hdl: URN. This implies that the resolution stopped at the handle server. Figure 3 shows the same record, but the URL in the URL entry field is an http: URL. In this case, the URL data retrieved from the handle system was resolved.
Neither the primitive repository nor the handle system has searching capabilities. To retrieve a record from either place, you must have its handle (or at least its URL in the case of the primitive repository). In order to facilitate searching in the MEDLINE record data space while still allowing storage and retrieval to come from either the repository or the handle server, another tool was needed.
We developed a Grail applet that allows a user to search NLM's Elhill mainframe computers using an ordinary MEDLARS account. This applet, called Grail Med, uses a telnet connection over the Internet to medlars.nlm.nih.gov, together with Don Libes's Expect functionality wrapped in a telnet socket (implemented as Python modules) by Scott Hassan as part of the Stanford Digital Library Initiative. The Telnet-Expect software allows a KOS plugin to automate an Elhill session programmatically. The Grail Med applet uses the Tk toolkit, by John Ousterhout of Sun Microsystems, as the widget toolkit for its GUI. The applet is loaded by Grail upon reading the following mark-up in an HTML page:
<OBJECT CLASSID=Controller CODEBASE=Grail Med.py> </OBJECT>
Figure 4 is a screen snapshot of the applet controls and a view of the applet after some search terms have been entered.
The applet presents a series of entry fields: Title, Author, Subject, etc., that accept search criteria from the user. The interface allows for terms to be considered individually or together. If a user typed Lindberg D in the Author field and computer in the Subject field the terms would be grouped and the Elhill search would be formulated such that only records which meet both criteria are chosen. With Grail Med, the search is always done on Elhill against the MEDLINE data set.
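The grouping of terms amounts to a logical AND across the fields that were filled in. The sketch below illustrates only that logic on invented in-memory records; it does not reproduce Elhill's query syntax, and the function names and sample data are hypothetical.

```python
def active_criteria(fields):
    """Keep only the entry fields the user actually filled in."""
    return {name: term.strip() for name, term in fields.items() if term.strip()}


def matches(record, criteria):
    """True if every criterion appears in the corresponding record field."""
    return all(
        term.lower() in record.get(name, "").lower()
        for name, term in criteria.items()
    )


# Invented sample records, for illustration only.
SAMPLE = [
    {"Author": "Lindberg D", "Subject": "computer systems"},
    {"Author": "Lindberg D", "Subject": "cardiology"},
    {"Author": "Smith J", "Subject": "computer systems"},
]

criteria = active_criteria({"Title": "", "Author": "Lindberg D", "Subject": "computer"})
hits = [r for r in SAMPLE if matches(r, criteria)]
print(hits)
```

Only the first sample record satisfies both the Author and the Subject criterion, so it is the only hit.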
Above the Launcher button there are two radio buttons (Knowbot and Direct). These buttons determine whether the Elhill connection will take place directly from the applet or indirectly via a Knowbot Service Station, presumably at a server closer to or co-located with the Elhill repository.
Once the search terms are entered in the appropriate fields, the Grail Med user clicks the Launcher button. Before bringing up the launcher window, this action brings up a new window designed to obtain a valid MEDLARS username and password from the user; see Figure 5. The password entry displays asterisks in place of the actual characters associated with the keys being pressed to avoid revealing the password to an onlooker.
The rationale for using KPs to carry out database searches is based on the premise of moving the search intelligence closer to the data. This allows more efficient use of network bandwidth compared to moving large amounts of data across the network to be discarded by a filter at the client.
KPs provide a database or repository independent platform for connecting to arbitrary services. An Elhill communications module was implemented as a KOS plugin. The interface for MEDLINE searches, full- and short-record retrievals, is described by an ISL specification. See the file $KOSROOT/interfaces/ElhillController.isl in the source tree.
When the user presses OK after entering the MEDLARS account information, the Grail Med applet brings up a new interface for constructing, editing, maintaining, launching, and receiving information about search and retrieval via KPs. This interface is shown in Figure 6.
The launcher window is divided into three sections. At the top is a text field labeled Status Buffer. Normal reporting station output appears here highlighted in blue, and error conditions, including KP tracebacks, are highlighted in red. All output written to the KP's standard output is displayed in black.
The bottom section of the launch panel is the edit window, which displays and maintains the state of the various modules making up the KP. In this case, the KP is composed of five modules:
KPMain - Provides the entry point for a generic Knowbot Program. It calls into the KPSubmit and KPExecute modules to initiate the appropriate type of submission and execution.
KPSearch - Provides the functions which will conduct the search and store the results in the KP's suitcase for return.
KPExecute - Handles the scheduling of when the KP should run: immediately or based on a trigger.
KPSubmit - Handles the method of submission. In addition, it locates the Elhill plugin and migrates there if necessary.
KPVars - The module where all of the non-boilerplate state is stored. Examples are: login information, search terms, execution type, submission type, and Elhill interface information.
The center portion of the launcher panel contains the configuration controls. Manipulation of these controls rewrites part of the information stored in the KPVars module. A summary of the controls follows:
Knowbot Service Station: The KSS that will serve as the first stopping point for the KP.
Execute: Either Immediate or Trigger are possible here. If Trigger is selected, the Trigger Variable Name field will indicate the trigger variable the KP will wait for before conducting the search.
Edit Module: This is a menu button that when pressed reveals a menu of Modules that are the parts of the KP. Selecting any of these modules brings that module into the edit buffer.
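How control settings might be validated and written into the KP's variable state can be sketched as follows. Everything here is hypothetical: LaunchSettings, update_vars, the key names, and the station name are our own inventions and do not reflect the actual layout of the KPVars module.

```python
from dataclasses import dataclass, asdict


@dataclass
class LaunchSettings:
    station: str                 # Knowbot Service Station to visit first
    execute: str = "Immediate"   # "Immediate" or "Trigger"
    trigger_variable: str = ""   # only meaningful when execute == "Trigger"

    def __post_init__(self):
        if self.execute not in ("Immediate", "Trigger"):
            raise ValueError(f"unknown execute mode: {self.execute!r}")
        if self.execute == "Trigger" and not self.trigger_variable:
            raise ValueError("Trigger execution needs a trigger variable name")


def update_vars(kp_vars, settings):
    """Return a copy of the KP state with the launcher settings written in."""
    updated = dict(kp_vars)
    updated.update(asdict(settings))
    return updated


# Hypothetical state and station name, for illustration only.
state = {"search_terms": {"Author": "Lindberg D"}}
settings = LaunchSettings(station="kss.example.org", execute="Trigger",
                          trigger_variable="start_search")
print(update_vars(state, settings))
```

Validating in __post_init__ means an inconsistent combination, such as Trigger with no variable name, is rejected before anything is written into the state.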
Pressing the Submit KP button at the very bottom of the launcher panel conducts the same search demonstrated in a previous section, except that the details of the search properties and the results of the search are transported and represented in a KP. Figure 7 shows typical status output printed during a search.
Once the search is complete, the launch window can be iconified or closed. The results of the search are formatted in the original Grail Med window. How they are formatted for viewing depends on the retrieval method chosen on the Grail Med applet controls.
Retrieval sources have their own set of controls, divided into two major groups: Handle and Elhill. The Elhill method retrieves the first ten records into the applet, and the user can browse them using the arrow buttons. Once the records are pulled over, they are in the applet's memory and no further resolution is needed to view them. Note that the Handle retrieval method retrieves only the unique identifier and title fields from Elhill; the remainder of the fields are retrieved via the handle system. To get the full records from Elhill, you'll have to resubmit the search with Elhill selected as your retrieval source. If you don't, the records will appear quite sparse.
Viewing records that were retrieved from Elhill via the applet is straightforward. The records are retrieved, ten at a time, and stored in applet memory. Viewing them is a simple matter of clicking on the forward and backward arrows (see Figure 9, below the Elhill radiobutton on either side of the Cache check box). Records are formatted and displayed on the same page as the applet.
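Browsing with forward and backward arrows over records held in memory can be modelled with a small pager. This is a sketch under assumptions: the RecordPager class and its fetch callback are hypothetical, and fetching the next batch of ten on demand when the user steps past the last cached record is our guess at the behavior, not a description of the applet's code.

```python
BATCH_SIZE = 10  # records are retrieved ten at a time


class RecordPager:
    """Cache retrieved records and step through them one at a time."""

    def __init__(self, fetch_batch):
        self._fetch = fetch_batch  # callable(start, count) -> list of records
        self._cache = []
        self._index = -1           # -1 means nothing is displayed yet
        self._exhausted = False

    def _current(self):
        return self._cache[self._index] if self._index >= 0 else None

    def forward(self):
        """Move to the next record, fetching another batch if needed."""
        if self._index + 1 >= len(self._cache) and not self._exhausted:
            batch = self._fetch(len(self._cache), BATCH_SIZE)
            self._cache.extend(batch)
            self._exhausted = len(batch) < BATCH_SIZE
        if self._index + 1 < len(self._cache):
            self._index += 1
        return self._current()

    def backward(self):
        """Move to the previous record; served from memory, never refetched."""
        if self._index > 0:
            self._index -= 1
        return self._current()
```

Stepping backward never calls the fetch function, which matches the statement that cached records need no further resolution to be viewed.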
Retrievals using the Handle method are given the further choice of selecting a Repository retrieval or a Direct retrieval. Figure 8 illustrates that the summary page becomes filled with hyperlinks pointing to handles once Handle is selected. Further, the anchors themselves get dynamically rewritten to specify type=URL for repository retrievals and type=NLM for handle server retrievals. Clearly this many choices presented to a typical user is potentially confusing. The intent is to demonstrate the flexibility of advanced infrastructure tools applied to assist in migrating away from expensive legacy technology.
If a search is performed from the start with Handle selected, only the Unique Identifier and Title fields are retrieved from Elhill. These are the only pieces of information needed to construct the handle and present a meaningful summary.
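Building a summary hyperlink from just the Unique Identifier and Title can be sketched as below. The function name, the retrieval labels, and the exact handle form (the hdl:nlm.hdl_test/ prefix combined with a ;type= option) are assumptions pieced together from the examples in this document, and the title in the usage line is a placeholder.

```python
from html import escape

# Data type requested for each retrieval choice described above.
TYPE_FOR_RETRIEVAL = {"repository": "URL", "direct": "NLM"}


def handle_anchor(ui, title, retrieval):
    """Build a summary hyperlink for one record.

    retrieval is "repository" (type=URL) or "direct" (type=NLM).
    """
    data_type = TYPE_FOR_RETRIEVAL[retrieval]
    handle = f"hdl:nlm.hdl_test/{ui};type={data_type}"
    return f'<A HREF="{escape(handle, quote=True)}">{escape(title)}</A>'


print(handle_anchor("10101", "(record title)", "repository"))
print(handle_anchor("10101", "(record title)", "direct"))
```

Switching between Repository and Direct retrieval then only requires regenerating the anchors with a different retrieval argument; no further Elhill query is needed.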
If clicking on one of these anchors results in a record not being found, it is probably because only 500,000 MEDLINE records are loaded in the handle server at CNRI for the purposes of this demo.