1.3 Conversion Issues

Although large information spaces such as training manuals, encyclopedias, and dictionaries are well suited for adaptation to hypertext, converting their text to hypertext has been a classic problem. The published literature on hypertext contains little work that directly addresses transforming large volumes of encyclopedic text into hypertext form; most of it deals with creating small hypertext documents rather than converting large documents to hypertext.

The following are some of the issues involved in converting text to hypertext [Glushko, 1989], [Riner, 1991]:

  1. Identifying documents that would benefit readers if converted to hypertext form.
  2. Determining procedures to convert them to hypertext format.
  3. Preparing documents in an electronic format from paper or other forms.
  4. Identifying nodes and links and classifying them into various types (to capture semantics). An important problem related to this issue is the fragmentation problem: it is still difficult to identify text units that can stand as separate modules and also serve as cross-references for other entries. Links should follow some model of the user’s need for information in a particular context. Deciding on the level of granularity is also difficult. The finer the granularity, the greater the problem of fragmentation; the coarser the granularity, the greater the need for the display of large entries.

    Fragmentation also tends to make an implicit structure (such as a subtle treatment of a theme that may communicate an idea more artistically) explicit, taking away the expressiveness of the statement. Therefore, we have to find ways to reduce the segmentation of ideas and the loss of structural information that result from manipulating the semantic structure of a linear document.

  5. Determining whether the target of a link is a complete entry, a subentry, or a derivative form. This is a challenging task: it involves determining the right part of speech and the etymological root, and applying sense disambiguation to identify a particular meaning.
  6. Displaying large entries in their entirety, which is still a problem with present-day video monitors. This can be partly solved by using fisheye views and abbreviations. Structural information can be extracted from the tags and employed in the construction of a structural view.
  7. Performing the conversion and verifying the results.
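
The granularity trade-off described under issue 4 can be illustrated with a minimal sketch. It is not a conversion procedure from the cited work: the sample entry, the three granularity levels, the function names, and the naive sentence splitter are all assumptions made for illustration. The sketch splits one entry into nodes at each level and reports how many nodes result and how large the largest node is.

```python
import re

# Hypothetical sample entry; blank lines separate paragraphs.
ENTRY = (
    "Hypertext links related units of text. Readers follow links on demand.\n\n"
    "Conversion splits a linear document into nodes. Each node should stand alone. "
    "Small nodes lose context."
)


def split_nodes(text, granularity):
    """Split text into nodes at 'entry', 'paragraph', or 'sentence' granularity."""
    if granularity == "entry":
        return [text.strip()]
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    if granularity == "paragraph":
        return paragraphs
    if granularity == "sentence":
        # Naive split after terminal punctuation; an assumption for illustration.
        return [
            s
            for p in paragraphs
            for s in re.split(r"(?<=[.!?])\s+", p)
            if s
        ]
    raise ValueError(f"unknown granularity: {granularity}")


def summarize(text, granularity):
    """Return the node count and the size (in characters) of the largest node."""
    nodes = split_nodes(text, granularity)
    return {"nodes": len(nodes), "largest": max(len(n) for n in nodes)}


if __name__ == "__main__":
    for level in ("entry", "paragraph", "sentence"):
        print(level, summarize(ENTRY, level))
```

Moving from entry to sentence granularity, the node count rises (more fragmentation) while the largest node shrinks (less need to display large entries), which is the tension described above. Character counts are only a crude proxy for display size.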