Hypermedia – Content Queries and Indexes
2.1. Content Queries and Indexes
Bruza proposed a two-level architecture for hypertext documents: a top level called the hyperindex (containing index information) and a bottom level called the hyperbase (containing content nodes and links) [Bruza, 1990]. The hyperindex consists of a set of indexes linked together. When the reader finds an index term that describes the required information, the corresponding objects are retrieved from the underlying hyperbase for examination. Navigating through the hyperindex (rather than the hyperbase) and retrieving information from the hyperbase is called “Query By Navigation” [Bruza, 1990].
An index is made up of a set of index entries. Each index entry consists of a term descriptor (or keyword) and a locator, such as a page number. Single term descriptors lack specificity. Term phrases, which are built from several term descriptors, are more specific; however, they may retrieve too many items or none at all, and hence lack exhaustivity. Index expressions state the relationships between term descriptors and are therefore more specific than term phrases. They also have a structure from which a lattice of descriptors supporting query by navigation can be derived. A base index expression consists of terms that are linked to other terms by connectors. For example, “effective information retrieval” is a base index expression, and so is “people in need of information”. Combining the two yields an index expression, for example “effective information retrieval AND people in need of information”.
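To make the notion of an index entry concrete, the following is a minimal sketch that assumes an entry is simply a descriptor paired with a page-number locator. The names `IndexEntry`, `INDEX`, and `lookup`, as well as the sample entries, are hypothetical and do not come from [Bruza, 1990].

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class IndexEntry:
    descriptor: str  # term descriptor, keyword, or term phrase
    locator: int     # assumed here to be a page number


# Hypothetical index; the entries are illustrative only.
INDEX = [
    IndexEntry("information", 3),
    IndexEntry("information", 17),
    IndexEntry("information retrieval", 17),
    IndexEntry("effective information retrieval", 42),
]


def lookup(descriptor: str) -> List[int]:
    """Return the sorted locators of all entries whose descriptor matches exactly."""
    return sorted(e.locator for e in INDEX if e.descriptor == descriptor)


print(lookup("information"))                      # broad descriptor, several locators
print(lookup("effective information retrieval"))  # more specific phrase, fewer locators
print(lookup("people"))                           # no matching entry at all
```

In this toy index the single term matches more locators than the longer phrase, and an unmatched descriptor retrieves nothing, which mirrors the specificity trade-off described above.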
The power base expression is a lattice formed out of a full base expression at the top and an empty base expression at the bottom. This lattice (or lattice-like) structure is the basis of the hyperindex [Bruza, 1990]. Relative to the vertex of focus in the lattice, the surrounding descriptors represent either enlargements (context extension) or refinements (context contraction) of the context represented by the focus. The reader can thus move across the lattice by refining or enlarging the current focus until reaching a focus that is relevant to the required information.
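The navigation over the lattice can be illustrated with a deliberately simplified sketch. It assumes that a descriptor is just a set of terms (connectors and the internal structure of index expressions are ignored), that the lattice is the set of all subsets ordered by inclusion, that a refinement adds one term to the focus, and that an enlargement removes one. These are simplifying assumptions, and all names are hypothetical; this is not Bruza's construction.

```python
from itertools import combinations

# Terms of a hypothetical full expression; connectors are ignored in this sketch.
FULL = ("effective", "information", "retrieval")


def power_set(terms):
    """All subsets of the terms, from the empty descriptor up to the full one."""
    return {
        frozenset(c)
        for r in range(len(terms) + 1)
        for c in combinations(terms, r)
    }


LATTICE = power_set(FULL)


def refinements(focus):
    """Neighbouring descriptors that are one term more specific than the focus."""
    return {d for d in LATTICE if focus < d and len(d) == len(focus) + 1}


def enlargements(focus):
    """Neighbouring descriptors that are one term less specific than the focus."""
    return {d for d in LATTICE if d < focus and len(d) == len(focus) - 1}


focus = frozenset({"information"})
for d in sorted(refinements(focus), key=sorted):
    print("refine to:", sorted(d))
for d in sorted(enlargements(focus), key=sorted):
    print("enlarge to:", sorted(d))
```

Starting from the empty descriptor at the bottom, a reader would repeatedly pick one of the offered refinements (or step back through an enlargement) until the focus describes the information being sought.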
Bruza’s measures to determine the effectiveness of index expressions in the hyperindex include:
- Precision: The ratio of relevant objects associated with the descriptor to the total number of objects associated with the descriptor.
- Recall: The ratio of the number of relevant objects associated with the descriptor to the total number of relevant objects.
- Exhaustivity: The degree to which the contents of the objects are reflected in the index expressions.
- Power: The ratio of a descriptor’s specificity to its length.
- Eliminability: The ability to determine the irrelevance of a descriptor and stop the search.
- Clarity: The ability to grasp the intended meaning of the descriptor.
- Predictability: The ability to predict where relevant descriptors can be found in the index.
- Collocation: The extent to which the relevant index terms are near each other in the index.
Experiments and empirical studies are required to determine these retrieval measures for hypertext-based IR systems.
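Of the measures listed, precision and recall are the two that reduce directly to set arithmetic. The sketch below uses the standard definitions (relevant associated objects over all associated objects, and relevant associated objects over all relevant objects). The node identifiers and relevance judgements are made-up illustration data, not experimental results.

```python
def precision(associated: set, relevant: set) -> float:
    """Relevant objects associated with the descriptor / all associated objects."""
    if not associated:
        return 0.0  # convention chosen for this sketch when nothing is associated
    return len(associated & relevant) / len(associated)


def recall(associated: set, relevant: set) -> float:
    """Relevant objects associated with the descriptor / all relevant objects."""
    if not relevant:
        return 0.0  # convention chosen for this sketch when nothing is relevant
    return len(associated & relevant) / len(relevant)


# Hypothetical hyperbase nodes reached through one descriptor, and the nodes
# a reader would actually judge relevant.
associated = {"n1", "n2", "n3", "n4"}
relevant = {"n2", "n4", "n7"}

print(precision(associated, relevant))  # 2 of the 4 associated nodes are relevant
print(recall(associated, relevant))     # 2 of the 3 relevant nodes were reached
```

The remaining measures (exhaustivity, power, eliminability, clarity, predictability, collocation) depend on judgements about meaning or index layout and are not captured by this sketch.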