APPENDIX - Concept collections and hierarchy

Value lists are generated from TOOI thesauri. Some of these value lists are generated from a skos:Collection. Such a collection can have a hierarchical structure that does not match the hierarchy in the thesaurus, and it is therefore not based on skos:broader. What follows describes, in general terms, the functional need this addresses and the approach we take in these cases.

NESTED COLLECTIONS AND CONCEPTS

At KOOP, the office for official government publications in the Netherlands, we are working on metadata models, thesauri and other instruments to make information more useful. W3C’s SKOS recommendation lies at the heart of much of what we do, including the less frequently used features it makes available. In this blog post I discuss an interesting problem related to concept collections.

The problem recurs in several of our thesauri. Some of our users are in need of lists containing a hierarchy of terms that is different from the concept hierarchy. Crucially, they need the terms with the concept URIs as minted in the thesaurus.

Therefore, we need to generate a view on the thesaurus, tailored to the user’s need. Technically, both the thesaurus and the view are RDF graphs. The thesaurus and the definitions of the views are maintained by editors (or rather, data stewards), who are business users of a standard SKOS tool. The definition of a view must therefore not be buried in program code or SPARQL queries. Put differently, the recipe for generating views must be generic, without instructions that are tailored to one particular user.

To this end, SKOS introduces the notion of a collection. A collection is, quite simply, a set of concepts taken from the thesaurus and grouped for a specific use. Suppose a customer needs a view that consists of three concepts. We create a collection object in the thesaurus, use the skos:member relation to assert membership for each of the three concepts, et voilà. We can simply select and copy the relevant statements to a separate graph, serialize it, and send it to the customer.
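The select-and-copy step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: triples are modelled as plain (subject, predicate, object) tuples rather than with an RDF library, and every ex: name, including ex:customerView, is hypothetical.

```python
# Minimal sketch: a graph is a set of (subject, predicate, object) tuples.
# All ex: identifiers are hypothetical and only serve the illustration.
SKOS_MEMBER = "skos:member"

thesaurus = {
    ("ex:coffee", "skos:prefLabel", "coffee"),
    ("ex:tea", "skos:prefLabel", "tea"),
    ("ex:cola", "skos:prefLabel", "cola"),
    ("ex:lager", "skos:prefLabel", "lager"),
    ("ex:customerView", "rdf:type", "skos:Collection"),
    ("ex:customerView", SKOS_MEMBER, "ex:coffee"),
    ("ex:customerView", SKOS_MEMBER, "ex:tea"),
    ("ex:customerView", SKOS_MEMBER, "ex:cola"),
}


def flat_view(graph, collection):
    """Copy the collection and the statements about its direct members."""
    members = {o for s, p, o in graph if s == collection and p == SKOS_MEMBER}
    keep = members | {collection}
    return {(s, p, o) for s, p, o in graph if s in keep}


view = flat_view(thesaurus, "ex:customerView")
```

The function takes the collection as a parameter and contains nothing specific to one customer, which is the property the recipe needs.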

To explain this more clearly, we need an example. To avoid digressions into domain-specific details, let me define a simple thesaurus to illustrate. Like the familiar example with milk (cow milk, goat milk, buffalo milk) in the SKOS Primer, it contains information about beverages, but adds a bit more structure. This extra structure will help us analyse the problem at hand.

beverage
  beer
    tripel
    lager
  herbal infusion
    coffee
    lemon grass
    mint
    tea
      Darjeeling
      Earl grey
      oolong
  soft drink
    cola
    root beer

These are all concepts; indentation means “narrower”. Now suppose that Stella and Sandeep each own a little cafeteria. Their customers use an app to browse the menu and place orders. The thesaurus is owned and maintained by a separate organisation and published on the web. Each cafeteria owner decides which items show up in the app. Crucially, the items in the app are the same concepts as the ones defined in the thesaurus. This is important, because the thesaurus provides useful additional information: about allergens contained in the item, age restrictions on consuming it, and so on.

Stella has a simple menu for beverages: coffee, tea, cola. That’s it. To achieve this, we define a collection called “Stella’s drink items” and add the three concepts to it. The SKOS Primer describes in detail how the pertinent RDF statements are structured. Indentation in this example means “member”, not “narrower”, while the angle brackets mark a collection:

<Stella’s-drink-items>
  tea
  coffee
  cola

Sandeep needs more structure. He would like to see a hierarchical list in the app, one in which a node labelled “beer” expands into lager and root beer. Sandeep knows full well that root beer is not beer at all, but his customers see things differently. Alternative facts like these pose no problem, since SKOS allows collections to be members of collections, alongside concepts. Nested collections can be used to introduce an “alternative” hierarchical structure without asserting untrue facts. The app can still look up extra information about the actual concepts on the web. Thus, when a young-looking person orders an item from the beer collection, Sandeep can rely on the app to check the customer’s age where applicable. Root beer is for all ages, lager is for 18 years and older.

<Sandeep’s-drink-menu>
  <Sandeep’s-beer-group>
    lager
    root beer
  tea
  cola
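
A nested collection like this one can be walked recursively to produce the outline for the app. The sketch below again assumes tuples instead of an RDF library and hypothetical ex: names. It also assumes the collections contain no membership cycles. Because skos:member carries no order, members are simply sorted alphabetically here.

```python
# Minimal sketch: render nested collections as an indented outline.
# All ex: identifiers are hypothetical.
MEMBER = "skos:member"

graph = {
    ("ex:sandeepMenu", MEMBER, "ex:sandeepBeerGroup"),
    ("ex:sandeepMenu", MEMBER, "ex:tea"),
    ("ex:sandeepMenu", MEMBER, "ex:cola"),
    ("ex:sandeepBeerGroup", MEMBER, "ex:lager"),
    ("ex:sandeepBeerGroup", MEMBER, "ex:rootBeer"),
    # The concept hierarchy of the thesaurus is left untouched:
    ("ex:lager", "skos:broader", "ex:beer"),
    ("ex:rootBeer", "skos:broader", "ex:softDrink"),
}


def outline(graph, node, depth=0):
    """Return indented lines for node and everything reachable via skos:member."""
    lines = ["  " * depth + node]
    children = sorted(o for s, p, o in graph if s == node and p == MEMBER)
    for child in children:
        lines.extend(outline(graph, child, depth + 1))
    return lines


menu_lines = outline(graph, "ex:sandeepMenu")
```

Root beer shows up under the beer group in the outline, while the thesaurus still states that its broader concept is soft drink.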

At this point we are all set to introduce the dilemma. Suppose Stella feels a need to expand the menu, so that the item tea is replaced by a collection as follows. She wants the collection to carry the display label “tea”. Moreover, she wants it to link directly to the concept with the same label. When a customer touches the item, a definition of “tea” as found in the thesaurus on the web should pop up. This relation between the collection and the concept is what the colon is intended to convey:

<Stella’s-drink-menu>
  <Stella’s-tea-group> : tea
    Darjeeling
    lemon grass
    oolong
  coffee
  cola

One may object that the definition provided by the thesaurus — tea is a beverage made from Camellia sinensis — is not applicable to the item lemon grass occurring beneath it, but this retort leaves Stella unfazed: it is how she wants her menu to be structured. Can we express this without introducing something new, as the colon suggests? In theory, we could introduce a convention like so:

<Stella’s-drink-menu>
  <Stella’s-tea-group>
    tea
    <expansion-of-tea>
      Darjeeling
      lemon grass
      oolong
  coffee
  cola

The convention would state that whenever exactly one concept and exactly one collection are members of the same parent collection, the child collection is an expansion of that concept. There are several deep problems with such an approach:

  • The convention introduces out-of-band procedural interpretation rules that have to be hard-coded in programs and apps. Instead, we should prefer a declarative approach.
  • Suppose Sandeep removes cola from his cafeteria’s menu. The convention would imply that lager and root beer are now an expansion of tea, which is of course not at all what Sandeep intends. Thus, we must add more complexity to be able to “escape” the convention.
  • In the process of defining the convention, we in fact reinterpret the original W3C specification. This is not a best practice.
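
The second problem can be made concrete with a small sketch of the convention as a rule in code. The function implied_expansions is hypothetical and exists only to show the misfire. Triples are plain tuples, the ex: names are invented, and the set of collections is passed in explicitly to keep the example short.

```python
# Minimal sketch of the rejected convention, to show how it misfires.
# All names are hypothetical.
MEMBER = "skos:member"
COLLECTIONS = {"ex:sandeepMenu", "ex:sandeepBeerGroup"}

menu = {
    ("ex:sandeepMenu", MEMBER, "ex:sandeepBeerGroup"),
    ("ex:sandeepMenu", MEMBER, "ex:tea"),
    ("ex:sandeepMenu", MEMBER, "ex:cola"),
    ("ex:sandeepBeerGroup", MEMBER, "ex:lager"),
    ("ex:sandeepBeerGroup", MEMBER, "ex:rootBeer"),
}


def implied_expansions(graph, collections):
    """Map child collection -> concept wherever the convention would fire."""
    result = {}
    for parent in collections:
        kids = [o for s, p, o in graph if s == parent and p == MEMBER]
        concepts = [k for k in kids if k not in collections]
        groups = [k for k in kids if k in collections]
        # The rule: exactly one concept and exactly one collection as siblings.
        if len(concepts) == 1 and len(groups) == 1:
            result[groups[0]] = concepts[0]
    return result


before = implied_expansions(menu, COLLECTIONS)
without_cola = menu - {("ex:sandeepMenu", MEMBER, "ex:cola")}
after = implied_expansions(without_cola, COLLECTIONS)
```

With cola on the menu the rule stays silent. Once cola is removed, it declares the beer group an expansion of tea, although nobody asserted that.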

A variant of this convention would be to use identical prefLabels to signal that the relation holds. This variant breaks down for the same reasons. A better alternative is to define an object property that takes collections as subject and concepts as object. We could call it, say, ex:expands, and define it to mean “the collection x corresponds to concept y, in that it inherits the labels, notes (including definitions) and other information from concept y, except for what SKOS calls ‘semantic relations’.”

By default, concept y is not a member of the collection x that expands it. On the other hand, this is not a requirement. Stella can have her collection expand “tea”, and at the same time list tea as one of the elements beneath it. It stands to reason, though, to require that ex:expands has at most one value. It is unclear what it would mean for a collection to expand more than one concept simultaneously.
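
A rough sketch of how a consumer might interpret ex:expands follows. It assumes tuples instead of an RDF library and hypothetical ex: names. The set of semantic relations lists the core SKOS properties only; the mapping properties, which SKOS also treats as semantic relations, are left out for brevity. Raising an error on more than one value is one possible way to enforce the cardinality constraint, not a prescribed one.

```python
# Minimal sketch: what a collection inherits from the concept it expands.
# ex:expands and all other ex: names are hypothetical.
EXPANDS = "ex:expands"
SEMANTIC_RELATIONS = {
    "skos:semanticRelation", "skos:broader", "skos:narrower", "skos:related",
    "skos:broaderTransitive", "skos:narrowerTransitive",
}

graph = {
    ("ex:stellaTeaGroup", EXPANDS, "ex:tea"),
    ("ex:stellaTeaGroup", "skos:member", "ex:darjeeling"),
    ("ex:stellaTeaGroup", "skos:member", "ex:lemonGrass"),
    ("ex:tea", "skos:prefLabel", "tea"),
    ("ex:tea", "skos:definition", "a beverage made from Camellia sinensis"),
    ("ex:tea", "skos:broader", "ex:herbalInfusion"),
}


def inherited(graph, collection):
    """Return the (predicate, object) pairs taken over from the expanded concept."""
    concepts = {o for s, p, o in graph if s == collection and p == EXPANDS}
    if len(concepts) > 1:
        raise ValueError(f"{collection} expands more than one concept")
    if not concepts:
        return set()
    (concept,) = concepts
    return {
        (p, o) for s, p, o in graph
        if s == concept and p not in SEMANTIC_RELATIONS
    }


tea_group_info = inherited(graph, "ex:stellaTeaGroup")
```

In this example the tea group takes over the label and the definition of tea, but not its skos:broader statement, and tea itself is not a member of the group.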

In generating a customer-specific view on the thesaurus, one must include, besides the transitive closure of skos:member starting from the root collection, all pertinent statements with ex:expands. In addition, one probably wants to include label statements, and so on. In any case, the recipe for generating customer-specific views can be fully generic, as required.
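
The recipe can be sketched as a single generic function that takes only the graph and a root collection. As before, this is an illustration under assumptions: triples are tuples, all ex: names are hypothetical, and the choice of skos:prefLabel and skos:definition as the label statements to copy is ours.

```python
# Minimal sketch of a generic view generator. All ex: names are hypothetical.
MEMBER, EXPANDS = "skos:member", "ex:expands"
COPIED = {"skos:prefLabel", "skos:definition"}  # assumed selection

graph = {
    ("ex:stellaMenu", MEMBER, "ex:stellaTeaGroup"),
    ("ex:stellaMenu", MEMBER, "ex:coffee"),
    ("ex:stellaTeaGroup", MEMBER, "ex:darjeeling"),
    ("ex:stellaTeaGroup", MEMBER, "ex:lemonGrass"),
    ("ex:stellaTeaGroup", EXPANDS, "ex:tea"),
    ("ex:tea", "skos:prefLabel", "tea"),
    ("ex:tea", "skos:broader", "ex:herbalInfusion"),
    ("ex:coffee", "skos:prefLabel", "coffee"),
    ("ex:darjeeling", "skos:prefLabel", "Darjeeling"),
    ("ex:lemonGrass", "skos:prefLabel", "lemon grass"),
    ("ex:sandeepMenu", MEMBER, "ex:cola"),
    ("ex:cola", "skos:prefLabel", "cola"),
}


def reachable(graph, root):
    """Transitive closure of skos:member, starting from (and including) root."""
    seen, todo = {root}, [root]
    while todo:
        node = todo.pop()
        for s, p, o in graph:
            if s == node and p == MEMBER and o not in seen:
                seen.add(o)
                todo.append(o)
    return seen


def customer_view(graph, root):
    """Collect membership, ex:expands and label statements for one root."""
    nodes = reachable(graph, root)
    expanded = {o for s, p, o in graph if s in nodes and p == EXPANDS}
    labelled = nodes | expanded
    view = set()
    for s, p, o in graph:
        if s in nodes and p in (MEMBER, EXPANDS):
            view.add((s, p, o))
        elif s in labelled and p in COPIED:
            view.add((s, p, o))
    return view


stella_view = customer_view(graph, "ex:stellaMenu")
```

Calling the same function with ex:sandeepMenu as the root would yield Sandeep's view. Nothing in the code refers to a particular customer.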