APPENDIX - Concept collections and hierarchy
Value lists are generated from the TOOI thesauri. Some of these value lists are generated from a skos:Collection. Such a collection can have a hierarchical structure that does not correspond to the hierarchy in the thesaurus, and is therefore not based on skos:broader. What follows describes, in general terms, the functional need we address with this and the approach we take in these cases.
NESTED COLLECTIONS AND CONCEPTS
At KOOP, the office for official government publications in the Netherlands, we are working on metadata models, thesauri and other instruments to make information more useful. W3C’s SKOS recommendation lies at the heart of much of what we do, including the less often used features it makes available. In this blog post I discuss an interesting problem related to concept collections.
The problem recurs in several of our thesauri. Some of our users need lists containing a hierarchy of terms that differs from the concept hierarchy. Crucially, they need the terms with the concept URIs as minted in the thesaurus.
Therefore, we need to generate a view on the thesaurus, tailored to the user’s need. Technically, both the thesaurus and the view are RDF graphs. The thesaurus and the view definitions are both maintained by editors (or rather, data stewards), who are business users of a standard SKOS tool. The definition of a view must therefore not be buried in program code or SPARQL queries. Put differently, the recipe for generating views must be generic, without instructions that are specific to one particular user.
To this end, SKOS introduces the notion of a collection. A collection is, quite simply, a set of concepts taken from the thesaurus in order to group them for specific uses. Suppose a customer needs a view that consists of three concepts. We create a collection object in the thesaurus, use the skos:member relation to assert membership for each of the three concepts, et voilà. We can simply select and copy the relevant statements to a separate graph, serialize it, and send it to the customer.
To explain this more clearly, we need an example. To avoid digressions into domain-specific details, let me define a simple thesaurus to illustrate. Like the familiar example with milk (cow milk, goat milk, buffalo milk) in the SKOS Primer, it contains information about beverages, but adds a bit more structure. This extra structure will help us analyse the problem at hand.
beverage
  beer
    triple
    lager
  herbal infusion
    coffee
    lemon grass
    mint
    tea
      Darjeeling
      Earl grey
      oolong
  soft drink
    cola
    root beer
These are all concepts, indentation means “narrower”. Now suppose that Stella and Sandeep each own a little cafeteria. Their customers use an app to browse the menu and place orders. The thesaurus is owned and maintained by a separate organisation and published on the web. The owner of the cafeteria decides which items show up in the app. Crucially, the items in the app are the same concepts as the ones defined in the thesaurus. This is important, because the thesaurus provides useful additional information: about allergens contained in the item, age restrictions on consuming it, and so on.
Stella has a simple menu for beverages: coffee, tea, cola. That’s it. To achieve this, we define a collection called “Stella’s drink items” and add the three concepts to it. The SKOS Primer describes in detail how the pertinent RDF statements are structured. Indentation in this example means “member”, not “narrower”, while angle brackets mark a collection:
<Stella’s-drink-items>
  tea
  coffee
  cola
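As a rough illustration of "select and copy the relevant statements", the sketch below models the graph as a plain Python set of (subject, predicate, object) tuples rather than using an RDF library. The `ex:` identifiers, the `ex:minimumAge` property and the function name `flat_view` are hypothetical names chosen for this example only.

```python
# Triples are plain (subject, predicate, object) tuples.
# All "ex:" names, including ex:minimumAge, are hypothetical.
THESAURUS = {
    ("ex:tea", "skos:prefLabel", "tea"),
    ("ex:coffee", "skos:prefLabel", "coffee"),
    ("ex:cola", "skos:prefLabel", "cola"),
    ("ex:lager", "skos:prefLabel", "lager"),
    ("ex:lager", "ex:minimumAge", "18"),
    ("ex:stellas-drink-items", "rdf:type", "skos:Collection"),
    ("ex:stellas-drink-items", "skos:member", "ex:tea"),
    ("ex:stellas-drink-items", "skos:member", "ex:coffee"),
    ("ex:stellas-drink-items", "skos:member", "ex:cola"),
}


def flat_view(graph, collection):
    """Copy the collection's own statements plus all statements about its members."""
    members = {o for s, p, o in graph if s == collection and p == "skos:member"}
    return {t for t in graph if t[0] == collection or t[0] in members}


view = flat_view(THESAURUS, "ex:stellas-drink-items")
menu = sorted(o for s, p, o in view if p == "skos:prefLabel")
print(menu)  # ['coffee', 'cola', 'tea']
```

Nothing in `flat_view` mentions Stella: the collection identifier is the only input, which is what keeps the recipe generic.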
Sandeep needs more structure. He would like to see a hierarchical list in the app, one in which a node labelled “beer” expands into lager and root beer. Sandeep knows full well that root beer is not beer at all, but his customers see things differently. Alternative facts like these pose no problem, since SKOS allows collections to be members of collections, alongside concepts. These can be used to introduce an “alternative” hierarchical structure without asserting untrue facts. The app can still look up extra information about the actual concepts on the web. Thus, Sandeep can rely on the app to prompt an age check, where applicable, when a young-looking person orders an item from the beer collection. Root beer is for all ages, lager is for 18 years and older.
<Sandeep’s-drink-menu>
  <Sandeep’s-beer-group>
    lager
    root beer
  tea
  cola
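A minimal sketch of how an app might render such a nested collection, again using plain tuples as triples. The `ex:` identifiers and the helper names `label` and `render` are assumptions made for illustration. SKOS collections are unordered, so the sorting is added here only to make the output deterministic, and the sketch assumes that no collection contains itself.

```python
MEMBER = "skos:member"

# Hypothetical identifiers; the labels follow the example in the text.
GRAPH = {
    ("ex:sandeeps-drink-menu", "skos:prefLabel", "Sandeep's drink menu"),
    ("ex:sandeeps-drink-menu", MEMBER, "ex:sandeeps-beer-group"),
    ("ex:sandeeps-drink-menu", MEMBER, "ex:tea"),
    ("ex:sandeeps-drink-menu", MEMBER, "ex:cola"),
    ("ex:sandeeps-beer-group", "skos:prefLabel", "beer"),
    ("ex:sandeeps-beer-group", MEMBER, "ex:lager"),
    ("ex:sandeeps-beer-group", MEMBER, "ex:root-beer"),
    ("ex:lager", "skos:prefLabel", "lager"),
    ("ex:root-beer", "skos:prefLabel", "root beer"),
    ("ex:tea", "skos:prefLabel", "tea"),
    ("ex:cola", "skos:prefLabel", "cola"),
    # The thesaurus hierarchy is untouched: root beer is still a soft drink.
    ("ex:lager", "skos:broader", "ex:beer"),
    ("ex:root-beer", "skos:broader", "ex:soft-drink"),
}


def label(graph, node):
    """Return the first preferred label found, or the identifier as a fallback."""
    labels = [o for s, p, o in graph if s == node and p == "skos:prefLabel"]
    return labels[0] if labels else node


def render(graph, node, depth=0):
    """Return indented lines for a node and, recursively, its skos:member targets."""
    lines = ["  " * depth + label(graph, node)]
    for child in sorted(o for s, p, o in graph if s == node and p == MEMBER):
        lines.extend(render(graph, child, depth + 1))
    return lines


print("\n".join(render(GRAPH, "ex:sandeeps-drink-menu")))
```

The menu hierarchy comes entirely from skos:member statements; the skos:broader statements are never consulted, so no untrue fact about root beer has to be asserted.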
At this point we are all set to introduce the dilemma. Suppose Stella feels a need to expand the menu, so that the item tea is replaced by a collection as follows. She wants the collection to carry the display label “tea”. Moreover, she wants it to link directly to the concept with the same label. When a customer touches the item, a definition of “tea” as found in the thesaurus on the web should pop up. This relation between the collection and the concept is what the colon is intended to convey:
<Stella’s-drink-menu>
  <Stella’s-tea-group> : tea
    Darjeeling
    lemon grass
    oolong
  coffee
  cola
One may object that the definition provided by the thesaurus — tea is a beverage made from Camellia sinensis — is not applicable to the item lemon grass occurring beneath it, but this retort leaves Stella unfazed: it is how she wants her menu to be structured. Can we express this without introducing something new, as the colon suggests? In theory, we could introduce a convention like so:
<Stella’s-drink-menu>
  <Stella’s-tea-group>
    tea
    <expansion-of-tea>
      Darjeeling
      lemon grass
      oolong
  coffee
  cola
The convention would state that whenever exactly one concept and one collection are members of the same parent collection, the child collection is an expansion of the concept. There are several deep problems with such an approach:
- The convention introduces out-of-band procedural interpretation rules that are supposed to be hard-coded in programs and apps. Instead, we should prefer a declarative approach.
- Suppose Sandeep removes cola from his cafeteria’s menu. The convention would imply that lager and root beer are now an expansion of tea, which is of course not at all what Sandeep intends. Thus, we must add more complexity to be able to “escape” the convention.
- In the process of defining the convention, we in fact reinterpret the original W3C specification. That is not a best practice.
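The second objection can be made concrete with a small sketch. It implements the convention as stated above (a parent collection whose only members are one concept and one collection) on plain tuples. The function name `implied_expansions` and the `ex:` identifiers are hypothetical.

```python
MEMBER = "skos:member"

# Hypothetical identifiers for Sandeep's menu.
COLLECTIONS = {"ex:sandeeps-drink-menu", "ex:sandeeps-beer-group"}
MENU = {
    ("ex:sandeeps-drink-menu", MEMBER, "ex:sandeeps-beer-group"),
    ("ex:sandeeps-drink-menu", MEMBER, "ex:tea"),
    ("ex:sandeeps-drink-menu", MEMBER, "ex:cola"),
    ("ex:sandeeps-beer-group", MEMBER, "ex:lager"),
    ("ex:sandeeps-beer-group", MEMBER, "ex:root-beer"),
}


def implied_expansions(graph, collections):
    """Apply the convention: a parent holding exactly one concept and one
    collection makes that collection an 'expansion' of that concept."""
    result = {}
    for parent in collections:
        members = [o for s, p, o in graph if s == parent and p == MEMBER]
        concepts = [m for m in members if m not in collections]
        groups = [m for m in members if m in collections]
        if len(concepts) == 1 and len(groups) == 1:
            result[groups[0]] = concepts[0]
    return result


print(implied_expansions(MENU, COLLECTIONS))  # {}

# Removing cola silently changes the meaning of the beer group.
without_cola = MENU - {("ex:sandeeps-drink-menu", MEMBER, "ex:cola")}
print(implied_expansions(without_cola, COLLECTIONS))
# {'ex:sandeeps-beer-group': 'ex:tea'}
```

An unrelated edit to the menu flips the interpretation of a sibling collection, which is the kind of side effect a declarative statement would avoid.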
A variant of this convention would be to use identical prefLabels to signal that the relation holds. This variant breaks down for the same reasons. A better alternative is to define an object property that takes collections as subject and concepts as object. We could call it, say, ex:expands, and define it to mean “the collection x corresponds to concept y, in that it inherits the labels, notes (including definitions) and other information from concept y, excepting what SKOS calls 'semantic relations'.”
By default, concept y is not a member of the collection x that expands it. On the other hand, this is not a requirement. Stella can have her collection expand “tea”, and at the same time list tea as one of the elements beneath it. It stands to reason, though, to require that ex:expands has at most one value. It is unclear what it would mean for a collection to expand more than one concept simultaneously.
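The definition of ex:expands can be sketched as follows, once more with plain tuples as triples. The helper names are hypothetical, and two choices here are assumptions of this sketch rather than part of the proposal: skipping rdf:type, so that the collection is not typed as a concept, and listing only the core semantic relations. SKOS defines the mapping properties as sub-properties of skos:semanticRelation, so they would be excluded in the same way.

```python
EXPANDS = "ex:expands"  # the property proposed in the text

# Core SKOS semantic relations; mapping properties would be added likewise.
SEMANTIC_RELATIONS = {
    "skos:semanticRelation", "skos:broader", "skos:narrower", "skos:related",
    "skos:broaderTransitive", "skos:narrowerTransitive",
}
# Assumption of this sketch: the collection must not inherit the concept's type.
NOT_INHERITED = SEMANTIC_RELATIONS | {"rdf:type"}

GRAPH = {
    ("ex:tea", "rdf:type", "skos:Concept"),
    ("ex:tea", "skos:prefLabel", "tea"),
    ("ex:tea", "skos:definition", "beverage made from Camellia sinensis"),
    ("ex:tea", "skos:narrower", "ex:darjeeling"),
    ("ex:stellas-tea-group", EXPANDS, "ex:tea"),
    ("ex:stellas-tea-group", "skos:member", "ex:darjeeling"),
    ("ex:stellas-tea-group", "skos:member", "ex:lemon-grass"),
    ("ex:stellas-tea-group", "skos:member", "ex:oolong"),
}


def expanded_concept(graph, collection):
    """Return the single concept a collection expands, or None if there is none."""
    targets = {o for s, p, o in graph if s == collection and p == EXPANDS}
    if len(targets) > 1:
        raise ValueError(f"{collection} expands more than one concept: {sorted(targets)}")
    return next(iter(targets), None)


def inherited_statements(graph, collection):
    """Restate the concept's labels and notes with the collection as subject."""
    concept = expanded_concept(graph, collection)
    if concept is None:
        return set()
    return {(collection, p, o) for s, p, o in graph
            if s == concept and p not in NOT_INHERITED}


for statement in sorted(inherited_statements(GRAPH, "ex:stellas-tea-group")):
    print(statement)
```

The tea group picks up the label and the definition of tea, but not the skos:narrower link to Darjeeling. Its own hierarchy comes from its skos:member statements, which is why lemon grass can sit beneath it.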
In generating a customer-specific view on the thesaurus, one must include, besides the transitive closure of skos:member starting from the root collection, all pertinent statements with ex:expands. In addition, one probably wants to include label statements, and so on. In any case, the recipe for generating customer-specific views can be fully generic, as required.
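One possible shape of such a generic recipe, sketched with plain tuples as triples. The function names, the `ex:` identifiers and the choice of which predicates to keep (`KEEP`) are assumptions of this example. In particular, following ex:expands so that the labels of the expanded concept end up in the view is one reading of "pertinent statements".

```python
MEMBER, EXPANDS = "skos:member", "ex:expands"
# Assumption: which statements count as pertinent for a view.
KEEP = {MEMBER, EXPANDS, "skos:prefLabel", "skos:definition"}

THESAURUS = {
    ("ex:stellas-drink-menu", MEMBER, "ex:stellas-tea-group"),
    ("ex:stellas-drink-menu", MEMBER, "ex:coffee"),
    ("ex:stellas-drink-menu", MEMBER, "ex:cola"),
    ("ex:stellas-tea-group", EXPANDS, "ex:tea"),
    ("ex:stellas-tea-group", MEMBER, "ex:darjeeling"),
    ("ex:stellas-tea-group", MEMBER, "ex:lemon-grass"),
    ("ex:stellas-tea-group", MEMBER, "ex:oolong"),
    ("ex:tea", "skos:prefLabel", "tea"),
    ("ex:tea", "skos:narrower", "ex:darjeeling"),
    ("ex:darjeeling", "skos:prefLabel", "Darjeeling"),
    ("ex:lemon-grass", "skos:prefLabel", "lemon grass"),
    ("ex:oolong", "skos:prefLabel", "oolong"),
    ("ex:coffee", "skos:prefLabel", "coffee"),
    ("ex:cola", "skos:prefLabel", "cola"),
    ("ex:lager", "skos:prefLabel", "lager"),
    ("ex:sandeeps-drink-menu", MEMBER, "ex:cola"),
}


def reachable(graph, root):
    """Nodes reachable from root via skos:member (transitively) or ex:expands."""
    seen, todo = set(), [root]
    while todo:
        node = todo.pop()
        if node in seen:
            continue  # also guards against cyclic membership
        seen.add(node)
        todo.extend(o for s, p, o in graph if s == node and p in (MEMBER, EXPANDS))
    return seen


def generate_view(graph, root):
    """Copy the pertinent statements about every reachable node."""
    nodes = reachable(graph, root)
    return {(s, p, o) for s, p, o in graph if s in nodes and p in KEEP}


view = generate_view(THESAURUS, "ex:stellas-drink-menu")
print(len(view))  # 13
```

The only customer-specific input is the URI of the root collection. Everything else is driven by statements that the data stewards maintain in the thesaurus itself.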