Vocabulary management for domain experts and business users with metaphactory

Semantic Knowledge Modeling

Linn Aung

Reading time: 5 - 10 minutes

This blog post introduces metaphactory's vocabulary management features, which extend the platform's knowledge modeling capabilities and support knowledge graph experts, domain experts, and business users in creating and editing SKOS vocabularies to capture business-relevant terms. We'll start by defining what vocabularies are and looking at the use cases they can serve. Then, we'll look at the specific vocabulary management features supported in metaphactory. Finally, we'll walk through a concrete use case: we integrate an existing thesaurus into metaphactory and use the platform's semantic structured search component to explore terms and to connect data through relations between entities.

What is a Vocabulary?

To put it briefly, a vocabulary is a collection of terms organized in a (hierarchical) classification scheme. Each term can have preferred and alternative labels, and has a defined scope or describes a specific domain. Common types of vocabularies include thesauri, taxonomies, terminologies, glossaries, classification schemes, and subject headings.
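As a rough illustration of this definition, a single term could be modeled as a small data structure. This is only a sketch: the `Term` class, the example.com IRIs, and the alternative label "BI system" are assumptions made for this example and are not taken from metaphactory or from any real thesaurus.

```python
from dataclasses import dataclass, field


@dataclass
class Term:
    """One vocabulary term, loosely following the SKOS concept model."""
    iri: str
    pref_labels: dict[str, str]  # language tag -> preferred label
    alt_labels: dict[str, list[str]] = field(default_factory=dict)
    broader: list[str] = field(default_factory=list)  # IRIs of parent terms

    def labels(self, lang: str) -> list[str]:
        """All labels in one language, preferred label first."""
        preferred = [self.pref_labels[lang]] if lang in self.pref_labels else []
        return preferred + self.alt_labels.get(lang, [])


# Hypothetical term; IRIs and the alternative label are placeholders.
bi_system = Term(
    iri="http://example.com/vocab/business-intelligence-system",
    pref_labels={"en": "Business intelligence system"},
    alt_labels={"en": ["BI system"]},
    broader=["http://example.com/vocab/information-system"],
)
print(bi_system.labels("en"))
```

The hierarchy is expressed only through the `broader` list, which mirrors how a term points to its parent terms in a classification scheme.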

Need for Vocabularies

Vocabularies are used in a variety of contexts and can serve various purposes. Let's look at a few examples of using vocabularies in practice. Imagine you are a data scientist in the pharma domain and are working on a project where you or your end users need to be able to:

  • Build categorizations and classifications of terms such as genes, proteins, diseases, viruses, symptoms, living organisms, etc.
  • Tag data resources such as research papers, case studies, lab reports, web pages, images, and blog posts with these terms, in order to provide a basis for recommendations and to promote interoperability and relations between the tagged resources.
  • Search for information about a protein and narrow down your search to a specific species, organ, or other facet of the subject.
  • Search multilingually and expand your search to synonyms or alternative labels when looking for information about a specific disease.
  • Use related terms and auto-suggestion when searching for information.
  • Organize internal knowledge that needs to be re-used and shared within or outside of your organization in a controlled manner, and avoid confusion when working within and across disciplines, for example, when a particular term is used in different ways or different terms are used for similar things.

Because they allow for defining a common terminology and a hierarchical structure for terms within a domain, vocabularies support all of the examples discussed above. They help users find information faster, more reliably, and more effectively, and help them communicate and collaborate across disciplines, departments, or organizations.

What exactly is Vocabulary Management?

In brief, vocabulary management refers to the design, development, implementation, and maintenance of such taxonomies, classification schemes, thesauri, metadata, and glossaries. It also covers keeping vocabularies in line with changes in term usage and adopting best practices that are based on open standards and available to all users.

One such standard is SKOS (Simple Knowledge Organization System), a W3C recommendation. SKOS is based on RDF, which allows the information to be exchanged between machines in an interoperable way. For more details, see the SKOS Primer.
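To make the RDF basis more tangible, here is a minimal sketch of how one term could be written down as SKOS statements and serialized as N-Triples. The example.com IRIs and the helper functions are assumptions made for illustration; only the SKOS and RDF namespace IRIs are standard.

```python
# Standard namespace IRIs; everything under example.com is a made-up placeholder.
SKOS = "http://www.w3.org/2004/02/skos/core#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
BASE = "http://example.com/vocab/"


def iri(value: str) -> str:
    """Wrap an IRI in angle brackets, as N-Triples requires."""
    return f"<{value}>"


def literal(text: str, lang: str) -> str:
    """Format a language-tagged literal, escaping backslashes and quotes."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"@{lang}'


def to_ntriples(triples) -> str:
    """Serialize (subject, predicate, object) triples, one statement per line."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


term = iri(BASE + "microfinance")
triples = [
    (term, iri(RDF_TYPE), iri(SKOS + "Concept")),
    (term, iri(SKOS + "prefLabel"), literal("Microfinance", "en")),
    (term, iri(SKOS + "broader"), iri(BASE + "financing")),
]
print(to_ntriples(triples))
```

Because every statement is a plain subject-predicate-object triple, any RDF-aware tool can read the vocabulary without knowing anything about the application that produced it.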

Vocabulary Management with metaphactory

In metaphactory, we have adopted SKOS as the base model for vocabulary management and have implemented features such as vocabulary cataloging, versioning, and metadata editing. With these features, customers can manage vocabularies in the knowledge graph itself, alongside instance data and other data assets (such as ontologies and data catalogs). This supports knowledge democratization: domain-specific data is modeled in terms that domain experts and business users understand and can use for analysis and for answering critical business questions.

In the screenshot below you can see the STW thesaurus for economics loaded as a sample vocabulary in metaphactory. Note how the terms are structured for organizing and classifying data and are related to each other via hierarchical relations. Additionally, each term is used to define and distinguish the characteristics of knowledge resources in a specific domain.

Overview of STW thesaurus for economics

For example, in the screenshot below we see the term Business intelligence system. It is a child of several other terms, i.e., it is narrower in meaning than terms such as Information system and Corporate information systems. It also carries descriptive data, such as preferred and alternative labels in multiple languages.

Term retrieval of keyword "Business intelligence system" via a simple search

metaphactory's vocabulary management follows four simple steps throughout which domain experts and business users can seamlessly contribute to the vocabulary engineering process:

1. Creation of terms: Terms are created to represent the core concepts of the domain modeled.

New terms can be created either one at a time or through batch creation. (For batch creation, metaphactory's vocabulary management provides a convenient way to copy and paste text from different sources by choosing a delimiter option.)

In this example, we show both options for creating a new term called "Knowledge democratization platform" whose parent is the term Business intelligence system.

Creation of the term "Knowledge democratization platform" under "Business intelligence system"

Batch creation of terms, "Knowledge democratization platform" under "Business intelligence system"
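The idea behind batch creation with a delimiter can be sketched in a few lines. How metaphactory actually parses pasted text is not described in this post, so the function below, its name, and its handling of whitespace and duplicates are assumptions for illustration only.

```python
def parse_batch_terms(pasted: str, delimiter: str = ",") -> list[str]:
    """Split pasted text into clean, unique term labels (hypothetical helper)."""
    seen = set()
    terms = []
    for raw in pasted.split(delimiter):
        label = " ".join(raw.split())  # trim and collapse whitespace and newlines
        key = label.casefold()         # compare labels case-insensitively
        if label and key not in seen:
            seen.add(key)
            terms.append(label)
    return terms


pasted_text = "Knowledge democratization platform;  Data catalog;; data catalog "
print(parse_batch_terms(pasted_text, delimiter=";"))
```

Empty entries and case-insensitive duplicates are dropped here so that a sloppy paste does not create the same term twice; a real implementation may well choose different rules.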

2. Definition of semantic relations between terms: Semantic relations describe how two terms are connected to each other.

A semantic relation between terms (e.g., skos:broader to assert that one term is a child of another, or skos:narrower to assert that one term is the parent of another) can be established either by browsing the tree or by searching for the term that should be added to the hierarchy within a vocabulary.

Editing a term hierarchy
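Since skos:broader and skos:narrower are inverses of each other, it is enough to store one direction and derive the other. The sketch below assumes the hierarchy is kept as simple (child, parent) pairs of labels; this representation and the function name are illustrative assumptions, not how metaphactory stores vocabularies.

```python
from collections import defaultdict

# (child, parent) pairs, read as "child skos:broader parent".
BROADER = [
    ("Business intelligence system", "Information system"),
    ("Business intelligence system", "Corporate information systems"),
    ("Knowledge democratization platform", "Business intelligence system"),
]


def narrower_index(broader_pairs):
    """Derive skos:narrower (parent -> children) as the inverse of skos:broader."""
    index = defaultdict(list)
    for child, parent in broader_pairs:
        index[parent].append(child)
    return dict(index)


for parent, children in narrower_index(BROADER).items():
    print(parent, "->", children)
```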

3. Definition of properties per term: Properties define additional information about a term, e.g., the definition of the term Business intelligence system.

Editing properties of a term

4. Deprecation and deletion of leaf terms: Leaf terms (terms with no narrower terms) that no longer need to be maintained or used can be deprecated or permanently deleted. Unlike permanent deletion, deprecation keeps the term and its structure intact so as not to affect existing applications.
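The difference between deprecating and deleting a leaf term can be sketched as follows. The post does not say how metaphactory records deprecation, so the `deprecated` set and all function names are stand-ins chosen for this example.

```python
# (child, parent) pairs, read as "child skos:broader parent".
broader = [
    ("Business intelligence system", "Information system"),
    ("Knowledge democratization platform", "Business intelligence system"),
]
deprecated = set()  # stand-in for however deprecation is actually recorded


def is_leaf(term, broader_pairs):
    """A leaf term is not the parent of any other term."""
    return all(parent != term for _child, parent in broader_pairs)


def deprecate(term, broader_pairs, deprecated_terms):
    """Flag a leaf term as deprecated; its relations stay untouched."""
    if not is_leaf(term, broader_pairs):
        raise ValueError(f"{term!r} still has narrower terms")
    deprecated_terms.add(term)


def delete(term, broader_pairs, deprecated_terms):
    """Permanently remove a leaf term and every relation that starts from it."""
    if not is_leaf(term, broader_pairs):
        raise ValueError(f"{term!r} still has narrower terms")
    deprecated_terms.discard(term)
    return [(child, parent) for child, parent in broader_pairs if child != term]


# Deprecation only sets a flag, so the hierarchy still contains the term.
deprecate("Knowledge democratization platform", broader, deprecated)
print(sorted(deprecated), len(broader))

# Deletion removes the term's relations for good.
broader = delete("Knowledge democratization platform", broader, deprecated)
print(broader)
```

Applications that still reference a deprecated term keep working because its place in the hierarchy is unchanged, whereas a deleted term simply no longer exists.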

In addition to the vocabulary engineering process, metaphactory also provides collaboration features and versioning capabilities (see screenshot below).

Vocabulary versioning using Git

Vocabulary Management in Practice - Use Case Demo

As discussed above, the need for vocabulary management is varied and use cases can be heterogeneous. In this blog post, we'll focus on one specific use case to demonstrate the value of vocabularies by integrating an existing thesaurus, the STW thesaurus for economics, into metaphactory and using the platform's semantic structured search component to explore terms and to connect data through relations between entities.

For this simple demo, we will use the Nobel Prize Ontology. Please note that the Nobel Prize Ontology has been augmented by metaphacts with SHACL shapes.

Visual representation of the Nobel Prize Ontology at schema level

The screenshot of the Nobel Prize Ontology above emphasizes the relations between the following entities:

  • Laureate Award Class
  • Laureate Class
  • Nobel Prize Category Class
  • Field Class* and
  • SKOS Concept Class

If we start by looking at the Laureate Award Class, we see that it has a laureate entity relation to the Laureate Class, a field entity relation to the Field Class, and a category entity relation to Nobel Prize Category Class.

*Note, the Field Class was added in our extension to the Nobel Prize ontology in order to provide additional context information for the Laureate Award Class and is a subclass of SKOS's Concept Class.

In this demo example, we want to augment the information about Laureate Awards by associating them with the specific fields they relate to.

Let's take, for example, the Laureate Award received by the Laureate Muhammad Yunus in the Peace category in 2006. Muhammad Yunus is known for "creating economic and social development from below" in the field of "micro-credit" or "microfinance". Our aim is to document the fact that the Peace Laureate Award he received in 2006 is in the field of Microfinance, a term re-used from the STW thesaurus for economics.

To add such information about laureate awards, we create a simple data entry form using metaphactory's semantic form component configured as shown below:

<semantic-form for-class="http://data.nobelprize.org/terms/LaureateAward"
  new-subject-template="http://example.com/records/{{UUID}}"
  fields-to-load='[
    "http://data.nobelprize.org/terms/laureate",
    "http://data.nobelprize.org/terms/category",
    "http://data.nobelprize.org/terms/field"
  ]'>
</semantic-form>

Since we are creating a form to augment information about Laureate Awards, we set the "for-class" parameter to the Laureate Award Class IRI and use the "new-subject-template" parameter to generate an IRI for each new resource.

The "fields-to-load" parameter specifies a list of relation IRIs (laureate, category, and field) to be loaded asynchronously. It can be used to control the order in which fields for a particular class or instance are displayed, as well as to hide fields. The Laureate Award Class has several other relations (e.g., university) and properties (e.g., share and year), and we want to render only those relevant to our use case.

Visual data curation for the Laureate Award Peace 2006, Muhammad Yunus, using the semantic form component

As a next step, we want to explore Laureates (Persons or Organisations) and find those who received a Laureate Award in the Peace category and in the field of Microfinance. Please note that Microfinance is a term re-used from the STW thesaurus for economics. For this, we will use the metaphactory structured search component, which allows us to define specific queries through an intuitive, visual end-user interface.

Semantic structured search for Laureates who received a Laureate Award in the Peace category and the Microfinance field

If we now search for Laureates who received a Laureate Award in the Peace category and in the field of Financing, assuming Financing is our point of interest, we can narrow down our search to the more specific subject Microfinance. We are able to do that because the term Microfinance has a broader relation to the term Financing.

Semantic structured search for Laureates who received a Laureate Award in the Peace category and the Microfinance field
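The narrowing from Financing to Microfinance relies on following the hierarchy downwards from the chosen term. The sketch below shows the general idea on in-memory data. It is not the structured search component's implementation, and the data layout and function names are assumptions; the single award record reflects the Muhammad Yunus example from this post.

```python
# (child, parent) pairs, read as "child skos:broader parent".
BROADER = [("Microfinance", "Financing")]

AWARDS = [
    {"laureate": "Muhammad Yunus", "category": "Peace", "year": 2006,
     "field": "Microfinance"},
]


def narrower_closure(term, broader_pairs):
    """Return the term plus all terms that are transitively narrower than it."""
    children = {}
    for child, parent in broader_pairs:
        children.setdefault(parent, []).append(child)
    result, seen, stack = [], set(), [term]
    while stack:
        current = stack.pop()
        if current in seen:  # guard against accidental cycles
            continue
        seen.add(current)
        result.append(current)
        stack.extend(children.get(current, []))
    return result


def search_laureates(category, field_term):
    """Find laureates whose award field is the given term or any narrower term."""
    fields = set(narrower_closure(field_term, BROADER))
    return [a["laureate"] for a in AWARDS
            if a["category"] == category and a["field"] in fields]


print(search_laureates("Peace", "Financing"))
```

Searching with the broader term Financing still finds the award tagged with Microfinance, because Microfinance is part of the narrower closure of Financing.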

Summary

In this blog post, we discussed the need for vocabulary management, the process of keeping vocabularies in line with changes in term usage, and the adoption of best practices, such as SKOS, that are based on open standards and available to all users.

We demonstrated how metaphactory's vocabulary management follows four simple steps where domain experts and business users can seamlessly contribute to the vocabulary engineering process.

We also presented a concrete example use case, associating a class in an ontology with an existing vocabulary by using platform components such as semantic structured search and semantic form.

How can I try metaphactory?

To try the example described above, you can get started with metaphactory today using our 14-day free trial. You can download the Nobel Prize Ontology from our public Git repository and the STW thesaurus for economics from the ZBW - Leibniz Information Centre for Economics website.

Don't hesitate to reach out if you want to learn more about how implementing metaphactory and following our approach can also accelerate your knowledge management initiatives and bring you from idea to production in just one month.

Make sure to also subscribe to our newsletter or use the RSS feed to stay tuned to further developments at metaphacts.

Linn Aung

As a Software Engineer at metaphacts and knowledge graph technologies enthusiast, Linn is responsible for developing, documenting, and testing metaphactory to ensure that the platform meets our customers' needs and helps them achieve their business goals.