In this blog post, guest author Veronika Heimsbakk, knowledge graph lead at Capgemini, shares her approach to creating semantic knowledge models for clients. Read this guide to learn how she works together with clients to build semantic knowledge models from the ground up and discover practices you can apply to your own semantic modeling initiatives.
How to approach building semantic knowledge models for clients
At The Knowledge Graph Conference 2024, I was on a panel with Peter Haase, founder and Chief Scientific Officer at metaphacts, where we discussed ontologies and how to model them. metaphacts then asked me to contribute to their blog and elaborate on my methodology for semantic modeling. Here, I will focus on the steps I usually follow when working with my clients to create the wonders that are semantic knowledge graphs.
It is worth mentioning that I don’t follow one specific methodology for ontology modeling, but build on various approaches, as the maturity and needs of my clients vary a lot. This blog post describes the general steps and building blocks of my daily work; my colleagues at Capgemini might have different approaches.
Prerequisites: Before you start semantic modeling
Before you get started with modeling the actual information, there are a few things you must first discover. These steps are done either together with the client, or the client provides the information themselves. At this point, I am assuming that the use case and/or user story is already in place.
- Identify the scope
- Identify information sources
- Timeframe and milestones
Identify the scope
Your client needs to identify a scope which the knowledge graph will serve. It is its own science to discover that small (enough) scope with high (enough) value to show the wonders of what knowledge graphs can do for their use case(s). However, since knowledge graphs are a dynamic model, with easy and almost painless scalability, the scope to begin with can be very small.
Imagine that the use case is creating a portal for CVs and projects within a consultancy firm. A starting scope could then be all employees and projects from one particular department, with more departments added later on. During this phase, you also need to start thinking about granularity. Knowledge graphs are the atomic breakdown of data, but what level of granularity do you really need for your particular project? Keep it simple. This question of granularity will come up several times during the development of an ontology.
Identify information sources
Next, you need to discover what information sources exist and what they look like. Do we have access to the information? Are there third-party sources to take into account? Information sources vary a lot. They can be Word files, Excel spreadsheets, SQL databases, data streams, PDF files, JSON APIs: basically any type of file. As a knowledge engineer, you must be prepared for unstructured data, messy data, chaotic data and low-quality data in any shape or form. Our job is to parse, transform and create order and integrity in the client’s data.
Timeframe and milestones
Together with your client, plan the desired milestones and plot them along the project’s timeline. Ontology modeling is iterative work, and establishing the semantic knowledge layer that an ontology provides requires close cooperation with the client’s domain experts. The time available must also be taken into account when deciding on a scope: a Minimum Viable Product (MVP) or Proof of Concept (PoC) of 8–20 weeks will have different milestones and phases than a full-scale project.
Initial activities together with the client
Before you are on your own, diving into your favorite modeling tool and mapping all that crazy data, there are a few activities you need to do together with your client. These activities will help and guide you a lot along the way.
Competency Questions
When the prerequisites are in place, one can start defining a list of competency questions. These are natural language questions that you want to answer using the ontology. They usually translate well into SPARQL queries for testing the knowledge graph later. A list of competency questions will serve as a useful guide through the modeling process and is immensely helpful in testing the knowledge graph.
Going back to our previous example, some competency questions can be: “How many Azure projects has Lisa worked on for the past three years?”, “How many PRINCE2 certified senior architects do we have?”, “We need KPI statistics of all AWS related projects.”, and so on. The more, the better!
Finding these questions is an activity to do together with your client.
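One possible way to keep the list actionable is to record each competency question next to the SPARQL query that will eventually answer it. The following is a minimal sketch, not a prescribed format: the `CompetencyQuestion` class, the `ex:` namespace and the terms `ex:Employee`, `ex:hasRole`, `ex:SeniorArchitect`, `ex:hasCertification` and `ex:PRINCE2` are all hypothetical placeholders for whatever the real ontology ends up defining.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CompetencyQuestion:
    """A natural-language question and, once written, the query that answers it."""
    text: str
    sparql: Optional[str] = None  # filled in as the ontology takes shape


# Hypothetical vocabulary: the ex: namespace and every term in it are
# placeholders, not part of any published ontology.
PRINCE2_ARCHITECTS = """
PREFIX ex: <http://example.org/cv#>
SELECT (COUNT(DISTINCT ?person) AS ?count)
WHERE {
  ?person a ex:Employee ;
          ex:hasRole ex:SeniorArchitect ;
          ex:hasCertification ex:PRINCE2 .
}
"""

questions = [
    CompetencyQuestion(
        "How many PRINCE2 certified senior architects do we have?",
        PRINCE2_ARCHITECTS,
    ),
    CompetencyQuestion(
        "How many Azure projects has Lisa worked on for the past three years?"
    ),
]


def unanswered(items):
    """Return the text of every question that has no query yet."""
    return [q.text for q in items if q.sparql is None]


print(unanswered(questions))
```

Questions that still lack a query are a useful signal that the model does not yet cover that part of the use case.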
Draw initial concepts
Here comes one of my favorite activities in this job: picking my client’s brain for knowledge and drawing it on paper together! I typically run this activity as a 2–4 hour workshop, and the tools of the trade are A3 sheets of paper and a box of coloring pencils.
We start with the use case. What does it contain? What concepts are there and what do they mean? Does this concept have a relation to other concepts? And does it contain “sub-concepts”?
Your job is to ask those silly questions, to get the client's knowledge out of their heads and down on paper. If it’s a bit tricky to get started, you can ask your client to write down relevant words on Post-its, group them into categories and then start drawing.
Start your own modeling process
Now it is time to get started with the modeling itself. At this point you have several useful tools at hand: the drawings, the competency questions, and the information sources.
There are also several different approaches to how you will start this phase of the work.
- It might be that some of the information sources contain machine-readable schemas or similar that you want to parse to RDF in order to get a skeleton to start with.
- It might be that your client is already familiar with RDF and wants to contribute to the work. In that case, setting up a collaborative infrastructure is important. As a knowledge engineer, you need to know the vendor landscape well to make the best recommendations of tooling and databases for your client to serve their needs.
- It might be that you have pretty much nothing to go off of, other than the clues you have already gathered through the prerequisites and the drawing workshop.
In any case, an ontology will take shape.
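For the first of these approaches, a skeleton can be generated mechanically from a schema. The sketch below is an illustration under simple assumptions, not a complete mapping: the `SCHEMA` dictionary stands in for whatever a real source exposes (for example table and column names), the `ex:` namespace is hypothetical, and names are assumed to be plain identifiers that are valid Turtle local names.

```python
# Hypothetical schema: table names mapped to column names.
SCHEMA = {
    "Employee": ["name", "department"],
    "Project": ["title", "startDate"],
}

PREFIXES = (
    "@prefix ex: <http://example.org/cv#> .\n"
    "@prefix owl: <http://www.w3.org/2002/07/owl#> .\n"
    "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .\n"
)


def schema_to_turtle(schema):
    """Emit one owl:Class per table and one owl:DatatypeProperty per column.

    Assumes column names are unique across tables; a shared name would
    receive several rdfs:domain statements, which RDFS reads as an
    intersection of the domains.
    """
    blocks = [PREFIXES]
    for table, columns in schema.items():
        blocks.append(f'ex:{table} a owl:Class ;\n    rdfs:label "{table}" .\n')
        for column in columns:
            blocks.append(
                f"ex:{column} a owl:DatatypeProperty ;\n"
                f"    rdfs:domain ex:{table} ;\n"
                f'    rdfs:label "{column}" .\n'
            )
    return "\n".join(blocks)


skeleton = schema_to_turtle(SCHEMA)
print(skeleton)
```

The output is only a starting point: the classes and properties still need to be reviewed against the drawings and competency questions.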
Considering already established ontologies
There are a lot of available and well-maintained ontologies on the web, waiting for you to reuse their knowledge. Are there any external ontologies or vocabularies you can use for your client use case?
I work a lot with clients in the public sector. In Norway, we have our National Data Catalogue, which is based on DCAT-AP-NO, SKOS-AP-NO and surrounding RDF specifications. For most of my projects, I would reuse RDF resources from them to describe my client’s knowledge.
Discovering the need for axioms and/or constraints
In some cases, there is a need for reasoning capabilities. The kind I come across most often is classifying entities by reasoning, but in a few cases I have also needed to reason over permutations or combinations of entities based on various metrics. If you do need to describe axioms and restrictions for reasoning purposes, make sure you have a tool that allows you to do so. Even though I don’t always need axioms in the ontology, I tend to apply RDFS reasoning to all my ontologies. This ensures that the semantics of the ontology match what I expect for my instance data, and that the way the model is constructed does not produce any surprise results.
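To make the idea of RDFS reasoning concrete, here is a small sketch that applies three RDFS entailment rules (rdfs2, rdfs9 and rdfs11) to an in-memory set of triples. It is an illustration only, assuming prefixed names stored as plain strings and hypothetical `ex:` terms; a real project would rely on the reasoner built into its triple store or tooling.

```python
RDF_TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"
DOMAIN = "rdfs:domain"


def rdfs_closure(triples):
    """Apply three RDFS entailment rules until no new triples appear.

    rdfs2:  (p rdfs:domain c), (s p o)            -> (s rdf:type c)
    rdfs9:  (c rdfs:subClassOf d), (s rdf:type c) -> (s rdf:type d)
    rdfs11: rdfs:subClassOf is transitive
    """
    inferred = set(triples)
    while True:
        new = set()
        for s, p, o in inferred:
            if p == DOMAIN:
                # s is a property, o is its domain class
                new |= {(s2, RDF_TYPE, o) for s2, p2, _ in inferred if p2 == s}
            elif p == SUBCLASS:
                # s is the subclass, o is the superclass
                new |= {(s2, RDF_TYPE, o) for s2, p2, o2 in inferred
                        if p2 == RDF_TYPE and o2 == s}
                new |= {(s, SUBCLASS, o2) for s2, p2, o2 in inferred
                        if p2 == SUBCLASS and s2 == o}
        if new <= inferred:
            return inferred
        inferred |= new


# Hypothetical ontology fragment plus one instance statement.
triples = {
    ("ex:worksOn", DOMAIN, "ex:Consultant"),
    ("ex:Consultant", SUBCLASS, "ex:Employee"),
    ("ex:Employee", SUBCLASS, "ex:Person"),
    ("ex:lisa", "ex:worksOn", "ex:projectA"),
}

closure = rdfs_closure(triples)
for triple in sorted(closure - triples):
    print(triple)
```

Inspecting the inferred triples is where surprises show up: anything that appears as the subject of `ex:worksOn` is typed as `ex:Consultant`, whether or not that was the intention.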
The need for constraints in the form of SHACL shapes, however, is a far more relevant topic these days. In my experience, being able to validate data under a closed-world assumption is often much closer to real-life cases than living within an open-world assumption. I still model the ontology and the shapes separately, though, often the ontology first, and then reuse those RDF resources in the SHACL shape descriptions.
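The difference between the two assumptions can be shown with a hand-rolled check that mimics what a SHACL minimum-count constraint does. This is not a SHACL processor and not how shapes are actually written; the `EMPLOYEE_SHAPE` dictionary, the `validate` function and the `ex:` terms are hypothetical stand-ins used only to illustrate closed-world validation.

```python
# A stand-in for a node shape that targets a class and requires at least
# one value for certain properties (comparable in spirit to sh:minCount).
EMPLOYEE_SHAPE = {
    "target_class": "ex:Employee",
    "min_count": {"ex:name": 1, "ex:department": 1},
}


def validate(triples, shape):
    """Return (node, property) pairs that fall short of the minimum counts."""
    targets = sorted(
        s for s, p, o in triples
        if p == "rdf:type" and o == shape["target_class"]
    )
    violations = []
    for node in targets:
        for prop, minimum in shape["min_count"].items():
            count = sum(1 for s, p, _ in triples if s == node and p == prop)
            if count < minimum:
                violations.append((node, prop))
    return violations


data = {
    ("ex:lisa", "rdf:type", "ex:Employee"),
    ("ex:lisa", "ex:name", "Lisa"),
    ("ex:lisa", "ex:department", "ex:cloud"),
    ("ex:omar", "rdf:type", "ex:Employee"),
    ("ex:omar", "ex:name", "Omar"),
}

violations = validate(data, EMPLOYEE_SHAPE)
print(violations)
```

Under an open-world assumption, the missing department for `ex:omar` is simply unknown. Under the closed-world check above, it is reported as a violation, which is usually what a client expects from data validation.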
[Image: Example of classification axioms for hydropower plants, and then classifying individuals to their proper group by max output.]
Populate with instances
For most use cases, the client wants to use the ontology to support data-driven decision-making, analytics and insight into their data. In those cases, the need to populate a knowledge graph with instance data (individuals) becomes apparent.
In this case, you do not want to add those manually. There are many mapping tools available on the market, or you can use programming frameworks for RDF to map the individuals according to the ontology.
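As a rough illustration of the programmatic route, the sketch below maps rows of tabular data to N-Triples using only the Python standard library. The namespace, the column names and the choice to emit plain string literals are assumptions made for the example; in practice a mapping tool or an RDF framework would handle datatypes, IRIs and escaping more thoroughly.

```python
import csv
import io

BASE = "http://example.org/cv/"  # hypothetical namespace
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Stand-in for a client export; a real source would be a file or a database.
SOURCE = io.StringIO(
    "id,name,department\n"
    "e1,Lisa,Cloud\n"
    "e2,Omar,Data\n"
)


def escape(value):
    """Escape backslashes, quotes and line breaks for an N-Triples literal."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def rows_to_ntriples(handle):
    """Map each row to one Employee individual, one triple per line."""
    lines = []
    for row in csv.DictReader(handle):
        # Assumes the id column contains only characters that are safe in an IRI.
        subject = f"<{BASE}employee/{row['id']}>"
        lines.append(f"{subject} <{RDF_TYPE}> <{BASE}Employee> .")
        lines.append(f'{subject} <{BASE}name> "{escape(row["name"])}" .')
        lines.append(f'{subject} <{BASE}department> "{escape(row["department"])}" .')
    return lines


ntriples = rows_to_ntriples(SOURCE)
print("\n".join(ntriples))
```

The class and property IRIs should come from the ontology itself, so that the individuals line up with the model rather than with the source format.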
Back to start
Once the graph contains individuals and their attributes and relationships, together with the knowledge layer (the ontology), you have a knowledge graph, and you can start testing and evaluating it against the established competency questions.
As mentioned, most competency questions are easily translatable into SPARQL queries, which makes it easy to answer them. Bring your answers, preferably visualized through a knowledge graph visualization tool, back to your client and evaluate the quality and accuracy together. Did we find what we wanted? Is there new insight gained? Did we miss something? Are there inconsistencies in client data?
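To show what answering a competency question amounts to, the following sketch evaluates "How many PRINCE2 certified senior architects do we have?" over a tiny in-memory set of triples. It is a plain-Python stand-in for a SPARQL basic graph pattern, not a query engine, and the individuals and `ex:` terms are invented for the example.

```python
def count_matching(triples, required):
    """Count subjects that have every (predicate, object) pair in `required`.

    This mirrors a basic graph pattern in which all triple patterns
    share a single subject variable.
    """
    subjects = {s for s, _, _ in triples}
    return sum(
        1 for s in subjects
        if all((s, p, o) in triples for p, o in required)
    )


# Hypothetical instance data.
graph = {
    ("ex:lisa", "ex:hasRole", "ex:SeniorArchitect"),
    ("ex:lisa", "ex:hasCertification", "ex:PRINCE2"),
    ("ex:omar", "ex:hasRole", "ex:SeniorArchitect"),
    ("ex:ada", "ex:hasRole", "ex:SeniorArchitect"),
    ("ex:ada", "ex:hasCertification", "ex:PRINCE2"),
}

answer = count_matching(graph, [
    ("ex:hasRole", "ex:SeniorArchitect"),
    ("ex:hasCertification", "ex:PRINCE2"),
])
print(answer)
```

If the number does not match what the client expects, that points either to a gap in the model or to an inconsistency in the source data, which is exactly what the evaluation is meant to uncover.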
About the author
Veronika Heimsbakk
Knowledge Graph Lead
Insights & Data, Capgemini
Veronika is a dedicated and enthusiastic advocate for the wonders of semantic knowledge graphs. She was recently named one of Norway’s Top 50 Women in Tech. Her main niche within the space is everything SHACL.
Try it yourself with metaphactory
In our experience at metaphacts, and similar to what Veronika describes in her blog post, enterprises do not follow a single, commonly established methodology, but often draw on established best practices to craft an approach that fits their specific needs and use cases. This is why, with our platform metaphactory, we try to find the right balance between established methodology and flexibility: we support what we see as commonly used best practices out of the box, while allowing flexible configurations to support variations in methodological approaches.
Veronika highlights that involving the client (end user) in the modeling process is extremely important and that modeling must happen iteratively. These aspects are both at the very core of our process in metaphactory, where we support capturing and organizing domain expertise in explicit semantic models using a collaborative, visual environment that allows multiple stakeholders with diverse technology backgrounds to contribute, provide feedback and revise models step by step. Additionally, because we agree with Veronika on the usefulness of SHACL, metaphactory's visual notation is grounded in SHACL.
If you're interested in exploring how you can build a semantic model collaboratively (within your organization or with external clients) with metaphactory, have a look at our trial options here.