The future of libraries and linked data: How the National Library Board of Singapore modernized its data management

Use Cases

Pauline Leoncio

Reading time: 9 minutes

If you're grappling with messy, heterogeneous data trapped in silos, know that you're not alone. These are challenges that large enterprises and organizations across all industries are familiar with. Often, they stem from having an abundance of data but lacking the time and/or technology to organize and leverage the knowledge kept within internal systems. Yet today we can do far more with data than was conceivable just a few years ago.

 

Imagine being able to link data—no matter how diverse in kind or disparate the systems—and manage it from a singular, all-encompassing platform. It may sound too good to be true, but that’s precisely what the National Library Board of Singapore (NLB) accomplished when it set out to modernize its data architecture and make data-driven improvements to its services. 

 

In this blog post, we'll discuss the powerful knowledge graph-based solution that transformed NLB's library and resource management, and how you, too, can leverage these tools to support your organization's data-driven use case!  

 


Turning the page to a new library management system  

 

As with many organizations that deal with copious amounts of data, duplications and inconsistencies naturally emerge when there's no central hub to access and oversee these resources. The NLB was no exception: it faced this exact problem when it sought a solution to improve the curation and management process for its data curators. 

 

The libraries and archives managed by the NLB—comprising 30 public libraries, the National Archives of Singapore (NAS) and the National Library—all use distinct systems to handle their data, such as the Integrated Library System (ILS) and the Content Management System (CMS). In addition to the varying systems, these assets were managed by separate teams and departments, resulting in isolated data islands that prevented curators from accessing each other's resources and library customers from leveraging the information stored in other systems. Naturally, these data silos led to resources with inconsistencies, missing values and inaccurate information. 

 

A tale of linked data 

This project emerged not only as a response to the NLB's persistent data management challenges, but also as an evolution of its ongoing interest in adopting a Linked Data approach. Linked Data (also known as Linked Open Data) refers to making data 'link-able' across the web so that humans and machines can easily access it, making it easier to explore and discover other relevant data. Specifically, Linked Data means data that follows W3C Semantic Web standards so that it's machine-interpretable and FAIR—findable, accessible, interoperable, and reusable. Since the NLB's Discovery Services team had already explored Linked Open Data standards over the years through the implementation of RDF and the publication of library entities using schema.org, it wanted to apply these principles to more areas under its purview. 

 

Linked Data creates a world where information is freely shared and repurposed for numerous applications. According to Tim Berners-Lee (who coined the term “Linked Data” and created the World Wide Web), the four key principles of Linked Data are:

 

  1. Use URIs as names for things
  2. Use HTTP URIs so that people can look up those names
  3. When someone looks up a URI, provide useful information, using the standards (RDF*, SPARQL)
  4. Include links to other URIs, so that they can discover more things.
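
To make these principles concrete, here is a minimal sketch (not taken from the NLB project) of what they look like in practice, using Python and rdflib; the example.org identifiers are hypothetical, and only the DBpedia URI is real.

```python
# A minimal sketch of the four Linked Data principles using rdflib.
# The example.org identifiers are hypothetical; only the DBpedia URI is real.
from rdflib import Graph, URIRef, Literal, Namespace
from rdflib.namespace import RDF

SCHEMA = Namespace("https://schema.org/")

g = Graph()
g.bind("schema", SCHEMA)

# 1) + 2) Use HTTP URIs as names for things, so people (and machines) can look them up.
book = URIRef("https://example.org/resource/a-christmas-carol")

# 3) When someone looks up the URI, serve useful, standards-based information (RDF).
g.add((book, RDF.type, SCHEMA.Book))
g.add((book, SCHEMA.name, Literal("A Christmas Carol")))

# 4) Include links to other URIs so that clients can discover more things.
g.add((book, SCHEMA.sameAs, URIRef("http://dbpedia.org/resource/A_Christmas_Carol")))

print(g.serialize(format="turtle"))
```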

 

This standardization promotes greater integration online and fosters open knowledge sharing on the Web. It empowers you to leverage publicly available knowledge to supplement and enrich your own knowledge base, opening up more opportunities for discovery and analysis—a capability of particular interest to the NLB as a means to add richness and extend the value of its resources. Additionally, for a government agency with an education-focused mission, practicing a Linked Data approach positions the NLB as an authority for relevant public data.

 

A novel solution: The Data Management Interface 

With an understanding of the NLB's intent and circumstances, we developed a solution in collaboration with a consortium of partners (including Data Liberate, Ontotext and Kewmann), called the Data Management Interface (DMI). 

 

The DMI is a comprehensive knowledge graph-powered platform that unites the NLB’s existing systems and enables data curators to flexibly collaborate, manage and utilize the entirety of resources within the NLB's catalog (comprising more than three million print and digital library and archive assets) from a central system. Curators are able to easily manage all entities, including their descriptions and relations, from a user-oriented interface. It was built with the GraphDB graph database and metaphactory's semantic visual interface, which applies Linked Data and Semantic Web standards. 

 

The interface was developed rapidly thanks to metaphactory's low-code, model-driven application building approach. It includes a user-friendly dashboard overview, multiple search pathways for easy discovery, and numerous functionalities that allow curators to view, validate, edit, add or remove assets. This 360° view helps curators save time, improve efficiency and better detect errors and discrepancies. By creating an overarching platform built on Linked Data principles, the DMI also allows for knowledge creation and sharing among data curators.  

 

Watch as a curator spots, investigates and fixes a problem, verifies the solution and publishes the updated record in the DMI.

 

A data curator's workflow using the Data Management Interface

 

The DMI includes a multitude of features and capabilities such as:

 

  • Easy search and discovery: Curators can explore and discover information via multiple pathways that leverage the underlying semantic model of the graph (e.g., a generic search prompt, statistical dashboard, form-based search screen(s), etc.).
  • Simple knowledge panel: Details about entities (e.g., provenance data) are displayed through a concise card, offering users a snapshot for easy viewing. 
  • Data control: Curators can review and validate records by viewing the source data, with merge and split options to resolve incorrect aggregations.
  • Ontology management: Through the DMI, curators can manage and maintain a tailored ontology, based on schema.org and built for the DMI, to ensure all entities and relations are defined and exposed. 
  • Flexible modification: Curators are easily able to create, augment, delete, and manage data without affecting source data.
  • Bird's eye view: Data can be analyzed through an aggregated view, including breakdowns by entity type, data sources, records in a modified state, and more.
  • Multi-format publishing: Records are made accessible via the HTML page as well as in multiple RDF serialization formats (following FAIR principles), and can be downloaded or exported in different formats.

Crafting the DMI

A key requirement for the NLB was to ensure that the original sources of data remained untouched, which meant the diverse systems that comprise the NLB's collection had to be combined so that entities could be managed from a single, separate platform (the DMI). To make this a reality, we developed the concept of a 'primary entity', a top-level layer in the model that brings these disparate systems together so that data curators can modify, edit, remove or add entities without altering source data in the established systems. In the aggregation process, the source systems feed data into the knowledge graph, where the data is mapped to the knowledge model and displayed in the DMI. 
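
As a rough illustration of this idea (an assumed structure, not the NLB's actual model), the sketch below uses rdflib named graphs to keep two hypothetical source records untouched while a separate 'primary entity' graph aggregates them and points back to its sources.

```python
# Illustrative only: source records live in their own named graphs and are never
# modified; the primary entity sits in a separate graph and links back to them.
from rdflib import Dataset, Literal, Namespace
from rdflib.namespace import RDF

SCHEMA = Namespace("https://schema.org/")
EX = Namespace("https://example.org/")   # hypothetical namespace

ds = Dataset()

# Source layer: records as delivered by the ILS and CMS (left untouched).
ils = ds.graph(EX["graph/ils"])
ils.add((EX["ils/record-42"], SCHEMA.name, Literal("A Christmas Carol")))

cms = ds.graph(EX["graph/cms"])
cms.add((EX["cms/item-7"], SCHEMA.name, Literal("Christmas Carol, A")))

# Primary layer: the curated, top-level entity that a DMI-style interface edits.
primary = ds.graph(EX["graph/primary"])
work = EX["primary/a-christmas-carol"]
primary.add((work, RDF.type, SCHEMA.CreativeWork))
primary.add((work, SCHEMA.name, Literal("A Christmas Carol")))   # curated label
primary.add((work, SCHEMA.sameAs, EX["ils/record-42"]))          # links back to sources
primary.add((work, SCHEMA.sameAs, EX["cms/item-7"]))

print(ds.serialize(format="trig"))
```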

 

Creating this primary layer supports data curators with:

 

  1. Knowledge democratization and data quality improvements through an integrated system that all users can easily access and modify 
  2. The ability to make edits and keep a record in a modified state, while preserving a history of changes so that other parties can review them and, once approved, publish the change along with a note explaining why it was made 

 

The DMI was designed to allow curators to define, at a very fine-grained level, what can be shown to an outside viewer. The primary layer therefore also provides additional advantages to curators when it drives a public interface, such as: 

 

  1. Ability to flexibly modify records while preserving the source data in its original state so that customers can see the most up-to-date information
  2. Control over what is displayed in the interface through a suppression mechanism; curators can suppress entities and properties that are not suitable for public viewing, or hide previously added facts that turn out to be inaccurate

 

The primary entity layer is what gives data curators unparalleled flexibility and control, making the DMI a remarkably versatile and powerful solution for handling all data curation tasks. 

 

After developing the concept of the ‘primary entity’, we then started the development process and began importing, aggregating and refining entities. 

 

Step 1: The ETL process

We had to use different approaches for extracting and loading data from each of the individual systems in place. For instance, a conversion script based on the bibframe2schema W3C community group's converter library was used to convert the output from BIBFRAME to schema.org. To handle the Dublin Core metadata, which was supplied to us in large XML files, we used custom mapping and conversion scripts to convert it to schema.org.
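
As a hedged sketch of what such a mapping script can look like (the record, identifiers and mapping below are made up, and the production scripts are considerably more involved), here is a Dublin Core XML fragment converted to schema.org triples in Python:

```python
# A simplified Dublin Core -> schema.org conversion sketch (illustrative only).
import xml.etree.ElementTree as ET
from rdflib import Graph, URIRef, Literal, Namespace
from rdflib.namespace import RDF

SCHEMA = Namespace("https://schema.org/")
DC_NS = "http://purl.org/dc/elements/1.1/"

# Simple element-to-property mapping; real mappings need disambiguation rules.
DC_TO_SCHEMA = {
    f"{{{DC_NS}}}title": SCHEMA.name,
    f"{{{DC_NS}}}creator": SCHEMA.creator,
    f"{{{DC_NS}}}contributor": SCHEMA.contributor,
}

record_xml = """
<record xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>A Christmas Carol</dc:title>
  <dc:creator>Dickens, Charles</dc:creator>
</record>
"""

g = Graph()
g.bind("schema", SCHEMA)
subject = URIRef("https://example.org/nas/record-1")   # hypothetical identifier
g.add((subject, RDF.type, SCHEMA.CreativeWork))

for element in ET.fromstring(record_xml):
    prop = DC_TO_SCHEMA.get(element.tag)
    if prop is not None and element.text:
        g.add((subject, prop, Literal(element.text.strip())))

print(g.serialize(format="turtle"))
```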

 

Creating a semantic model based on the schema.org ontology (which is widely adopted and favored by search engines) introduced a consistent vocabulary that harmonized the assets and eliminated duplications and discrepancies. 

 

Let's consider a scenario where the field capturing the authorship of a creative work is expressed as ‘creator’ in one data source and ‘author’ in another. 

 

In ILS - Authorship in MARC XML data is expressed in field 100 and looks like the following: 

 

100 1#$aKelleher, James M. Here, the "$a" subfield encodes personal names. We then had to translate this expression to BIBFRAME's "contributionOf" property.

 

In NAS - In DC terms, authorship is expressed as "contributor" as well as "creator", which had to be carefully disambiguated.

 

In schema.org - Schema.org uses not only the "author" and "contributor" properties but also a "creator" one. 

 

As you can see, all of these variations made it challenging to find the right property match. Uniting the datasets through a core ontology made it easy to trace and identify these discrepancies and reconcile them by defining the most accurate and relevant labels. By using the schema.org ontology, we were able to capture information in a uniform and consistent way, which provides more clarity for the data curator. As a result, authorship is described by the following properties:  

 

  1. Author → schema:author

     Author = The author of this content or rating

  2. Contributor → schema:contributor

     Contributor = A secondary contributor to the CreativeWork or Event

  3. Creator → schema:creator

     Creator = The creator/author of this CreativeWork. This is the same as the Author property for CreativeWork.
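
For illustration, these authorship mappings could be captured in a small lookup table like the one below. This is an assumed, simplified version; the exact source field names and the MARC-to-schema.org assignment are our illustration, not the project's production rules.

```python
# Simplified, assumed mapping of authorship fields to schema.org properties.
AUTHORSHIP_MAP = {
    # (source system, source field/property)  ->  schema.org property
    ("ILS/MARC", "100 $a"):         "schema:author",       # assumed assignment
    ("NAS/DC",   "dc:creator"):     "schema:creator",
    ("NAS/DC",   "dc:contributor"): "schema:contributor",
}

def map_authorship(system: str, field: str) -> str | None:
    """Return the schema.org property for a source authorship field, if known."""
    return AUTHORSHIP_MAP.get((system, field))

print(map_authorship("NAS/DC", "dc:creator"))   # -> schema:creator
```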

Step 2: Reconciliation

Reconciliation describes the aggregation and creation of a primary entity from the records retrieved from different sources. This process consists of: a) identifying matches or candidates for reconciliation; b) measuring the similarity of those candidates; and c) merging the facts about an entity into a single primary entity while maintaining provenance information for each statement about its original source.
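
A toy version of these three steps might look like the following. The matching in the actual pipeline is far more sophisticated; the records, threshold and similarity measure here are purely illustrative.

```python
# Toy reconciliation sketch: a) find candidates, b) score similarity, c) merge
# into a primary entity with per-statement provenance.
from difflib import SequenceMatcher

records = [
    {"id": "ils:record-42", "name": "A Christmas Carol", "author": "Dickens, Charles"},
    {"id": "cms:item-7", "name": "Christmas Carol, A", "author": "Charles Dickens"},
]

def similarity(a: str, b: str) -> float:
    """b) Score candidate pairs, here with a simple string ratio."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

# a) Identify candidates: any pair of records whose names look alike.
candidates = [
    (r1, r2)
    for i, r1 in enumerate(records)
    for r2 in records[i + 1:]
    if similarity(r1["name"], r2["name"]) > 0.6
]

# c) Merge matched records into a single primary entity, keeping
#    per-statement provenance pointing back to the original source.
primary = {"id": "primary:a-christmas-carol", "statements": []}
for r1, r2 in candidates:
    for source in (r1, r2):
        for key, value in source.items():
            if key != "id":
                primary["statements"].append(
                    {"property": key, "value": value, "source": source["id"]}
                )

for statement in primary["statements"]:
    print(statement)
```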

 

We perform lookups against authoritative data sources like TTE (Thesaurus and Taxonomy Editor), the NLB's vocabulary management system that stores gold-standard terminology for media types, genres, names of people, places and more. These TTE facts are then ingested into the primary entity. Since we were using the schema.org ontology to model the data, all that was required was to map the real-world objects described in the NLB's systems to schema.org properties. But because there was so much diversity—in systems and in data—we encountered some technical challenges in the reconciliation process. 

 

All of the NLB's resources were heterogeneous in format, comprising a mix of print, digital and authoritative assets (e.g., ebooks, cassette tapes, terminology), and all of these real-world objects needed to be defined and harmonized through a common ontology. For instance, a work with multiple editions that has been adapted into several formats (e.g., plays, films and TV shows), such as A Christmas Carol by Charles Dickens, should be linked to the same creative work in the knowledge graph, despite being described by different sets of properties. By tweaking our algorithms and finding the appropriate level of reconciliation, we were able to pair each object correctly, despite the diversity and huge volume of data. As a result, curators can easily find A Christmas Carol as a 'CreativeWork' in the interface and see all of its associated adaptations and editions. 

 

Here’s another example of the kinds of data quality issues we faced:

 

When we merged entities, we noticed that some descriptions, such as date of birth or date of death, lacked specificity and precision. To resolve this, we enabled data curators to suppress and edit records to provide more accurate and precise information.

Step 3: Consolidation

Consolidation is when we merge the reconciled primary entity with any manual edits into a single entity that is rendered for the general public. We used named graphs to separate the reconciliation layer, the consolidation layer and a 'housekeeping' layer, which stores technical service metadata. Multiple source systems were considered sources of truth, so the structure had to account for automatic and manual updates via all existing systems, without compromising the data. 

 

Contradictions between the systems, such as conflicting facts, are resolved by the curator in the DMI. To consolidate entities with the same values from different sources, we record all sources, but in the user interface we display only the first label, using metaphactory's label service.
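
As a rough illustration of how this layering can be queried (the graph names are hypothetical, and this is not how metaphactory's label service is implemented), the SPARQL below prefers a curated label from the consolidation graph and falls back to the reconciled one:

```python
# Illustrative layering of reconciliation and consolidation as named graphs,
# with a query that prefers the curated (consolidated) label when one exists.
from rdflib import Dataset, Literal, Namespace
from rdflib.namespace import RDFS

EX = Namespace("https://example.org/")   # hypothetical graph and entity names

ds = Dataset()
entity = EX["primary/a-christmas-carol"]
ds.graph(EX["graph/reconciliation"]).add((entity, RDFS.label, Literal("Christmas Carol, A")))
ds.graph(EX["graph/consolidation"]).add((entity, RDFS.label, Literal("A Christmas Carol")))

query = """
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?entity (COALESCE(?curated, ?reconciled) AS ?label) WHERE {
  GRAPH <https://example.org/graph/reconciliation> { ?entity rdfs:label ?reconciled }
  OPTIONAL {
    GRAPH <https://example.org/graph/consolidation> { ?entity rdfs:label ?curated }
  }
}
"""
for row in ds.query(query):
    print(row.entity, row.label)
```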

 

Although technical challenges arose during this process, they were entirely expected and much easier to resolve with our knowledge graph solution than with traditional systems. Without the use of semantic standards throughout the entire architecture, this process would have been laborious, time-consuming and costly at every level—we would have had to build our own, very complex data model, write documentation for it and create a tailored user interface. Additionally, metaphactory facilitated an agile approach to constructing the UI, enabling us to build it with remarkable speed while keeping maintenance costs low.

 

The final chapter: Launching the DMI

The DMI officially launched in December 2022 and is now being used by the NLB's data curators. Two years after embarking on the Linked Data project, the NLB sees a great number of benefits, including:

 

  • A central interface for viewing and managing entities, maintaining quality and consistency across all collections
  • A more robust, expressive, rich and flexible database
  • Time savings and increased efficiency and accuracy for data curators
  • Knowledge sharing on the web and opportunities for further discovery 

With the implementation of the Data Management Interface, this two-year project not only accomplished the NLB's ambitious vision but also revealed several unexpected benefits. 

 

Search engine optimization 

 

In addition to supporting data linking and architecture needs, we demonstrated the potential of a semantic knowledge graph to enhance search engine optimization. Because a semantic knowledge graph is built on open standards and offers flexibility and interoperability, it enables the publication of metadata to search engines as structured data, improving website SEO. With the adoption of the DMI, the NLB can capture structured data and seamlessly publish it to search engines for indexing and ranking.
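
For example, schema.org metadata held in the graph can be serialized as JSON-LD and embedded in a web page for search engines to pick up. The sketch below is illustrative, not the NLB's actual publishing pipeline, and assumes rdflib 6+, which ships with JSON-LD support.

```python
# Turn schema.org metadata into JSON-LD structured data for embedding in a web page.
from rdflib import Graph, URIRef, Literal, Namespace
from rdflib.namespace import RDF

SCHEMA = Namespace("https://schema.org/")

g = Graph()
g.bind("schema", SCHEMA)
book = URIRef("https://example.org/resource/a-christmas-carol")   # hypothetical URI
g.add((book, RDF.type, SCHEMA.Book))
g.add((book, SCHEMA.name, Literal("A Christmas Carol")))
g.add((book, SCHEMA.author, Literal("Charles Dickens")))

json_ld = g.serialize(format="json-ld")
print(f'<script type="application/ld+json">\n{json_ld}\n</script>')
```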

 

Opens the door to a world of Linked Data

 

Right now, the NLB is exploring the possibility of enriching its original knowledge graph using data sources and datasets made available on the internet, e.g., Wikidata. Since no single system can store all information in its completeness, applying the Linked Data approach allows the NLB to extend its resources beyond the hosted knowledge and enrich them using external sources. 
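
To give a flavor of what such enrichment can look like (this is an illustrative sketch, not the NLB's actual enrichment pipeline), the snippet below looks up external facts about a work from Wikidata's public SPARQL endpoint:

```python
# Illustrative enrichment sketch: fetch author information for a work
# from Wikidata's public SPARQL endpoint.
import requests

ENDPOINT = "https://query.wikidata.org/sparql"
QUERY = """
SELECT ?work ?author ?authorLabel WHERE {
  ?work rdfs:label "A Christmas Carol"@en ;
        wdt:P50 ?author .                      # P50 = author
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
LIMIT 5
"""

response = requests.get(
    ENDPOINT,
    params={"query": QUERY, "format": "json"},
    headers={"User-Agent": "linked-data-demo/0.1"},
    timeout=30,
)
for binding in response.json()["results"]["bindings"]:
    print(binding["work"]["value"], "-", binding["authorLabel"]["value"])
```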

 

Easy display customizations 

 

Due to metaphactory's low-code, flexible, and model-driven application-building approach, we were able to develop the DMI rapidly while minimizing time and maintenance costs. Making adjustments or customizations, such as fine-tuning the logic behind the DMI or expanding it to encompass new use cases, is a controlled, predictable, and effortless process when compared to developing custom software.

 

Learn more about metaphactory

metaphactory empowers companies with solutions around knowledge democratization, decision intelligence, and capturing hidden expert knowledge. Our platform follows FAIR data principles and delivers a low-code, model-driven approach to build applications for semantic search, exploration and knowledge discovery.

 

If you want to try semantic knowledge modeling yourself, register for our free self-guided tutorial and get a four-week trial of metaphactory! You'll receive access to helpful videos, hands-on exercises and the opportunity to practice using your own data.

 

Try metaphactory today!

Pauline Leoncio

Pauline Leoncio is an experienced copywriter and content marketer with over six years in marketing. She's developed content plans and creative marketing material for growing B2B and B2C tech companies and covers a range of topics including finance, advanced tech, the semantic web, food, art & culture and more.