From OntologPSMW

OntologySummit2014: (Track-A) "Common, Reusable Semantic Content" Synthesis     (1)

Mission Statement:     (1C)

The mission of Track A is to leverage common semantic content to reduce the burden of new, quality ontology creation while avoiding silos of different ontologies.     (1C1)

Semantic technologies such as ontologies and related reasoning play a major role in the Semantic Web and are increasingly being applied to help process and understand information expressed in digital formats. Indeed, the derivation of assured knowledge from the connection of diverse (and linked) data is one of the main themes of Big Data (1). One challenge in these efforts is to build and leverage common semantic content, thereby reducing the burden of new ontology creation, which is perceived as resource intensive, while avoiding silos of different ontologies. Examples of such content are whole or partial ontologies, ontology modules, ontological patterns and archetypes, and common, conceptual theories related to ontologies and their fit to the real world. However, crafting whole or partial common semantic content via logical union, assembly, extension, specialization, integration, alignment and adaptation has long presented challenges. Achieving commonality and reuse in a timely manner and with manageable resources remains a key ingredient for the practical development of quality, interoperable ontologies.     (1C2)

Despite development of such things as foundational top-level ontologies (such as DOLCE (2) & BFO (3)), and the availability of broad domain models (such as SWEET (4)) as starting points, the amount of reuse seems quite low in practice.     (1C3)

This track discussed the reuse problem and explored possible solutions. Among these are practical issues like the use of ontology repositories (5) and tools, and the possibility of using basic and common semantic content in smaller, more accessible and easily understandable pieces. The goal is to identify exemplary content and also to define the related information that enables use/reuse in semantic applications and services. A secondary goal is to highlight where more work is needed, enabling the applied ontology community to further develop semantic content and its related information.     (1C4)

Results of our discussions have been synthesized in order to be useful to both the ontology and Big Data communities.     (1C5)

Approach and Track Plan     (1D)

As part of our effort we:     (1D1)

Synthesis, April 10, 2014     (1E)

Reuse of semantic content can be defined as the ability to include content from one source in another, or simply to be inspired by the content in a source. The reuse may directly align with the original intentions of the developers, or may extend in totally unexpected directions.     (1E1)

The definition of semantic reuse is very similar to that of reuse in software engineering. It requires that the concepts (including relationships, axioms and rules), assumptions and expression(s) of the included content meet a need, and can be fitted into the implementation of the re-user's development activities. Reuse seems to be done for similar reasons across all development-related disciplines: to reduce the development effort (by developing less), to expand the benefit (improve the return on investment) of the original content, and to improve the quality of the original content. Since increased use means that bugs are identified and eliminated, a virtuous cycle results (especially when the different uses are diverse AND any bugs and changes are fully documented and explained).     (1E2)

What is the Range of Semantic Content Available for Reuse?     (1E3)

The range of semantic content on the Web and in Big Data is very broad. By "semantic content", we mean whole or partial ontologies, ontology modules, ontology design patterns and archetypes, and linked data schemas. All of these define semantics that could be reused to ease the burden of creating new content, enhance semantic (and not just syntactic) interoperability, and prevent data silos, thereby improving knowledge acquisition. (7)     (1E4)

What are starting points for Semantic Content Reuse?     (1E5)

Looking at extant, "big" ontologies can be helpful for starter ideas. But quality, integrable ontologies and schemas are rarely built by trying to fully understand (or to create) large, top-down collections of content. Some traditional ontology engineering techniques, which start from heavyweight ontologies and methodologies, may not be easily applicable (or even appropriate) in a Big Data setting. It is also important to consider at what point in the development lifecycle reused content is introduced; large ontologies and schemas can be distracting or confusing.     (1E6)

Finding and reusing the "useful pieces" of a comprehensive (foundational) ontology may be more costly than developing a scoped ontology or schema for a particular purpose from scratch. As Kuhn says (in "Semantic engineering", in "Research Trends in Geographic Information Science"), “For solving semantic problems, it may be more productive to agree on minimal requirements imposed on .. Notion(s)." However, when developing new content, it is valuable to be informed by existing work.     (1E7)

Reuse of existing work is complicated when the ontologies or schemas are spread all over the Web, or are incomplete or messy. Likewise, finding and re-using relevant parts of large ontologies and schemas can be challenging, especially if the parts need to be refactored or if instance data is being added continuously. The latter presents a "moving target", and does not leave sufficient time for review and update. A better practice is designing and reusing small, stable pieces such as Ontology Design Patterns (ODPs) and focused ontologies and schemas. Alignment of these with extant, quality ontologies and schemas can then proceed in an orderly fashion.     (1E8)

Lightweight Ontologies     (1E9)

Applications on the (big) Web of Data can profit from using lightweight ontologies and methods. These lightweight definitions can provide focused ontological commitment, and still afford the benefits of complete semantics and support reasoning. The idea is to find and use ontology parts that are appropriately expressive.     (1E10)

Competency Questions, Requirements and Opportunities for Semantic Content Reuse     (1E11)

There are many opportunities for reuse, but a domain and its competency questions must be understood first. Often, reuse fails because it is attempted before the requirements, underlying concepts and assumptions (driving the creation of the content) are fully understood. These help guide useful selection of what to reuse from the massive supply now available.     (1E12)

It should be noted that multiple formulations of the same content may be available (for example, several event ontologies were noted and examined by Katsumi and Gruninger in their Ontology Summit session on January 23rd, 2014). But, there remain issues regarding structuring and integrating the piece parts, and reconciling inconsistencies.     (1E13)

One challenge found by Katsumi and Gruninger is that axiomatizations of the "common-sense" ontologies (in this case, the event ontologies) are too weak to specify competency questions or to extract reasonable motivating scenarios.     (1E14)

Conditions for Reuse     (1E15)

Sam Adams used the term "reuseful" in a 1993 panel discussion ("Developing Software for Large Scale Reuse"). The term means that a piece of software (or in our case, semantic content) provides something "that is commonly needed". He goes further to state that to be reusable, the software or content must be found, understood and trusted. Discussions in the Track A sessions and in subsequent emails also highlighted the need for content to be documented (to make it understandable) and provided in a form conducive to reuse (or convertible to such a form). The discussions highlighted that documentation must include the basic details of the semantics, and also the range of conditions, contexts and intended purposes for which the content was developed. It was recommended that standard metadata for reuse be defined, and complete exemplars provided.     (1E16)

Specific items for making content reusable, both for capture and retrieval purposes, are that:     (1E17)

  • Content is accessible and can be found        (1E18)
  • The re-user is motivated to find the content        (1E19)
  • The content is in a form conducive to re-use or can be converted/transformed to a usable form        (1E20)
  • The re-user knows how to do the conversion/transformation        (1E21)
  • The content is logically consistent with the micro-theories of the re-user and this can be established       (1E22)
  • The re-user trusts the content and its quality, and believes that this quality will be maintained        (1E23)
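
The conditions above can be sketched as a simple metadata record with a readiness check. This is an illustrative sketch only; the class and field names below are invented for the example, not a proposed standard.

```python
from dataclasses import dataclass, field

# Hypothetical sketch: a record capturing the reuse conditions listed above.
# Field names are illustrative, not drawn from any actual metadata standard.
@dataclass
class ReuseRecord:
    name: str
    accessible: bool = False        # content can be found and retrieved
    usable_form: bool = False       # in (or convertible to) a reusable form
    conversion_known: bool = False  # the re-user knows how to convert it
    consistent: bool = False        # consistent with the re-user's micro-theories
    trusted: bool = False           # quality is trusted and will be maintained
    issues: list = field(default_factory=list)

    def reuse_ready(self) -> bool:
        """All conditions must hold before the content is worth reusing."""
        checks = {
            "accessible": self.accessible,
            "usable_form": self.usable_form,
            "conversion_known": self.conversion_known,
            "consistent": self.consistent,
            "trusted": self.trusted,
        }
        self.issues = [k for k, ok in checks.items() if not ok]
        return not self.issues

record = ReuseRecord("event-ontology", accessible=True, usable_form=True,
                     conversion_known=True, consistent=True, trusted=False)
print(record.reuse_ready())  # False: trust has not been established
print(record.issues)         # ['trusted']
```

A record like this makes the failing condition explicit, which is the point of the checklist: reuse breaks down when any single condition is unmet, not just when content is hard to find.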

Best Practice Ideas     (1E24)

Unfortunately, creating reusable content is difficult (even with exemplars as guidelines). There are approaches and best practices for modularization that should be documented and highlighted. It is important for developers to understand that modularity should be viewed from the perspective of the user, and not just knowledge engineers.       (1E25)

Various suggestions were raised in the Track A discussions such as (1) using SKOS concepts in metadata, (2) creating "integrating" modules that merge the semantics of the reused components, (3) distinguishing constraints that are definitive versus pragmatic, and (4) separating the definition of concepts from the descriptions of their usage in a particular domain, in reasoning applications or in data analytics. In addition, separately considering the reuse of classes/concepts, from properties, from individuals and from axioms was discussed as another best practice. By separating these semantics (whether for linked data or ontologies) and allowing their specific reuse, the goal is to make it easier to target specific content and reduce the amount of transformation and cleaning that is necessary.     (1E26)

Both metadata and documentation are needed to facilitate reuse. This starts with providing good descriptions and definitions of the concepts, relationships, axioms and rules that make up the content. To capture and understand the content, controlled natural language tools should be developed to aid in the generation of candidate ontologies and vocabularies. This was recommended by John Sowa (in his paper on "Future Directions in Semantic Systems") with the goal of easing the knowledge acquisition bottleneck. In addition, the tooling should be fully integrated with standard IT tools.     (1E27)
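
As a toy illustration of the controlled-natural-language idea (this is not Sowa's actual proposal; the sentence pattern and names below are invented), one restricted sentence form can be mapped mechanically to a candidate axiom:

```python
import re

# Toy sketch: map one controlled sentence pattern to a candidate axiom.
# A real CNL tool would support many patterns and produce full IRIs.
PATTERN = re.compile(r"^Every (\w+) is an? (\w+)\.$")

def sentence_to_triple(sentence):
    """Return a subclass triple for 'Every X is a/an Y.' sentences, else None."""
    m = PATTERN.match(sentence)
    if not m:
        return None
    sub, sup = m.groups()
    return (f"ex:{sub}", "rdfs:subClassOf", f"ex:{sup}")

print(sentence_to_triple("Every Meeting is an Event."))
# ('ex:Meeting', 'rdfs:subClassOf', 'ex:Event')
```

Even this trivial mapping shows the appeal: domain experts write constrained English, and candidate structure falls out for a knowledge engineer to review, easing the acquisition bottleneck.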

Possibly, different encoding(s) of the content may exist. But one needs to go beyond this basic level of definition and documentation. It is important to describe why the ontology/schema was created, how it was tested, and how it may be used. If the content can be used in different domains and for different purposes, then any differences should be discussed. A common finding is that ontologies which have been developed for a specific application or business context may have properties or classes which are framed more narrowly or more broadly than is appropriate for the re-using context. Perhaps the capabilities and uses of the content even vary depending on the underlying tooling (which definitely should be explained). And, just as important as the current state of the content, the reuse information should include how the content and encodings have changed over time and across uses. Capturing this level of information was a goal of the VOCREF (Vocabulary and Ontology Characteristics Related to Evaluation of Fitness) Ontology, which was a Hackathon event in this year's Ontology Summit.     (1E28)

Analogous to software engineering, reused ontologies and schemas should be verified. For software, to ensure a copied-and-pasted (or invoked) algorithm does what you want, you run it against some input values and compare the output to what was expected. With data, you run queries in a structured environment (such as SQL or SPARQL). Ontologies and schemas can work in the same way. Tools exist to import data and verify the data and its backing ontology or schema against a set of queries. The main benefit of this is that it allows engineers to experiment before committing to a more in-depth analysis, verification or conversion. The gap lies in helping engineers identify and create the queries that ensure an ontology or schema is appropriate for reuse.     (1E29)
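
A minimal sketch of this query-based verification workflow follows. The data and the competency question are invented for illustration; a real setup would run SPARQL over an actual triple store rather than Python set operations.

```python
# Sketch of query-based verification (assumed workflow, not a specific tool):
# run a competency question against imported data and compare to expectations.
triples = {
    ("ex:Meeting", "rdfs:subClassOf", "ex:Event"),
    ("ex:kickoff", "rdf:type", "ex:Meeting"),
    ("ex:kickoff", "ex:hasParticipant", "ex:alice"),
}

def subclasses_of(cls):
    """One-step subclass closure -- enough for this toy check."""
    return {s for s, p, o in triples
            if p == "rdfs:subClassOf" and o == cls} | {cls}

def instances_of(cls):
    """Instances of a class, including instances of its direct subclasses."""
    kinds = subclasses_of(cls)
    return {s for s, p, o in triples if p == "rdf:type" and o in kinds}

# Competency question: "Which events are recorded?"
# The kickoff is a Meeting, and Meetings are Events, so it must appear.
assert instances_of("ex:Event") == {"ex:kickoff"}
print("competency question passed")
```

If the expected answer set does not come back, either the data or the reused ontology is wrong for the purpose at hand, and the engineer learns this before committing to deeper integration.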

Tooling to Support Reuse     (1E30)

Once content is documented and available, reuse still requires that it be found, retrieved and maintained. Tooling is needed to allow search of content and its metadata, as well as user feedback and questions, usage history, and maintenance, evolution and governance data. Without tooling, there will be no widespread reuse - just reuse by dedicated individuals who proactively search for semantic content and then dig into it. Tooling is also needed to aid in the incorporation/integration of reused content. Broader use by mainstream efforts including Big Data is bottlenecked in part by the paucity of semantic tools integrated into mainstream tools, along with the inherent learning curve of understanding semantics.     (1E31)

Tooling is particularly sparse when it comes to modeling and understanding a complex set of ontologies and their inter-relations. A project which aims to re-use content from existing ontologies will tend to have just such a complex structure. Even tools aimed at model driven ontology development do not appear to cater for a broader, "system" level design of interrelated ontologies and their imports paths.     (1E32)

Ontology repositories can help with the task of finding relevant content. It was recommended that the Open Ontology Repository infrastructure be enhanced to provide the capabilities noted above. But we must also consider the linked data world. A resource there is the Linked Open Vocabulary (details can be found at the LOV Cloud Lookup Service). LOV provides a service to find relevant vocabularies for reuse, based on declared metadata and on refining, extending or stating equivalences with other vocabularies. These tools need to be extended and integrated to enable broad reuse.     (1E33)
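
The repository-search idea can be sketched as keyword matching over vocabulary metadata. The catalog records below are invented for illustration; LOV's actual service matches against far richer metadata and usage statistics.

```python
# Illustrative sketch (records and fields invented): find candidate
# vocabularies whose declared metadata matches a search term.
catalog = [
    {"name": "event-odp", "tags": ["event", "time", "participant"],
     "description": "Ontology design pattern for events"},
    {"name": "geo-core", "tags": ["place", "location"],
     "description": "Core geographic concepts"},
]

def search(term):
    """Return names of catalog entries whose tags or description match."""
    term = term.lower()
    return [rec["name"] for rec in catalog
            if term in rec["tags"] or term in rec["description"].lower()]

print(search("event"))  # ['event-odp']
print(search("place"))  # ['geo-core']
```

The useful part is not the matching itself but what gets matched: the better the declared metadata (tags, descriptions, history, mappings), the more findable the content, which is exactly the argument the synthesis makes for standard reuse metadata.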

Recommendations     (1E34)

  • Wise reuse possibilities follow from knowing the project requirements. Competency Questions should be used to structure the requirements, as part of an agile approach. The questions help frame areas of potential content reuse.     (1E35)
  • Avoid slavish reuse of content early in the lifecycle. This can be confusing as a starting point. Instead, content reuse and alignment should be exploited after some initial semantics have been developed, and conceptual modeling of content has proceeded in terms understandable to domain experts and project clients.     (1E36)
  • Be tactical in formalization. Take what content you need and represent it in a way that directly serves your objective.     (1E37)
  • Small ontology design patterns provide more possibilities for reuse because they have low barriers for creation and potential applicability, and offer greater focus and cohesiveness. They are likely less dependent on the original context in which they were developed.     (1E38)
  • Use "integrating" modules to merge the semantics of reused, individual content and design patterns.     (1E39)
  • Tools to support reuse include ontology repositories with good search capabilities and governance information, which support data import and query processing for verification. These aspects aid in finding and validating relevant content.     (1E40)
  • Better metadata (providing definitions, history and any available mapping documentation) for ontologies and schemas is needed to facilitate reuse. Also, it is valuable to distinguish constraints or concepts that are definitive (mandatory to capture the semantics of the content) versus ones that are specific to a domain. Domain-specific usage, and "how-to" details for use in reasoning applications or data analytics are also valuable. Some work in this area, such as Linked Open Vocabulary and several efforts in the Summit's Hackathon, is underway and should be supported.      (1E41)
  • Separately consider the reuse of classes/concepts, from properties, from individuals and from axioms. By separating these semantics (whether for linked data or ontologies) and allowing their specific reuse, it is easier to target specific content and reduce the amount of transformation and cleaning that is necessary.     (1E42)
  • Better ontology and schema management is needed. Governance needs a process and that process needs to be enforced in tooling. The process should include open consideration, comment, revision and acceptance of revisions by a community.     (1E43)
  • RDF provides a basis for semantic extension (for example, by OWL and RIF). But, RDF triples without these extensions may be underspecified bits of knowledge. They can help with the vocabulary aspects of work, but formalization with languages like OWL can more formally define and constrain meaning. This allows intended queries to be answerable and supports reasoning.     (1E44)
  • Controlled natural language tools to generate candidate ontologies and vocabularies may ease the knowledge acquisition bottleneck. Work should be supported for such tool integration with standard IT tools.     (1E45)

References from the Track A Mission and Synthesis     (1F)

  1. Kuhn, W. (2009). Semantic engineering. In Research Trends in Geographic Information Science (pp. 63-76). Springer Berlin Heidelberg.     (1F6)


