Ontolog Forum

OntologySummit2008_Communique - Draft workspace

Champions: Leo Obrst & Mark Musen

Notes

  • some discussion on the flow of this year's communique:
    • develop a narrative how these all connect:    - Leo et al. at the 2008.03.24 summit-org call
  • Barry: Mark & Leo should communicate what they expect from the content champions (Mark: we'll email)   

From: Leo Obrst / Mark Musen - 06-Apr-2008 message

We would like to develop a narrative that shows how these threads all connect, i.e., what is the state-of-the-art in ontology repositories:

  • Which ontologies? Where are they? Where are mappings among ontologies? What is the state of the art?
  • What are the entry requirements for ontologies and other artifacts for such an OOR: criteria, validation, certification, etc.?
  • What are the methodologies for ontology development, ontology registration, repository services?
  • What are the requirements from content providers and from content users?
  • To seriously address the effort, we need to develop an ontology of ontologies: what is this?
  • Once we have requirements, what is the best architecture for an open ontology repository?
  • Finally, we need an OOR roadmap to realization that addresses the design, implementation, maintenance, and other issues to make this real.

Ontology Summit 2008 Communique: Towards an Open Ontology Repository

  • Goals of this Communique
  • Requirements for an Open Ontology Repository
  • State of the Art (Frank Olken)
  • Quality and Gatekeeping (Barry Smith and Fabian Neuhaus)
  • Ontology of Ontologies (Michael Gruninger and Pat Hayes)
  • Repository Architecture (Michelle Raymond and Ravi Sharma)
  • Conclusion: Toward the Future

We witness and agree to the contents of this Communique:

<Name, Affiliation>


Ontology Summit 2008 Communiqué: Towards an Open Ontology Repository (Draft v. 6)

by Leo Obrst & Mark Musen (ref.)

1. Introduction

Each annual Ontology Summit initiative intends to make a statement appropriate to that Summit's theme as part of the Ontolog Forum's general advocacy to bring ontology science and engineering into the mainstream. The theme this year is "Towards an Open Ontology Repository". This communiqué represents the joint position of those in the Ontolog Forum who were engaged in the year's summit discourse on an Open Ontology Repository (OOR). In this discussion, we have agreed that an ontology repository is a facility where ontologies and related information artifacts can be stored, retrieved and managed.

We believe in the premise and the promise of the Semantic Web, i.e., a Web of exposed data and the interpretation of that data, i.e., its semantics, using common standards, thereby enabling distinguishable, computable, reusable, and sharable meaning of Web and other artifacts: data, documents, and services. But we also believe that making that vision a reality requires additional supporting infrastructure. And we believe that infrastructure should be open, extensible, and provide common services.

The purpose of an Open Ontology Repository is to provide an architecture and an infrastructure that supports a) the creation, sharing, searching, and management of ontologies, and b) linkage to database and XML Schema structured data and documents. Complementary goals include fostering the ontology community, the identification and promotion of best practices, and the provision of services relevant to the ontologies and instance stores. Automated semantic interpretation of content expressed in knowledge representation languages, the creation and maintenance of mappings among disparate ontologies and content, and inference over this content are examples of anticipated services. Such repositories ultimately will support a broad range of semantic services and applications of interest to enterprises and communities.

Achieving these goals will help reduce semantic ambiguity whenever and wherever information is shared, thereby allowing information to be located, searched, categorized, and exchanged with a more precise expression of its content and meaning. The artifacts of the repository will provide a semantic grounding for diverse formats and domains, ranging from the conceptual domains and specific disciplines of communities to technical schema such as WSDL, UDDI, RSS, and XML schema, and of course expressed in standard ontology languages such as RDF, OWL, Common Logic, and others. Perhaps most importantly, the repository will enable wide-scale knowledge re-use and reduce the need to re-invent the wheel to define concepts and relationships that are already understood.

These goals cannot be achieved at once, and must track the evolution of best practices as well as technology itself. It is also good system development practice to bound complexity by defining a system in terms of a series of short-term, achievable objectives. For this reason, as for other such initiatives, it is envisioned that the Open Ontology Repository will be developed in a series of phases, proceeding from the simple to the complex, with achievable goals that capitalize on previous experience and the emergence of technology over time. It is important to note that during any given phase, planning and prototyping for subsequent phases are always in progress.

2. Requirements for an Open Ontology Repository

The Ontolog community in the past year determined that the primary technical areas that needed to be discussed and illuminated to make the vision of an Open Ontology Repository a reality were the following: 1) determining the current state of the art in ontology repositories, 2) determining quality and gatekeeping criteria for registering and then provisioning ontologies and their instances, 3) developing an ontology of ontologies that would act as structure and metadata for registering ontologies and supporting the common repository of their instances, data, and services, and 4) developing a sound architecture for the envisioned Open Ontology Repository. Elaborations of these four technical areas together help provide both the requirements and the ideas and tools that could realize those requirements. The remainder of this communiqué thus summarizes the results of the discussions in these areas.

3. State of the Art (Frank Olken) : http://ontolog.cim3.net/cgi-bin/wiki.pl?OntologySummit2008_StateOfArt

Ontology repositories support the storage, search, retrieval and integration of multiple ontologies.

Ontology repositories support macro-level storage and retrieval (across the collection of ontologies) and micro-level operations (within individual ontologies). At each level we would like to support both text search and semantic search (variously faceted search, SPARQL, ...).
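The two levels of retrieval can be illustrated with a minimal sketch. It covers plain text search only (semantic search is out of scope here), and the `Repository` encoding and the function names are assumptions made for illustration, not part of any existing repository.

```python
from typing import Dict, List

# Hypothetical toy encoding: ontology name -> {term label -> definition text}.
Repository = Dict[str, Dict[str, str]]


def _mentions(keyword: str, label: str, text: str) -> bool:
    """Case-insensitive substring match against a term's label or definition."""
    kw = keyword.lower()
    return kw in label.lower() or kw in text.lower()


def macro_search(repo: Repository, keyword: str) -> List[str]:
    """Macro level: names of ontologies in the collection that mention the keyword."""
    return sorted(
        name
        for name, terms in repo.items()
        if any(_mentions(keyword, label, text) for label, text in terms.items())
    )


def micro_search(repo: Repository, ontology: str, keyword: str) -> List[str]:
    """Micro level: matching term labels within a single ontology."""
    terms = repo.get(ontology, {})
    return sorted(
        label for label, text in terms.items() if _mentions(keyword, label, text)
    )
```

A caller would typically run `macro_search` first to choose an ontology and then `micro_search` to locate terms inside it.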

A key decision is the choice of a representation for the ontology or ontologies. Current practice includes text, frames (e.g., OBO), graphs (e.g., RDF), and various types of logic, e.g., description logics (e.g., OWL-DL) and full first-order logic (e.g., Common Logic). Other possibilities include the use of UML (as in the Ontology Definition Metamodel).

Ontologies may be characterized in graph-theoretic terms as trees (e.g., simple taxonomies), multi-faceted structures (e.g., the Colon classification system by Ranganathan, which specifies concepts via the conjunction of several taxonomies), directed acyclic graphs (e.g., partial orders, supporting multiple inheritance), arbitrary directed graphs (possibly containing cycles), or named graphs (in which names refer to subgraphs).
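As a rough illustration of this characterization, the sketch below classifies a subsumption graph as a tree (or forest), a directed acyclic graph, or a graph with cycles. The child-to-parents encoding and the function names are assumptions made for this example; multi-faceted structures and named graphs are not covered.

```python
from typing import Dict, List

# Hypothetical encoding: each node maps to the list of its direct parents.
Graph = Dict[str, List[str]]


def has_cycle(graph: Graph) -> bool:
    """Depth-first search with three colors; a grey-to-grey edge means a cycle."""
    WHITE, GREY, BLACK = 0, 1, 2
    nodes = set(graph) | {p for parents in graph.values() for p in parents}
    color = {n: WHITE for n in nodes}

    def visit(node: str) -> bool:
        color[node] = GREY
        for parent in graph.get(node, []):
            if color[parent] == GREY:
                return True
            if color[parent] == WHITE and visit(parent):
                return True
        color[node] = BLACK
        return False

    return any(color[n] == WHITE and visit(n) for n in nodes)


def classify(graph: Graph) -> str:
    """Return a coarse graph-theoretic label for the ontology's structure."""
    if has_cycle(graph):
        return "directed graph with cycles"
    if all(len(parents) <= 1 for parents in graph.values()):
        return "tree or forest"  # single inheritance only
    return "directed acyclic graph"  # multiple inheritance, no cycles
```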

Graphs are often stored in long narrow relations, e.g., "triple stores" of RDF triples (subject, relationship, object). Current practice is to use "quad stores" in order to support Named Graphs. "Column stores" such as MonetDB and Vertica have also been used to store graphs.
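The idea of a long narrow relation extended with a graph name can be sketched in a few lines. This is an in-memory toy under stated assumptions, not the design of any particular triple or quad store; the `Quad` and `QuadStore` names and the `"default"` graph label are hypothetical.

```python
from typing import List, NamedTuple, Optional, Set


class Quad(NamedTuple):
    subject: str
    predicate: str
    obj: str
    graph: str  # name of the named graph the triple belongs to


class QuadStore:
    """Toy quad store: one narrow relation with four columns."""

    def __init__(self) -> None:
        self._quads: Set[Quad] = set()

    def add(self, s: str, p: str, o: str, g: str = "default") -> None:
        self._quads.add(Quad(s, p, o, g))

    def match(
        self,
        s: Optional[str] = None,
        p: Optional[str] = None,
        o: Optional[str] = None,
        g: Optional[str] = None,
    ) -> List[Quad]:
        """Return quads matching the pattern; None acts as a wildcard."""
        pattern = (s, p, o, g)
        return sorted(
            q
            for q in self._quads
            if all(want is None or want == got for want, got in zip(pattern, q))
        )
```

Restricting a query to one named graph is then just a matter of fixing the fourth column, e.g. `store.match(g="anatomy")`.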

For the purposes of ontology integration it helps to have all of the ontologies in the repository encoded in a common representation. However, this requires the sometimes difficult translation of ontologies among various representations into the common representation.

We also need some way to support ontology integration by specifying the mappings among entities, e.g., with relationships such as same_as, is_a, and part_of. If the ontologies are partial orders (taxonomies or partonomies) then these mappings should be order preserving.
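A minimal sketch of the order-preservation check follows. It assumes each partial order is given as a set of (lower, upper) pairs that is already transitively closed, and that the mapping is a plain dictionary from source entities to target entities; these encodings and the function name are illustrative assumptions.

```python
from typing import Dict, Set, Tuple

Order = Set[Tuple[str, str]]  # (lower, upper) pairs, assumed transitively closed


def is_order_preserving(
    mapping: Dict[str, str], source_order: Order, target_order: Order
) -> bool:
    """Check that x <= y in the source implies map(x) <= map(y) in the target.

    Pairs with an unmapped entity are ignored, and two entities mapped to the
    same target are treated as trivially ordered (reflexivity).
    """
    for lower, upper in source_order:
        if lower in mapping and upper in mapping:
            a, b = mapping[lower], mapping[upper]
            if a != b and (a, b) not in target_order:
                return False
    return True
```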

Many systems that support partial orders (taxonomies and partonomies) may decide to materialize the transitive closure of the partial order relation. This provides faster query evaluation at the expense of additional ingestion and maintenance costs.
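The trade-off can be made concrete with a small sketch that computes the closure once, at ingestion time, so that later ancestor queries become set-membership tests. The edge encoding and function name are assumptions for illustration; a production system would more likely do this inside its database.

```python
from typing import Dict, Iterable, Set, Tuple


def materialize_closure(edges: Iterable[Tuple[str, str]]) -> Set[Tuple[str, str]]:
    """Return all (descendant, ancestor) pairs implied by direct (child, parent) edges."""
    parents: Dict[str, Set[str]] = {}
    for child, parent in edges:
        parents.setdefault(child, set()).add(parent)

    closure: Set[Tuple[str, str]] = set()
    for start in parents:
        stack = list(parents[start])
        seen: Set[str] = set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue  # also guards against cycles in malformed input
            seen.add(node)
            closure.add((start, node))
            stack.extend(parents.get(node, ()))
    return closure
```

With the closure stored, a query such as "is Dog a kind of Animal?" reduces to `("Dog", "Animal") in closure`, but every new edge requires the closure to be updated.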

In a distributed setting, one will want to implement an ontology repository using a Service Oriented Architecture (SOA), providing access, search, etc. capabilities via web services. Two major approaches to SOA are REST (Representational State Transfer) and SOAP. REST is built on HTTP, with a small set of operators (GET, PUT, POST, DELETE) and the use of URL (or URI) addresses for all objects of interest. SOAP is based on XML RPCs. REST is much simpler to implement and should be adequate for typical ontology repository functions. Both SOA approaches are currently being used.
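To show how few operations a REST-style interface needs, the sketch below models the uniform interface as a plain function from (method, path, body) to (status, body), with no network code. The `/ontologies/<name>` URL scheme, the class name, and the choice of status codes are assumptions made for this example; POST is omitted for brevity.

```python
from typing import Dict, Optional, Tuple


class OntologyResource:
    """In-memory stand-in for an HTTP service exposing ontologies as resources."""

    PREFIX = "/ontologies/"  # hypothetical URL scheme

    def __init__(self) -> None:
        self._store: Dict[str, str] = {}

    def handle(
        self, method: str, path: str, body: Optional[str] = None
    ) -> Tuple[int, str]:
        if not path.startswith(self.PREFIX) or len(path) == len(self.PREFIX):
            return 404, "not found"
        name = path[len(self.PREFIX):]

        if method == "GET":  # retrieve the ontology document
            if name in self._store:
                return 200, self._store[name]
            return 404, "not found"
        if method == "PUT":  # create or replace the ontology document
            if body is None:
                return 400, "missing body"
            created = name not in self._store
            self._store[name] = body
            return (201 if created else 200), "stored"
        if method == "DELETE":  # remove the ontology document
            if self._store.pop(name, None) is None:
                return 404, "not found"
            return 204, ""
        return 405, "method not allowed"
```

Every ontology is addressed by its own URL and manipulated through the same small set of verbs, which is the property that makes the approach simple to implement.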

Finally, an ontology repository will typically facilitate access to a variety of ontology related tools: ontology creation, editors, pretty printers, visualization tools (graph visualization), differencing tools, ontology modularization tools (clustering), ontology export, version management, access control.

4. Quality and Gatekeeping (Barry Smith and Fabian Neuhaus): http://ontolog.cim3.net/cgi-bin/wiki.pl?OntologySummit2008_QualityAndGatekeeping

We distinguish between gatekeeping and quality control. Gatekeeping criteria are a set of minimal requirements that any ontology within the OOR has to meet. These criteria are intended to enable the users of the OOR to find ontologies that fit their needs quickly; they are not supposed to ensure the quality of the ontologies.

4.1 Gatekeeping Criteria

The ontologies in the OOR have to meet the following criteria:

  1. The ontology is open. (see below)
  2. The ontology is expressed in a formal language with a well-defined syntax.
  3. The authors of the ontology provide the required metadata.
  4. The ontology has a clearly specified and clearly delineated scope.
  5. Successive versions of an ontology are clearly identified.
  6. The ontology is adequately labeled.

So far the most controversial suggested criterion has been "openness". We need to distinguish between different kinds of openness, in particular between open development processes and open software licenses. Different members of the community have different preferences on which kinds of openness, and how much openness, should be required. Some would like to drop openness as a gatekeeping criterion and instead require the developers of ontologies to provide metadata that allows potential users to understand how open (and in which senses of the word) an ontology is. This issue needs to be addressed during the meeting in Gaithersburg.
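The six criteria of Section 4.1 could be checked mechanically at submission time, along the lines of the sketch below. The record layout and field names are hypothetical, each check is reduced to "was something provided", and criterion 1 is modeled as a free-text openness statement because its exact form is still under discussion.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Submission:
    """Hypothetical submission record; field names are illustrative only."""

    openness_statement: Optional[str] = None  # criterion 1 (form still debated)
    language: Optional[str] = None            # criterion 2, e.g. "OWL"
    metadata_provided: bool = False           # criterion 3
    scope: Optional[str] = None               # criterion 4
    version: Optional[str] = None             # criterion 5
    adequately_labeled: bool = False          # criterion 6


def gatekeeping_failures(sub: Submission) -> List[str]:
    """Return one message per unmet criterion; an empty list means admission."""
    checks = [
        (bool(sub.openness_statement), "1: openness not documented"),
        (bool(sub.language), "2: no formal language declared"),
        (sub.metadata_provided, "3: required metadata missing"),
        (bool(sub.scope), "4: scope not specified"),
        (bool(sub.version), "5: version not identified"),
        (sub.adequately_labeled, "6: labels inadequate"),
    ]
    return [message for passed, message in checks if not passed]
```

Such a check only enforces the minimal entry requirements; it says nothing about the quality of an admitted ontology.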

4.2 Quality Control

The community agrees that it is not sufficient for the OOR just to store ontologies, but that it needs to provide the possibility to evaluate the ontologies within it. There is no agreement on how to evaluate ontologies; the main strategies suggested are: (i) A market driven approach where ontologies are reviewed by users and ranked like items on Amazon.com; and (ii) an editorial process where ontologies are reviewed by experts in a similar way as papers which are submitted to scientific journals. The difference in opinion about ontology evaluation reflects the fact that the members of the community are using ontologies for different purposes and thus have different perspectives on what ontologies are. However, there is agreement that the OOR should accept ontologies regardless of whether their developers see ontologies as pieces of software, as representations of scientific knowledge, or as standardized vocabularies. Accordingly, the OOR needs to enable the different styles of evaluation and different standards for ontologies. We suggest a distributed governance model where the OOR allows for subcommunities that provide stewardship for their respective fields by evaluating the available ontologies and by distinguishing high-quality ontologies according to appropriate standards.

5. Ontology of Ontologies (Michael Gruninger and Pat Hayes): http://ontolog.cim3.net/cgi-bin/wiki.pl?OntologySummit2008_OntologyOfOntologies

The metadata for ontologies should support the sharing and reuse of ontologies within the repository.

The metadata should allow users to:

  1. retrieve ontologies for use in domain applications;
  2. retrieve ontologies to be integrated with other user ontologies;
  3. retrieve ontologies that will be extended to create new user ontologies;
  4. determine whether or not an ontology can be integrated with user ontologies;
  5. determine whether a set of ontologies retrieved from the repository can be used together;
  6. determine whether an ontology in the repository can be partially shared.

We can consider logical metadata (logical properties of the ontology independent of any implementation or engineering artifact) and engineering metadata (properties of the ontology as considered as an engineering artifact). The logical metadata include the following.

5.1 Logical Metadata

What language is used to specify the ontology? There is a range of languages. A formal language has a syntax (logical symbols together with a formally specified grammar) and a model theory (which specifies the conditions under which expressions in the language can be given particular truth assignments). The report "Evaluating Reasoning Systems" contains a classification of formal languages used to specify ontologies (see references on the thread page). A formalizable language has a syntax, although it does not have a model theory. Some examples include (with their languages parenthesized): topic maps (XML), folksonomies (XML), and ISO 15926 (EXPRESS). Some ontologies are only specified in natural language or specialized syntactic formats: WordNet, most taxonomies, thesauri.

Modularity is also important. Is a particular ontology a monolithic set of axioms, or is it composed of a set of smaller modules? Is each module considered to be a separate ontology within the repository? If not, what are the relationships between the modules? Which modules of an ontology can be used separately? For example, the Process Specification Language (PSL, http://www.mel.nist.gov/psl/psl-ontology/) consists of a set of modules which are extensions of a common core theory, PSL-Core. Metadata for each module specifies which other modules must also be included when using the module.
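Module metadata of this kind lets a repository compute everything that must be loaded together with a chosen module. The sketch below assumes the metadata is available as a dictionary from module name to directly required modules; apart from the core theory, the module names in the usage note are placeholders rather than actual PSL modules.

```python
from typing import Dict, List, Set


def required_modules(module: str, requires: Dict[str, List[str]]) -> List[str]:
    """Return the module and all modules it transitively requires, dependencies first."""
    ordered: List[str] = []
    seen: Set[str] = set()

    def visit(name: str) -> None:
        if name in seen:
            return  # already handled; also prevents looping on circular metadata
        seen.add(name)
        for dependency in requires.get(name, []):
            visit(dependency)
        ordered.append(name)

    visit(module)
    return ordered
```

For example, with `requires = {"module-b": ["module-a"], "module-a": ["psl-core"], "psl-core": []}`, asking for `"module-b"` yields the core theory first, then `"module-a"`, then `"module-b"` itself.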

The relationships among ontologies are also important. These include the notion of mutual consistency. For example, within the Catalog of Temporal Theories [REF], a dense linear ordering is inconsistent with a discrete linear ordering. Another relationship is that of entailment: is one ontology stronger than another in the sense that the sentences in the first ontology entail the sentences in the second? This would be the case when one ontology can be considered to be a weaker version of another ontology within the repository. For example, in the Catalog of Temporal Theories [REF], the before relation is a partial ordering (i.e. it is a transitive antisymmetric reflexive relation). Since this ontology axiomatizes all of these properties, it entails an ontology that only axiomatizes the transitive property, such as OWL-Time. In other words, OWL-Time is weaker than the first-order theories in the catalog.

Another relationship is extension. An ontology T1 is an extension of another ontology T2 iff the sentences in T1 contain or entail the sentences in T2. T1 is a conservative extension of T2 whenever every sentence in the lexicon of T2 is provable from T1 iff it is provable from T2. T1 is a nonconservative extension of T2 whenever there is a sentence in the lexicon of T2 which is provable from T1 but not from T2.

Another relationship is definable interpretation. If the ontologies have different sets of primitives and relations, is it possible to define the primitives and relations of one ontology using the second ontology?

5.2 Engineering Metadata

Engineering metadata include: provenance, versioning, existing applications of the ontology (e.g. interoperability, search, decision support), and domain-specificity (e.g. biology, supply chain management, manufacturing).

5.3 Candidate Solutions and Recommendation

The Ontology Metadata Vocabulary (OMV) http://omv.ontoware.org/ is a strong candidate for describing ontologies in the OOR. In addition, we recommend collecting ontologies from Ontolog Summit participants, and testing out the different proposals for metadata on these ontologies. Developing use case scenarios will motivate the use of the metadata with these ontologies.

6. Repository Architecture (Michelle Raymond and Ravi Sharma): http://ontolog.cim3.net/cgi-bin/wiki.pl?OntologySummit2008_RepositoryArchitecture

Over the past four months, several dozen members of Ontology Summit 2008 and the Open Ontology Repository Forum have contributed the following kinds of input and discussion on Repository Architecture (OOR-A) for the Summit:

  • Presentations, Panel discussions with experts on managing repositories
  • Architecture candidates, use cases, Requirements of Repositories to host ontologies
  • Definitions / Roles of Repository and Registry and integration
  • Discussion threads on Open, Distributed, Federated Repositories
  • Metadata requirements for Ontology Repositories
  • Engines, search and query functions
  • Preference for keeping content (data) outside repositories while including examples in them; also whether some functioning ontologies will be resident in repositories
  • Functional and Physical characteristics of repositories
  • Non-functional requirements such as scalability, storage, security, federation, availability, and testbeds
  • Preliminary discussions of Governance, Standards, and Criteria for including different languages, types of ontologies, etc.
  • Extensive cross inputs from the Ontology of Ontologies, Quality, and State of the Art workspaces and threads, as well as from the Content and Organizing Committees and other members

The community's overall assessment is that the OOR should enable open, distributed, federated repositories and provide metadata for each type of ontology registered, as well as the logical resources, inference engines, etc., that are required to properly test the services and functions of ontologies served by the repositories. The general consensus was that the primary functional responsibility for an ontology lies with the originating ontology owners and their successors (downstream users), and that a repository cannot stand alone and be held responsible for content that is generally stored outside the repository. Community work is expected to continue in the OOR-Forum and in other standards organizations (e.g. OMG-Ontology Definition Metamodel, XMDR, NCBO, NSF, W3C, OASIS, Industry and Others). There is potentially great value in such an open ontology repository, especially to the government (in critical areas such as healthcare, bioinformatics, acquisition, and emergency response), as well as to industry, for example, by enabling participants to use rich semantic search/querying over repositories which connect multiple ontologies and instance bases.

7. Conclusion: Toward the Future

We look forward to establishing an open ontology repository in the future that adheres to the requirements put forth above. We endorse an open ontology repository that seeks to honor and implement the following overarching mission requirements:

  1. Establishing an Open Ontology Repository (OOR) Initiative that will promote the global use of ontologies, their instance bases, rules, and services, and mappings among these.
  2. Enabling and facilitating open, federated, collaborative ontology repositories.
  3. Establishing best practices for expressing interoperable ontology work in open registries/repositories.
  4. Enabling and facilitating the development of common services to support the repository and to extend the capabilities available to providers, users, and developers who use the repository.

We believe that creating this kind of infrastructure will facilitate the emerging Semantic Web. This Communiqué was reviewed, collaboratively edited, finalized and adopted by individuals present at the Ontology Summit 2008.

Endorsed by:

The above Communiqué has been endorsed by the individuals listed below. Please note that these people made their endorsements as individuals and not as representatives of the organizations they are affiliated with.

<Name, Affiliation>


Ontology Summit 2008 F2F breakout group draft-review-workspace:


-- This page is maintained by: Leo Obrst and Mark Musen