Ontolog Forum


Intrinsic Aspects of Ontology Evaluation

Co-champions: Leo Obrst, Steve Ray


Ontologies are built to solve problems, and ultimately an ontology's worth can be measured by how effectively it helps solve a particular problem. Nevertheless, as designed artifacts, ontologies have a number of intrinsic characteristics that can be measured for any ontology and that indicate how "well-designed" it is. Examples include the proper use of the various relations found within an ontology, proper separation of concepts and facts (sometimes referred to as the class vs. instance distinction), proper handling of data type declarations, avoiding the embedding of semantics in naming (sometimes called "optimistic naming"), detecting inconsistent range or domain constraints, better class/subclass determination, the use of principles of ontological analysis, and many more. This track aims to enumerate, characterize, and disseminate information on approaches, methodologies, and tools designed to identify such intrinsic characteristics, with the aim of raising the quality of future ontologies.

The scope of this track includes dimensions of evaluation, methods, criteria, and properties to measure.





It is useful to partition the ontology evaluation space into three regions:

  1. Evaluation that does not depend at all on knowledge of the domain being modeled, but does draw upon mathematical and logical properties such as graph-theoretic connectivity, logical consistency, model-theoretic interpretation issues, inter-modularity mappings and preservations, etc. Structural properties such as branching factor, density, counts of ontology constructs, averages, and the like are intrinsic. Some meta-properties such as transitivity, symmetry, reflexivity, and equivalence may also figure in intrinsic notions.
  2. Evaluation where some understanding of the domain is needed, for example, to determine that a particular modeling construct is in alignment with the reality it is supposed to model. Meta-properties such as rigidity, identity, and unity, drawn from metaphysics, philosophical ontology, and the philosophy of language, may be used to gauge the quality of the subclass/isa taxonomic backbone of an ontology and of its other structural aspects.
  3. Situations where the structure and design of the ontology are opaque to the tester, and the evaluation is determined by the correctness of the answers to various interrogations of the model.
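
The Region 1 measures named in item 1 can be computed from the bare graph of subclass axioms without knowing what any class means. The following is a minimal sketch, assuming a toy hierarchy given as (subclass, superclass) pairs; the class names and the functions structural_metrics, is_symmetric, and is_transitive are hypothetical illustrations, not the API of OntoQA or any other tool mentioned on this page.

```python
from collections import defaultdict

# Hypothetical toy taxonomy; each pair is (subclass, superclass).
SUBCLASS_OF = [
    ("Dog", "Mammal"),
    ("Cat", "Mammal"),
    ("Mammal", "Animal"),
    ("Bird", "Animal"),
    ("Sparrow", "Bird"),
]


def structural_metrics(edges):
    """Region 1 metrics that need no knowledge of what the classes mean.

    Assumes a non-empty, acyclic asserted hierarchy (the depth recursion
    would not terminate on a cycle).
    """
    children = defaultdict(list)
    nodes = set()
    for sub, sup in edges:
        children[sup].append(sub)
        nodes.update((sub, sup))
    roots = sorted(nodes - {sub for sub, _ in edges})
    parents = [n for n in nodes if children[n]]

    def depth(node):
        # Number of classes on the longest downward path from node.
        return 1 + max((depth(c) for c in children[node]), default=0)

    n = len(nodes)
    return {
        "classes": n,
        "subclass_axioms": len(edges),
        "roots": roots,
        "max_depth": max(depth(r) for r in roots),
        # Mean number of direct subclasses among non-leaf classes.
        "avg_branching": sum(len(children[p]) for p in parents) / len(parents),
        # Asserted edges over all possible directed edges between classes.
        "density": len(edges) / (n * (n - 1)),
    }


def is_symmetric(pairs):
    return all((b, a) in pairs for a, b in pairs)


def is_transitive(pairs):
    return all((a, d) in pairs for a, b in pairs for c, d in pairs if b == c)


print(structural_metrics(SUBCLASS_OF))
# False: Dog -> Mammal -> Animal is asserted, but Dog -> Animal is not.
print(is_transitive(set(SUBCLASS_OF)))
```

The meta-property checks take a relation as a set of pairs, so the same functions can be applied to any object property's extension, not just to subclass axioms.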

We have chosen to call Region 1 Intrinsic Evaluation and Region 3 Extrinsic Evaluation. This partitioning is helpful because purely intrinsic evaluation is highly amenable to automation (although the other regions may also become automatable, with more effort) and thus scales to many ontologies of any size. Examples of such tools include the Oops! Evaluation web site, described by María Poveda Villalón (see slides), and the use of OntoQA to develop metrics for any ontology based on structural properties and instance populations, described by Samir Tartir (see slides). By its very nature, the Oops! web tool cannot depend upon any domain knowledge; instead, it reports only on suspected improper uses of various OWL DL modeling practices.
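
To make "domain-independent pitfall detection" concrete, here is a minimal sketch of the kind of check such a scanner can run on syntax alone. It is not the Oops! implementation; the in-memory data model, the two checks (a naming heuristic for merged concepts and a test for missing domain or range declarations), and the function scan_pitfalls are all hypothetical.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class ObjectProperty:
    name: str
    domain: Optional[str] = None
    range: Optional[str] = None


# Hypothetical naming heuristic: "And"/"Or" joining two CamelCase words in a
# class name may signal two concepts merged into one class.
CONJUNCTION = re.compile(r"[a-z](And|Or)[A-Z]")


def scan_pitfalls(classes: List[str],
                  properties: List[ObjectProperty]) -> List[Tuple[str, str]]:
    """Flag suspected modeling problems using syntax alone (no domain knowledge)."""
    findings = []
    for cls in classes:
        if CONJUNCTION.search(cls):
            findings.append(("possibly-merged-concepts", cls))
    for prop in properties:
        if prop.domain is None or prop.range is None:
            findings.append(("missing-domain-or-range", prop.name))
    return findings


classes = ["Person", "Organization", "CarAndDriver", "Order"]
properties = [
    ObjectProperty("worksFor", domain="Person", range="Organization"),
    ObjectProperty("hasPart"),
]
for kind, name in scan_pitfalls(classes, properties):
    print(kind, name)
```

Note that the findings are only suspicions: deciding whether "CarAndDriver" really merges two concepts, or whether "hasPart" should have a domain, requires a human who understands the domain.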

Similarly, Region 3, purely extrinsic evaluation, implies no ability whatsoever to peer inside a model and depends entirely on the model's behavior through interactions. In some cases, it may be appropriate to treat extrinsic evaluation criteria as intrinsic criteria with additional relational arguments, e.g., precision with respect to a specific domain and specific requirements.
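
A black-box evaluation of this kind can be sketched as a function whose only access to the model is a query callback, with the requirements supplied as the extra relational argument. The sketch below assumes answers are sets of strings and uses micro-averaged precision and recall; the function extrinsic_scores, the competency questions, and the canned answers are hypothetical.

```python
from typing import Callable, Dict, Set


def extrinsic_scores(ask: Callable[[str], Set[str]],
                     expected: Dict[str, Set[str]]) -> Dict[str, float]:
    """Micro-averaged precision and recall of a black-box model's answers.

    `ask` is the only access to the model (Region 3); `expected` encodes the
    requirements, i.e. the extra relational argument the criterion depends on.
    """
    tp = fp = fn = 0
    for question, gold in expected.items():
        got = ask(question)
        tp += len(got & gold)   # correct answers returned
        fp += len(got - gold)   # answers returned but not expected
        fn += len(gold - got)   # expected answers that were missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {"precision": precision, "recall": recall}


# Hypothetical model stand-in: a lookup table of canned answers.
canned = {
    "Which classes are mammals?": {"Dog", "Cat", "Sparrow"},
    "Which classes are birds?": {"Sparrow"},
}
requirements = {
    "Which classes are mammals?": {"Dog", "Cat"},
    "Which classes are birds?": {"Sparrow"},
}
print(extrinsic_scores(lambda q: canned.get(q, set()), requirements))
# {'precision': 0.75, 'recall': 1.0}
```

Swapping in a different `requirements` dictionary re-scores the same model against different needs, which is the sense in which the criterion is relational.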

For the purposes of developing reasonable expectations of different evaluation approaches, the main challenge lies in clarifying the large body of work that falls within Region 2, where some domain knowledge is employed and combined with the ability to explore the ontology being evaluated. For example, the OQuaRE framework described by Astrid Duque Ramos (see slides) falls in this middle region because it combines both context-dependent and context-independent metrics. Indeed, the OQuaRE team has stated their desire to better distinguish between these two categories of metrics. Another example is the OntoClean methodology (not reported on in Ontology Summit 2013, but generally well known [1, 2]), which draws upon meta-domain knowledge, i.e., standard evaluative criteria originating from the practices of ontological analysis.
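
OntoClean's constraint that an anti-rigid property cannot subsume a rigid one shows how Region 2 divides the work: a person with domain understanding assigns the meta-property tags, and a program then checks the taxonomy against them. The following is a minimal sketch, assuming hypothetical tags, class names, and a hypothetical function rigidity_violations; see [1, 2] for the methodology itself.

```python
from typing import Dict, List, Tuple

# Hypothetical OntoClean-style rigidity tags: "+R" rigid, "~R" anti-rigid
# ("-R", non-rigid, is not used here). Assigning them takes domain judgment.
RIGIDITY: Dict[str, str] = {
    "Agent": "+R",
    "Person": "+R",
    "Student": "~R",
    "Customer": "~R",
}
ISA_EDGES: List[Tuple[str, str]] = [
    ("Student", "Person"),   # anti-rigid under rigid: allowed
    ("Person", "Agent"),     # rigid under rigid: allowed
    ("Person", "Customer"),  # rigid under anti-rigid: violation
]


def rigidity_violations(edges, rigidity):
    """Return subclass axioms where an anti-rigid class subsumes a rigid one.

    Checks asserted edges only; a fuller check would also cover entailed
    subsumptions.
    """
    return [(sub, sup) for sub, sup in edges
            if rigidity.get(sub) == "+R" and rigidity.get(sup) == "~R"]


print(rigidity_violations(ISA_EDGES, RIGIDITY))  # [('Person', 'Customer')]
```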

Of course, structural integrity and consistency are only two of the kinds of evaluation to be performed, even in a domain-context-free setting. Entailments, model theories and subtheories, interpretability, and reducibility are just a few of the other properties that should be examined. The goal of this summit is to define a framework in which these examinations can take place, as part of a larger goal of defining the discipline of ontological engineering.
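
Entailment and consistency checks can also run without domain knowledge. The sketch below is a naive forward-chaining stand-in for a real description-logic reasoner (not any particular tool's algorithm, and limited to named classes): it computes the subsumptions entailed by transitivity and flags classes forced under both members of a disjoint pair, i.e., unsatisfiable classes. The class names, the deliberately dubious disjointness axiom, and the functions subsumption_closure and unsatisfiable_classes are hypothetical.

```python
from typing import Iterable, List, Set, Tuple

Edge = Tuple[str, str]


def subsumption_closure(edges: Iterable[Edge]) -> Set[Edge]:
    """All (sub, sup) pairs entailed by transitivity of subclass-of."""
    closure = set(edges)
    while True:
        derived = {(a, d) for a, b in closure for c, d in closure if b == c}
        if derived <= closure:
            return closure
        closure |= derived


def unsatisfiable_classes(edges: Iterable[Edge],
                          disjoint: Iterable[Edge]) -> List[str]:
    """Named classes entailed to fall under both members of a disjoint pair."""
    pairs = list(disjoint)
    supers = {}
    for sub, sup in subsumption_closure(edges):
        supers.setdefault(sub, {sub}).add(sup)  # a class subsumes itself
    return sorted(cls for cls, ups in supers.items()
                  if any(a in ups and b in ups for a, b in pairs))


axioms = [
    ("Platypus", "Mammal"),
    ("Platypus", "EggLayer"),
    ("Mammal", "Animal"),
    ("EggLayer", "Animal"),
]
disjoint_axioms = [("Mammal", "EggLayer")]  # hypothetical, and factually dubious
print(unsatisfiable_classes(axioms, disjoint_axioms))  # ['Platypus']
```

The check locates the conflict mechanically, but deciding which axiom is wrong (here, the disjointness) is a Region 2 judgment that needs domain knowledge.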

[1] Guarino, N., and C. Welty. 2002. Evaluating Ontological Decisions with OntoClean. Communications of the ACM 45(2):61-65. New York: ACM Press.

[2] Guarino, N., and C. Welty. 2004. An Overview of OntoClean. In S. Staab and R. Studer, eds., The Handbook on Ontologies, pp. 151-172. Berlin: Springer-Verlag.