Ontolog Discussion - Converting Ontologies - March 31, 2005

Topic: Issues Associated with Converting Ontologies Between Representations (especially via Protégé imports and exports)
- Session page: http://ontolog.cim3.net/cgi-bin/wiki.pl?ConferenceCall_2005_03_31
- Topic proposed by: Peter P. Yim (first proposed during the 2005.03.24 call)

Session Moderator: Kurt Conrad

Background:
- For our Cct Representation Project, we are supposed to
  - 1. map the ebXML Core Component Types (CCT's) to SUMO and/or its extensions to come up with a normative CCT-Ontology;
  - 2. import that SUO-KIF based CCT-ontology into Protege (making sure to capture the axioms, even though they might only be stored as text strings), and
  - 3. leverage Protege's plugin's (and/or augmented by other tools) to convert the CCT-Ontology into other popular representations. More specifically:
  - 4. With the KIF-based CCTONT as the normative ontology, start developing/translating/mapping it (in a "lossless" or "lossy" manner) to other languages and representations/languages/formats -- including, but not limited to (as resources and skillsets permit): OWL, XML/XSD, RDF/S, UML2/OCL, UMM/UML Class Diagram, SQL, ... and continuously improve on that.

Goal: we intend to assemble a panel of experts at this session to discuss:
- (A) How can we accomplish the above CCT-Ontology conversion task,
- (B) What are the issues associated with such an endeavor, and
- (C) What else is missing (additional resources - expertise, tools, ... - that we need) before such conversions can be readily done by everyday practitioners.

Resources:
- CCT to ontology mappings v0.4
- Protege version of the CCT-ontology v0.4 - ( download )
- From Pat Cassidy (ref: his 2005-03-28 post) - Isagelous Ontologies
- From Holger Knublauch - http://ontolog.cim3.net/forum//ontolog-forum/2004-10/msg00016.html
- http://protege.stanford.edu
- VNC service for this session: http://vnc2.cim3.net:5800/ -
  - view-only password="ontolog"
  - view-and-control password="convert"

For Virtual Session Tips and Ground Rules - see: VirtualSpeakerSessionTips

Meta-discussion:
- Can adopt a set of goals and an agenda for this session first, before starting the discussion? (-=ppy)
- Given that we have an invaluable pool of resources gathered here, how can we make the most of the next 1.5~2.0 hours? (-=ppy)
- May I suggest that we avoid the discussion on the utility, or the futility, of making lossy conversions with ontologies, and focus on how we could effectively do it, so that we bring ontological engineering 'closer' to today's practicing system architects, analysts and developers? (-=ppy)

Discussion:
- Isagelous Ontologies (ref.) - Pat Cassidy / 2005.03.24

Shared ontologies can be useful to allow development of knowledge bases by different teams of developers, while keeping the meanings of terms consistent or at least having well-defined relations to each other. There are, however, several notations in which ontologies are currently being developed, and different teams may prefer different notations. The most common formal ontology notations belong to either of two groups; one group being the first-order logic related notations such as KIF, CycL, Common Logic, or Conceptual graphs. The other group is the RDF/OWL notation and other description logic notations. Certain graphical knowledge representation notations such as UML or Topic Maps are not as expressive as the two main ontology representation groups.

OWL has become a standard for the Semantic Web community. Its internal reasoning capability is based on a restricted subset of first-order logic, but there is still work underway to supplement it with some form of axiom-defining capability, such as the proposed SWRL (Semantic Web Rule Language). Some ontology-development tools such as Protege also implement an internal logic that is less expressive than full first-order logic. To gain the anticipated benefits of interchange and reuse of ontologies among different development teams who may prefer different notations, both the more popular OWL standard and the more expressive first-order logic representations should be supported by knowledge maintenance tools. Any differences in the notations used to record the ontologies must be bridged by some format-conversion program.

Among those who have labored to develop ontologies and struggled with the different variants and versions of each notation, it has been aphoristic to assert that any translation of an ontology from a more expressive language such as KIF to a less expressive language such as OWL or Protege, will result in a loss of information, though the reverse transformation is not problematic. This observation is true only if one assumes that those who use an ontology in the less expressive notation will always restrict their reasoning procedures to those that are explicitly enabled by the built-in assumptions of the restrictive ontology language. An important distinction that needs to be made is between the static symbolic representation of knowledge, in any format (paper, CD, computer memory), and the programs or people that interpret the symbols and give them meaning. Symbols have no practical meaning without a process (mental or mechanical) that can interpret them. If we keep in mind the distinction between the static symbolic representation of knowledge and the processes that interpret it, we can visualize means to preserve information while moving it between knowledge representation notations of different expressiveness.

It is always possible to convert any quantity of information to some form of text and preserve and transfer it in that form to any other ontology notation. This holds because all ontology formats provide some mechanism for storing text strings, and each class of text strings can be given a specific intended meaning, e.g. the instances of a class of KIF axioms must be text strings conforming to the format of a KIF proposition, and the non-variable elements of each axiom must be legal classes, instances, and relations defined in the ontology itself. Thus the axioms, higher-arity relations, and functions used in KIF-like ontology formats can be converted into blocks of text, reified higher-arity relations, and relations that have functional arguments, all in standard OWL notation, and these structures can be used as axioms, functions, etc. by programs that can recognize them as such. All of the information in a first-order ontology notation can be preserved in OWL for use by any who do not want to be restricted to the inferences permitted by description logics.

Such translation of ontologies is possible, but needs a translation program. In order to function for interoperability, there also needs to be some agreement among a wide user community on how each of the more logically expressive (e.g. KIF) structures will be represented in a less expressive format such as OWL or Protege; the reverse transformation must also be agreed on. Given a program that can provide a bidirectional translation from less expressive to more expressive notation and vice-versa, users must still encode the problematic structures (axioms, higher-arity relations, and function terms) in format that can be translated by that program. This means that some standard of representation in the less expressive format must be agreed on for the problematic logical structures.

A simple example of a unidirectional translator from SKIF (a variant of KIF used in the SUMO ontology) to the Protege format has been built by the author (ref.), in which axioms are preserved as strings (with some syntactic checking in the translation process); higher-arity relations are represented both as ordinary slots for perspicuity in the Protege representation and as reified relations to capture all of the information; and functions are represented redundantly both as slots or higher-arity relation, and as classes of the entities returned as the values of the functions. The reverse translator from Protege to KIF was not constructed, but would require only the time to ensure that all details were accurately processed.

It will also be possible for a bidirectional translator from KIF to OWL (or OWL-Protege) to be built that would faithfully capture and translate all of the information in a KIF-format ontology and store it in an OWL-format ontology for users to use or ignore at their discretion. Such ontologies would be functionally isagelous, each containing all and only the (non-structural) information content of the other. The author's experience suggests that such a translator would take less than one person-year to construct, as a Protege-OWL add-on. It remains to be seen if there is a demand within the ontology user community for such a translator. With the widespread implementation of SWRL as an adjunct to OWL, a part of the specification for translating axioms will have already been adopted. The SWRL notation will work more directly within the OWL paradigm, and will reference the OWL reasoning procedures that will ensure consistent interpretation among a community of users of the ontology. With the use of SWRL, much of the missing functionality of OWL will be available, but there will still be some logical structures expressible in KIF that will not be easily expressed in SWRL. A KIF<=>OWL translator may still be useful.

The goal of enabling interoperability by use of a common ontology can be further advanced by not restricting the inferencing information in an ontology to simple first-order axioms, but allowing the inclusion of procedural code, as text strings or in some other manner, that would supplement the axioms where more complex reasoning is desired to specify the intended meanings of concepts. Axioms and procedural code may have attributes specifying under what circumstances or in which contexts such reasoning should be enabled. Procedural attachment is already permitted in UML-2 structures, has been suggested as a potential extension to SWRL rules, and is in the spirit of object-based programming languages such as Java. The goal of separating declarative information and logical rules from the procedural code of programs is one of the motivations behind the development of ontologies and of the SWRL rule language. This separation enhances interoperability, simplifies the addition of new knowledge to existing systems, and reduces the inherent complexity of knowledge-based systems. However, the inclusion of procedural code as an attachment to concepts would function only as a more complex form of rule attachment, and would not violate the principles of declarative knowledge representation. The virtue of considering such procedures as an integral part of an ontology would be to mitigate the potential for programmers in different development groups to reinterpret or misinterpret the meanings of concepts whose intended meanings are incompletely specified by the associated relations and axioms. Wherever it may hold that specifying the intended meanings of concepts is easier in a programming language than in logical axioms, it may serve to encourage the ontologist to provide the more complete meaning specification by allowing such procedural attachments, along with the conditions that would require their execution. The chance of variant interpretations by other users of such concepts will in that manner be reduced. As a long-term goal one may envision the situation where the only differences between interoperable systems will be in the user interfaces or data acquisition methods, and all reasoning with and conversion of data will be performed by the axiomatic or procedural specifications encoded in the common ontology used.

Isagelous: from the Greek iso = same + aggelia = information, message. Used to characterize two or more information-bearing objects that have the same information content. Examples are computer files that can be converted into each other by an algorithmic transformation (e.g. text and its zipped equivalent), nucleic acids having the same genetic code-bearing sequence, or the same ontology in different notations.

- Holger Knublauch (responding to PeterYim's question and comment from EvanWallace) -
  - From Holger Knublauch / 2004.10.22 - http://ontolog.cim3.net/forum//ontolog-forum/2004-10/msg00016.html

[Peter] Besides mapping into OWL (or round-tripping, assuming we can actually

make a lossless mapping), we also (see: http://ontolog.cim3.net/cgi-bin/wiki.pl?CctRepresentation#nidPO) want to map to other more popular representations like XSD, UML, ... etc. granted that some of those will be lossy translations. [Our original list included: OWL, XML/XSD, RDF/S, UML2/OCL, UMM/UML Class Diagram, SQL, ... ]

We obviously want to capitalize on the rich suite of Protege plugins to

ease our job (on the multiple translation exercises, after we have "imported" the KIF ontology into Protege and/or OWL). To that end, we hope to tap into the Protege project team's knowledge and connections, so that we can draw upon the expertise of people who have intimate knowledge working between Protege and each and every one of the representations that we are targeting.

Suggestions?

[Holger] I may not have fully understand the requirements, but I think you are

trying to find a maintainable, possibly round-trip mapping between OWL and KIF or SCL.

My naive suggestion would be to try to map as much as possible

directly into OWL + SWRL, and for the rest, map them into instances of a KIF/SCL metamodel. This metamodel would be a public OWL ontology containing classes such as Axiom etc. This should make round-tripping easy and transparent, and would not violate any OWL practices. The extra stuff could be made accessible into annotation properties.

[Evan] Now that is something I would really like to see! It would seem to me

though that a CL metamodel in OWL would have to be in OWL Full. Would Protege-owl support the Full expressivity needed for that? For all that, where would you store the SWRL in Protege as well?

[Holger] Protege supports all the necessary OWL Full constructs (including

metaclasses if desired). My guess is that any complete mapping from KIF-like languages to OWL will lead to OWL Full, due to the larger expressivity. However, what's the problem with that? Even if all resulting ontologies are OWL Full, reasoners can still operate on those parts that are meaningful in their (DL) context. Extra information such as KIF axioms would be ignored on the fly to the reasoner. I am against attempts to try to artificially stay inside OWL DL, only in order to satisfy reasoners that are too inflexible to handle OWL Full. In case you intend to map the non-OWL parts of KIF into extra (annotation) objects, then these annotation objects could easily be filtered out. Of course, parts of the orginal semantics would get lost, but that's natural. So clearly any attempts should be made to try to map into corresponding OWL/SWRL constructs that reasoners know the semantics of.

You could represent KIF like SWRL in terms of an OWL metamodel/ontology.

The SWRL ontology is an OWL file like anyone else, available from:

http://www.daml.org/2004/04/swrl/swrl.owl

This means, you can already instantiate SWRL rules with any generic

OWL editor such as Protege: Just create a new project and import the SWRL namespace. Then use the Individuals tab to create rules etc. A colleague of mine at SMI is working on an optimized Protege plugin for handling SWRL more conveniently.

So, in a nutshell, someone could define a similar kif.owl ontology,

or a uml.owl. These can be used to represent those extra information that don't have a direct equivalent in OWL. Then, annotation properties could be used to link these objects to their context hosts. And reasoner interfaces should simply prune ontologies at properties that are marked as annotation properties. The DIG interface implemented by Matt Horridge for Protege already does that, and I believe the Jena DIG does that as well. Other APIs unfortunately reject any attempts to send such an ontology to a reasoner, but that's in my opinion more a bug than a necessary feature.

More discussion on converting between specific representations:
- KIF-Protege-OWL
  - different flavors of OWL
- KIF-Protege-RDF/S
- KIF-Protege-XML/XSD
- KIF-Protege-UML Class Diagram
- KIF-Protege-UML2/OCL
- KIF-Protege-SQL
- KIF-Protege-OWL/SWRL
- other conversions that are relevant and desirable
- coverting via another lingua franca
  - CL or SCL?
- coverting with the aid of other tools

Opening round of comments:
- Mark Musen: let's address cognitive elementss, i.e. how people can use the various representaitons, as well as the language capabilities.
- Ray Fergerson: I'm here to answer questions on imports and exports with Protege
- Olivier Dameron: same as Ray's position
- James Douma: want to see if there is a canonical form that we can map into
- Peter P. Yim: I suggest that we avoid the discussion on the utility, or the futility, of making lossy conversions with ontologies, and focus on how we could effectively do it, so that we bring ontological engineering 'closer' to today's practicing system architects, analysts and developers?
- Adam Pease: agree with Mark & Peter
- Monica Martin: observing
- Kurt Conrad: will fight if anyone says "xml is only syntax"
- Pat Cassidy: agree with Mark. Important that for people to use it, an ontology has to be easy to work with. Disagrees with Peter and Adam in that a translation does not have to be lossy.
- Peter Denno: disagree with Pat and side with Peter P. Yim in that there will be cases of lossiness
- Chris Menzel: axiomatization has to be complete, and there has to be agreement. There will be lossiness even in very expressive language frameworks
- David Whitten: it is possible to convert back-and-forth but lots of precautions

Further discusson:
- Kurt Conrad: What is representation?
  - Adam Pease: 3 aspects involved - Representation, Operation & Semantics --> implementation, syntax & semantics
  - Pat Cassidy: there is no "meaning" unless there is "interpretation", with the system doing some "checking" -- after that is it is not an arbitrary character string any more.
  - Mark Musen: good reference - Alan Newell's paper from 25 years ago: The Knowledge Level - stating the thesis that "a representation has no semantics unless there is a process applied to it"
  - Mark Musen: another useful reference - paper on the AI Magazine, circa 1992 - Randy Davis et al., "What is a Knowledge Representation" - describing how people mean different things in knowledge representation ...
    - Peter P. Yim requests Mark to post links to these two papers
  - Peter P. Yim: could we not go down this path (arguing whether we are properly representing knowledge) and direct the discussion to actually "how we can do it without misleading people".
  - Chris Menzel: there is connection between languages and structure, independent of interpretation ... that's what semantics is ... in a logic paradigm, the process of interpreting in denied
  - Peter Denno: agree ... although we also make assumptions that we solicit certain behavior of the recipient, which is possibly a machine
  - Pat Cassidy: you have a set of symbols, and a set of interpretations ... something has to happen to create the connection
  - Pat Cassidy: as for "lossy" - if one moves all the axioms of KIF into protege, and if you move evrything back, that is not "lossy"
    - Kurt Conrad: something is lost is something that has been actionabale is no longer actionable after the conversion
  - Chris Menzel: a resource - Pat Hayes Common Logic project is specifically relating to FOL to OWL conversion - see: http://cl.tamu.edu
  - Monica Martin: in humans, we always apply context in any inference - "my core component may be your BIE"
    - similarly, at Rosettanet, no one can agree on a minimum baseline as to what is minimally sufficient to convey the semantics
  - Pat Cassidy: maybe "controlled English", providing the restrictions, may help

Moving ontologies into and out of Protege
- Ray Fergerson: we have run into similar situation between Protege and RDF
  - we provide a check box and allow the user to choose to be "lossy" (by throwing the information that cannot be represented), and leave everything there.
  - that is predicated on the captured information is not "edited" or changed by any subsequent processor, so that its integrity is prewserved.
- Ray Fergerson: the OWL plugin does not allow the capture of anything it cannot represent now. It is conceivable we could do the same (as what we did with RDF).
- Pat Cassidy: some sort of validation would probably be in order too
- Kurt Conrad: what things are being dropped?
  - Ray Fergerson: for example, RDF does not have a way to limit cardinality (everything is cardinality: multiple)
  - Ray Fergerson: another is one cannot change the 'range'
- Ray Fergerson: one could also select in a check box to select
- Ray Fergerson: XML schema export available now, but will be in the main system and beefed up by v3.1 release
- Ray Fergerson: UML support is broken now ... funding will help
- Ray Fergerson: no OCL support now
- Ray Fergerson: no ongoing effort for PAL
- Ray Fergerson: SWRL - its a language now, but some day there will probably be an engine for it
- Ray Fergerson: our intended application is Protege; some users have done work to go from Protege to relational databses, but it has not been released to the public ... again, funding will help get that into everyone's hands
- Ray Fergerson: two mechanisms - a Protege centric one, and XSLT

Peter P. Yim solicted all present to review and clean up their contributions, to ensure the integrity of the captured discussion.

(the Stanford contigent, and Chris Menzel dropped off at this point, due to prior commitments)

- Peter Denno: Pat brought up "validation", maybe we could talk about this a little more
  - Pat Cassidy: Ray's approach is good ... maybe a re-check to reassure integrity (after the round-trip) is useful too
- Adam Pease: we are not getting resolution on what is "lossy" and what is not
  - Pat Cassidy: neither does a text editor on which one writes KIF axioms
    - Adam Pease: it is not the issue of the program, but that of the semantics ... as you become compliant with, say, the OWL rule, you will have broken its integrity as a KIF representation. IT's not a quality, or risk, but its a issue of mathematics, it's either "can" or "can't"

Closing
- Peter P. Yim: I still look forward to getting the various exports done for our CCT-Rep exercise
  - we need to document the "lossiness" and turn that to our advantage
    - Monica Martin: agree that this could be an opportunity ... there are situation where people refuse to make standards computable (or automate)
- James Douma: what would be a good representation to use, then?
  - Adam Pease: a couple of options -- greatest common denominator in expressivity: KIF and Category Theory (no operational implementation so far)
  - Monica Martin: it would be useful to validate that (that KIF is a good candidate)
  - Peter Denno: we should also include Common Logic (CL)
    - Adam Pease: Yes. KIF is a CL compliant language.
- Kurt Conrad: analogy of the Jon Bosak XML (in the face of SGML) story

Session adjourned at 12:41pm PST

Audio Recording of this Technical Discussion Session

(Thanks to Kurt Conrad and Peter P. Yim for getting the session recorded.)

To download the audio recording of the presentation, click here
- the playback of the audio files require the proper setup, and an MP3 compatible player on your computer.
Conference Date and Time: Mar. 31, 2005 10:53~12:40 PM Pacific Std Time
Duration of Recording: 1 Hour 46.8 Minutes
Recording File Size: 37.6 MB (in mp3 format)
Telephone Playback Expiration Date: April 10, 2005 11:26 AM Pacific Std Time
- Prior to the above Expiration Date, one can call-in and hear the telephone playback of the session.
- Playback Dial-in Number: 1-805-620-4002 (Ventura, CA)
- Playback Access Code: 601062#

Ontolog Forum

Ontolog Discussion - Converting Ontologies - March 31, 2005

Audio Recording of this Technical Discussion Session