OntologySummit2012 Communique/Draft

Ontology Summit 2012: "Ontology for Big Systems" - Communiqué draft workspace

Please note that the Communique has been finalized and published - see: OntologySummit2012_Communique

(Details below are preserved for historical records only)

For the Ontology Summit 2012 Communiqué draft work ...

see proceedings of
- Session-12: "Organizing the 'Big' Communique" - ConferenceCall_2012_03_29
- "Pre-symposium preparation & Communique Review-II" session - ConferenceCall_2012_04_05
ref. our lead editors' message ...
- Please post change requests, with proposed solutions, on the GoogleDoc draft. General comments can also be posted, but they have lower priority than change requests (the difference is that a change request includes a proposed solution). Note that only Ali and I can edit the text. Also, be aware that the Ontolog wiki page for the communique may not be in sync with the GoogleDoc draft version.
- The deadline for posting change requests and comments is midnight EDT on Tuesday 10 April. We then hope to have the final version posted no later than sunrise on Thursday 12 April 2012.

The following draft version was ...

Last updated: April 5, 2012 8:53am PDT by AliHashemi

Lead Editors: Ali Hashemi & Todd Schneider

Co-Editors: Mike Bennett, Mary Brady, Cory Casanave, Henson Graves, Nicola Guarino, Anatoly Levenchuk, Ernie Lucier, Leo Obrst, Steve Ray, Amanda Vizedom, Matthew West, Trish Whetzel

Description

This year's Ontology Summit, Ontology for Big Systems, sought to explore, identify and articulate how ontology, its methods and paradigms, can bring value to the disciplines required to engineer a "big system."

The term "big system" is intentionally vague and intended to cover a large scope that includes many of the terms encountered in the media, including:

big data
complex systems - techno-socio-economic
intelligent or smart systems
cloud computing
net-centricity
collective intelligence

Established disciplines that fall within the summit scope include (but are not limited to) systems engineering, software engineering, information systems modeling, and data mining.

Goals

The principal goal of the summit is to bring together and foster collaboration between the ontology community, systems community, and stakeholders in "big systems."

We will aim to produce a series of recommendations describing how ontologies can create an impact, as well as to provide illustrations of where these techniques have been, or could be, applied in domains such as bioinformatics, electronic health records, intelligence, the smart electrical grid, manufacturing and supply chains, earth and environmental sciences, e-science, cyberphysical systems and e-government.

Process

This is our 7th annual Ontology Summit season. Like our previous 6 summits, this Ontology Summit consisted of 3 months of virtual discourse (over our archived mailing lists) and virtual panel sessions (over augmented conference calls), during which the summit participants exchanged ideas on how ontological analysis and ontology engineering make a difference when applied in or to "big systems."

The summit culminated in a 2-day face-to-face workshop/symposium held on 12-13 April 2012 at the U.S. National Institute of Standards and Technology (NIST) facility in Gaithersburg, Maryland, US.

As is traditional with the Ontology Summit series, the results will be captured in the form of a communiqué, with expanded supporting material provided on the web.

Summary

The common thread running through this summit on big systems is models and modeling, and the need for models with greater fidelity and interoperability. The primary drivers for a modeling approach to systems engineering and development are simply cost, in time and money, and the resulting system value.

Among the current approaches to mitigating some of the cost factors associated with engineering are executable architectures and model-based engineering. Each approach involves a model, either to understand the thing being designed or to provide a predictive basis for design. In each case, current methodologies and tools fail to provide:

sufficient rigor to adequately represent the system for the needs of the entire engineering lifecycle and its environment,
explicit semantics, instead leaving these in the minds of the modelers,
support for logical inferencing to automate processes.

A lack of adequate model fidelity, of clear conceptualizations, and of consistent semantics during engineering phases can lead to poor design, miscommunication across the lifecycle and among stakeholders, implementation errors, rework, and systems that neither meet their expected uses nor can be cost-effectively extended to meet unanticipated needs. During operation, such systems may be difficult to maintain, whether the task is simple maintenance, updating, or extension. Moreover, there is a growing expectation for systems to be more 'intelligent': to be able to adapt, or at a minimum be adaptable, to new needs without incurring large costs.

For the last decade, the world has been in a 'Cambrian' age of information explosion, but this age is now giving way to a new era of knowledge. The information age has resulted in the production of unprecedented amounts of data and information: Big Data. Accompanying this abundance of data and information are Big Systems that attempt to handle it and provide knowledge.

Finally, as we move into the knowledge age, there is a growing expectation that our systems will be more self-describing and intelligent. To engineer such systems, allow intuitive use, and meet the expectations of all stakeholders, ontologies and ontological analysis must be used more consistently and completely.

Introduction

Over the past 100 years, humans have entered the Cambrian age for information, knowledge and systems. The amount of information and knowledge produced, published and shared across the globe has been growing exponentially each year. In the past decade alone, more data has been collected, more video has been produced and more information has been published than in all of prior human history. At the same time, with the advent of the computer, digital representations, and the Internet, it has become possible to model more of the complexity of systems, connect more people and connect more (information) systems. With all this new information (aka Big Data) and all these new systems (aka Big Systems), there has also been an attendant growth in the complexity, size, scale, scope and interdependence of the systems that model physical phenomena and handle information.

To address the problems that have arisen during the Cambrian period of information and knowledge, we need novel tools and approaches. Some of the major challenges facing Big Systems stem not only from their scale, but also their scope and complexity. At the same time, there are novel challenges for Big Systems when different, dispersed groups work together toward a common goal, for instance in understanding Climate Change. This leads to a need for better solutions for interoperability among federated systems and for fostering interdisciplinary collaboration.

Given the broad scope of this year's theme, Ontology for Big Systems, the summit was organized along three tracks and two cross-track initiatives. This communique seeks to distill and construct a whole from the activities that occurred within each track and throughout the summit. The interested reader is encouraged to visit the synthesis and community pages for further information. In addition, the individual meeting pages, with links to the presentations, audio recordings, and chat sessions, are available for review. The tracks were as follows:

Big Systems Engineering
Big Data Challenge
Large Scale Domain Applications
Quality Cross Track
Federation and Integration of Systems

Big Systems Engineering

Engineers and designers have always used a variety of models as part of their disciplines. Designing a car, a power plant, or a transportation system relies heavily on creating a model of the system. Similarly, models are used extensively in trying to understand how complex systems such as the human body or the climate work. These models express a theory of the world, or of a part of it. In the computing age, it has become far easier to create and share these models.

Different fields deploy models of varying sophistication, though in many, the semantics - the meaning - of the model and its parts are implicit or governed by inconsistent convention. But the promise of model reuse, a desired goal, is hindered by differences in semantics. So a gradual shift to explicit semantics is underway, first in engineering and slowly in other fields.

The various disciplines within engineering are evolving from using informal modeling, to using formal languages to model their systems, to underpinning said languages with explicit semantics, to recognizing the importance of understanding the underlying ontology of the elements of the languages.

There are various standardization efforts underway to advance the semantic and ontological foundations, from the development of ISO 15926, to providing formal semantics for the Unified Modeling Language. Similarly, groups are working to build repositories of ontologies, or libraries of ontology patterns - snippets that formalize important aspects of reality such as "part-of" or "is-a".
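To make the idea of a pattern concrete, here is a minimal sketch, not drawn from any of the efforts named above. It assumes a toy representation in which an ontology is just a set of (subject, relation, object) triples and the "is-a" and "part-of" patterns are each treated as transitive; all entity names are hypothetical. Once the pattern is formalized, a program can derive relationships that nobody stated directly.

```python
# Toy ontology: a set of (subject, relation, object) triples.
# The relation names mirror the "is-a" and "part-of" patterns;
# all entity names used below are hypothetical.
TRANSITIVE_RELATIONS = {"is-a", "part-of"}


def infer_transitive(triples):
    """Return the input triples plus everything entailed by transitivity.

    Transitivity is applied within each relation separately, so an
    "is-a" link is never chained with a "part-of" link.
    """
    facts = set(triples)
    changed = True
    while changed:  # repeat until no new triple can be derived (a fixpoint)
        changed = False
        for (a, rel, b) in list(facts):
            if rel not in TRANSITIVE_RELATIONS:
                continue
            for (c, rel2, d) in list(facts):
                if rel2 == rel and c == b and (a, rel, d) not in facts:
                    facts.add((a, rel, d))
                    changed = True
    return facts


ontology = {
    ("Piston", "part-of", "Engine"),
    ("Engine", "part-of", "Car"),
    ("Car", "is-a", "RoadVehicle"),
    ("RoadVehicle", "is-a", "Vehicle"),
}

entailed = infer_transitive(ontology)
print(("Piston", "part-of", "Car") in entailed)      # True: derived, not stated
print(("Car", "is-a", "Vehicle") in entailed)        # True: derived, not stated
print(("Piston", "part-of", "Vehicle") in entailed)  # False: patterns are not mixed
```

In practice this kind of inference is normally left to a reasoner over a standard language such as OWL, where a property can be declared transitive; the loop above only illustrates what the pattern commits a model to.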

Big Data and Applications

A key component of the current explosion of information is the proliferation of vast amounts of data. With greater computing power, it has become easy to create and track data. Whether it be encoding an organism's DNA, tracking Internet usage, tracking credit usage, running the experiments at the Large Hadron Collider, or collecting weather satellite data, each of these activities creates a staggering amount of data.

While the sheer size and scale of these data sets present their own challenge, first understanding the data, garnering information and knowledge from it, and then intelligently combining it with other data sets requires an accurate representation of (the portion of) the world that the data represents. This in turn requires that each data source adequately represent itself and what was intended by the publication of the data.

To do this, we need theory. There are limits to statistical analysis. We need both theory and statistical analysis together. Data creators and publishers need to make explicit what their data represents together with the context of the data and its creation (e.g., the systems that created and transformed it) to be able to intelligently use the data and combine it for other useful ends. This requirement necessitates developing theories about those parts of the world relevant to the data and its context. Without such theory and subsequent practice, successful data reuse and adaptability will not be possible.

There are a variety of groups working towards this vision. For example, the Linked Open Data (LOD) initiative seeks to connect distributed data across the net. While many data sources are available online today, their data is not readily accessible. The LOD cloud aims to create the requisite infrastructure to enable people to seamlessly build "mash-ups" by combining data from multiple sources.

Similarly, there has been a surge of work in bioinformatics, including the Open Biological and Biomedical Ontology, the Gene Ontology and other sources that annotate big data with explicit semantics. These initiatives allow research groups to publish findings on genes, gene expression, proteins and so on in a standardized, consistent manner.

Another example is the FuturICT project funded by the European Union. Its ultimate goal is to understand and manage complex, global, socially interactive systems, with a focus on sustainability and resilience. FuturICT will build a Living Earth Platform, a simulation, visualization and participation platform to support decision-making of policy-makers, business people and citizens.

Interoperability

The Internet makes it far easier for people in different parts of the world to share and combine data, information and knowledge. If we want to realize the true potential of this interconnected world, we need to be able to combine not just our data, but also our models.

An initiative like Sage Bionetworks might allow a doctor in China to integrate diverse molecular mega-datasets and reuse a predictive bionetwork model, built by a team in the United States, that incorporates new insights into human disease biology from a team in France. Each community (of practice) views and prioritizes parts of the world according to its own viewpoints and interests.

Similarly, within a single enterprise, the same product may be viewed differently by each of the marketing, engineering, manufacturing, sales and accounting departments. Making sure that these views are, if not harmonized, then aligned so that information can be successfully shared entails solving interoperability.

Semantic analysis is a fundamental, essential aspect of federation and integration. Building value by combining the views of different communities means solving interoperability, and that means negotiating the implicit meaning used by each of these groups.

The Object Management Group has recently issued a Request for Proposal to create a standard addressing such issues. Similarly, within the systems engineering community, one example is the ISO 15926 standard, which aims to federate CAD/CAM/PLM systems at industry, business and ecosystem-wide scales.

Interdisciplinary Collaboration

Similarly, as knowledge has become more specialized, different communities have developed their own bodies of knowledge. Bridging these gaps can unleash a lot of potential, foster innovation, reduce the reinvention of the wheel and accelerate the development of better tools.

While each specialization may use its own jargon and technical language, the underlying reality is the same. Ontologies, in the form of explicit statements of the assumptions in each sub-field, can help identify points of overlap and interest between different communities. They can serve as tools to facilitate search and discovery.

The Linked Science effort is a project that aims to create an "executable paper." It hopes to combine publication of scientific data, metadata, results, and provenance information using Linked Data principles, alongside open source and web-based environments for executing, validating and exploring research, using Cloud Computing for efficient and distributed computing and deploying Creative Commons for its legal infrastructure.

Another project, the iPlant Collaborative, is building the requisite cyberinfrastructure to help cross-disciplinary, community-driven groups publish and share information, build models and aid in search. The vision is to develop a cyberinfrastructure that is accessible to all levels of expertise, ranging from students to traditional biology researchers and computational biology experts.

State of the Practice vs State of the Art

Most aspects of engineering involve models, often residing solely in the engineer's mind. In the process of engineering big systems, there are many models, possibly quite complex ones, developed by different disciplines, teams and people, who may be geographically and culturally dispersed. Models from different disciplines have different levels of expressivity or fidelity and different degrees of automation, and in general they are not interoperable. Aside from differences in tools and modeling syntax, and more fundamentally, different and not necessarily compatible conceptualizations and interpretations usually arise. At various points in the system's development and operational lifecycle(s), these differences must be resolved and the models integrated, or at a minimum the differences bridged, to achieve interoperability (syntactic, conceptual, and semantic) so that collaboration and continued development can occur. These efforts to resolve incompatibilities add time and cost.

To mediate at least the semantic differences among models, engineering has been progressively shifting from informal modeling (for instance, chalk/whiteboard sketches or textual descriptions) to modeling in formal languages that support more explicit and complete semantics. However, beyond semantic differences among models, there can be, and are, differences in conceptualizations. These differences are not always readily apparent and sometimes manifest themselves in the modeling languages.

The modeling of big or complex systems (note that "system" can also refer to big data sets) requires conceptualizations within multiple domains of relevance to the system(s), their use(s), and engineering processes. Ontologies represent conceptualizations of aspects of a domain or environment, while ontological analysis provides a more thorough methodology for understanding and distinguishing the complexity of big systems. Modeling, in all its various guises, is an area where ontology and ontological analysis are starting to be used and have great potential.

Ontologies can be viewed as patterns for what constitutes a system with parts and connections, and for the identity, dependence, and unity of systems irrespective of their particular nature. Informally, a system is an entity that consists of components connected in some way such that the system as a whole exhibits some behavior. Engineered systems are usually designed so that their components are replaceable. Key relations like classification, specialization, and whole-part are well understood in the realm of ontology and see major application in systems engineering. Computer-based modeling languages provide some built-in support for component modeling and provide facilities for extending the language's ontological commitment, but they are usually neither sufficient to support formal semantics and logical inferencing nor expressive enough to take advantage of rigorous ontological analysis.

For the class of big data, the attendant challenges include those of engineering in general, namely differences in conceptualizations and differences in semantics; in addition, there are differences in terminologies, i.e., different data/information models. Beyond these challenges, most big data resides in systems that were engineered for specific purposes and were embedded with implicit assumptions and semantics. To make the most effective use and reuse of such data/information, the conceptualizations and semantics must be made explicit and (machine) accessible. Doing so will allow the discovery of new relationships and knowledge to be automated.

The class of big systems derives from the federation of systems. There are multiple dimensions to the federation of systems: hardware, software, organizations and people. Challenges in applying ontology to these systems include modeling, which requires a broad collection of conceptualizations and semantics spanning the federates; the ability to reuse data and artifacts from one life-cycle stage in later life-cycle stages; and integrating the models and their artifacts across multiple modeling languages. Thus, if different systems include their own ontologies, those ontologies too must be federated: federated ontologies for federated systems. This places a requirement on ontologies: they must be able to behave as components and be assembled to contribute to an ontology of the whole. Yet, in general, ontology developments are one-offs, and it is rare for ontologies to be reused or even reusable. For ontology to be useful for engineering, ontologies must be developed so that they are reusable.
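The requirement that ontologies behave as components can be sketched, under heavy simplifying assumptions, as a merge of modules in which each federate's local terms live under its own prefix and terms declared as shared must agree before the modules are assembled. The `OntologyModule` class, the `assemble` function, the prefixes and the definitions below are all hypothetical, and "agreement" is reduced to string equality of definitions; real ontology alignment is much harder.

```python
from dataclasses import dataclass, field


@dataclass
class OntologyModule:
    """A hypothetical component ontology: term name -> definition text."""
    prefix: str
    terms: dict = field(default_factory=dict)
    shared: set = field(default_factory=set)  # terms meant to be common to all federates


def assemble(modules):
    """Assemble component modules into one ontology of the whole.

    Local terms are qualified with their module prefix so they cannot
    collide. Shared terms keep their bare name and must carry the same
    definition in every module that uses them; a disagreement is
    reported as a conflict instead of silently picking one definition.
    """
    whole, conflicts = {}, []
    for module in modules:
        for term, definition in module.terms.items():
            key = term if term in module.shared else f"{module.prefix}:{term}"
            if key in whole and whole[key] != definition:
                conflicts.append(key)
            else:
                whole.setdefault(key, definition)
    return whole, conflicts


power = OntologyModule("power", {"Pump": "device that moves fluid", "Unit": "SI unit"}, shared={"Unit"})
water = OntologyModule("water", {"Pump": "station that lifts water", "Unit": "SI unit"}, shared={"Unit"})
grid = OntologyModule("grid", {"Unit": "organisational unit"}, shared={"Unit"})

whole, conflicts = assemble([power, water])
print(sorted(whole))                  # ['Unit', 'power:Pump', 'water:Pump']
print(conflicts)                      # []
print(assemble([power, grid])[1])     # ['Unit']
```

The prefixes keep each federate's local vocabulary intact, while the explicit list of shared terms is where federation actually has to negotiate meaning.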

With the advent of standards for the semantic web and supporting tools it has become easy for most people to create ontologies. Unfortunately most people have not had training or even exposure to ontological analysis, the result being that myriad incompatible ontologies are being developed in ad hoc ways leading to the creation of new semantic silos.

Many of the ontologies that abound today come from software development. There, ontologies are viewed as more sophisticated data models (which of course they are, but not only that). And, as in the early age of software development, there are few, if any, engineering practices and processes for ontology development or ontological engineering.

A hallmark of systems engineering, distinguishing it from less rigorous systems creation activities and essential to success in developing large-scale and complex systems and managing them throughout their life-cycles, is the rigorous use of requirement specifications, requirements-centric architecture and design, multi-stage testing and revision, and other risk-management and quality assurance techniques. Quality at any of these levels is defined in terms of the degree to which any one of the system, component, process, etc., meets the specified requirements. Analysis and specification of requirements and functions at each of these levels, along with identification and application of relevant quality measures, is an essential part of good systems engineering. In order for the potential of ontology to be realized in engineering, and especially for its application to complex or big systems and big data, these same practices need to be applied.

Recommendations

This section represents a distillation of the discussion in this year's summit focused on recommendations.

Ontologies and Engineering

The intersection between ontology and big systems and big data spans many communities, disciplines, and levels of depth. Regardless of the community, the success of any ontology intervention requires understanding its intended environment and problem space to be addressed. Clarifying how ontology fits into the larger picture will shape what level of expressiveness and semantics are required and how they may be employed in a project. Not all ontologies need to be reasoned over and rarely are they the end product.

In considering the use of ontology, one has to gauge the level of "semantic maturity" of the organization and environment in which the use is proposed. To what degree does the broader organization understand ontology or the application of ontology? To what extent are such technologies already being deployed? Will the shift be incremental or might it be perceived as disruptive? Often, existing infrastructure supports traditional software development far better than large-scale ontologies; developing a migration path that delivers small wins while transitioning towards a more suitable infrastructure makes such change easier to manage. Given that no single technology or tool currently provides the best solution across all large system use cases, most implementations should expect to evolve as the technology landscape changes.

From the Systems Engineering community, there was a strong emphasis on the importance of modeling, and of explicating the underlying concepts and their semantics. A number of candidate modeling languages were considered, along with their deficiencies in semantic and conceptual clarity (ontology representation languages among them). It was further noted that developing an ontology of a problem space or domain as a referent conceptual model allows an organization to decouple this knowledge from any particular information model or technology implementation. In this way, technology agnosticism is enabled, allowing the conceptual model to be reused and realized in whichever technology stack is most appropriate. There are many engineering and enterprise tasks where ontology is clearly applicable and would provide great value, but where it is not yet in wide use.

Ontologies and Their Infrastructure

Systems engineering is all about understanding the whole and the relationships between the parts. It involves assembly from components and support for the use of the same parts in different systems. This calls for ontologies that can themselves be components of other ontologies and be assembled into an ontology of the whole system. Yet, in general, ontology developments are one-offs, and it is rare for ontologies to be reused or even reusable. For ontology to be useful for engineering, reusable ontologies that support reusable engineering models will be important.

Big systems have a long life and usually change over that life. They tend to interact with their environment and change state as a result of interaction. This means conceptualizations are needed to model state change and system evolution throughout its lifecycle which in turn means that the ontologies that describe a system need to be able to change, but in a way that means that the history of changes is not lost. This requires a sophisticated approach to change and configuration management, both in model and ontology creation and maintenance.
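One way to let an ontology change without losing its history is to store it as an append-only log of changes and reconstruct its state at any version on demand. The sketch below assumes that every change is either the addition or the retraction of a single statement; the `VersionedOntology` class and its methods are hypothetical and not taken from any particular configuration-management tool.

```python
class VersionedOntology:
    """Append-only change log of (version, action, statement) records.

    Nothing is ever deleted from the log: a retraction is itself a new
    record, so the state at any earlier version can be rebuilt.
    """

    def __init__(self):
        self.log = []
        self.version = 0

    def _record(self, action, statement):
        self.version += 1
        self.log.append((self.version, action, statement))
        return self.version

    def add(self, statement):
        return self._record("add", statement)

    def retract(self, statement):
        return self._record("retract", statement)

    def state_at(self, version):
        """Replay the log up to and including `version`."""
        state = set()
        for v, action, statement in self.log:
            if v > version:
                break
            if action == "add":
                state.add(statement)
            else:
                state.discard(statement)
        return state


onto = VersionedOntology()
v1 = onto.add(("Valve-7", "part-of", "Line-A"))
v2 = onto.retract(("Valve-7", "part-of", "Line-A"))  # the valve is moved...
v3 = onto.add(("Valve-7", "part-of", "Line-B"))      # ...to another line
print(onto.state_at(v1))  # {('Valve-7', 'part-of', 'Line-A')}
print(onto.state_at(v3))  # {('Valve-7', 'part-of', 'Line-B')}
```

Replaying the whole log is the simplest design; a production system would likely add snapshots or indexes, but the essential property, that the earlier configuration remains recoverable, is already visible here.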

When deciding what ontologies to use or implement, there is a consensus that, where possible, ontologies should be reused from pre-existing sources. Two such sources were explored: ontology repositories containing complete ontologies, and libraries of ontology patterns that capture successful representations of particular relations or snippets of a domain. The former have the advantage of providing a more comprehensive solution, while the latter afford greater flexibility and, in theory, allow the designer to pick and choose among a variety of patterns to best meet their needs.

Foundational ontologies contain conceptualizations needed for modeling, especially at the enterprise scale. These include processes, events, descriptions, plans, physical quantities, individuals, types, etc. Further, ontologies provide relationships between the concepts, which can be exploited to relate the data needed to determine program status. Some enterprises have recognized that ontologies generalize information models and provide better access and organization than traditional data models.

Ontology and Engineering Practice

Determining exactly which ontology is appropriate for an application is an involved task, and requires a number of judgments in terms of the desired expressivity, comprehensiveness and breadth. To this end, it was recognized that a number of distinct problems are often conflated. It is wise to disentangle:

1. The level of expressiveness (representation) needed for your domain. This is development-time expressiveness.
2. The level of expressiveness (representation) it takes to efficiently reason over the ontology at run time. This is run-time expressiveness.
3. The transformation of the representation in (1) to that in (2), i.e., knowledge compilation.
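The three-way distinction can be illustrated with a small sketch. It assumes the development-time model is a class hierarchy plus instance typing, where answering "which individuals are Vehicles?" would otherwise require walking the hierarchy on every query. The compilation step materializes each class's full extension once, so run-time queries become dictionary lookups. All names are hypothetical, and materialization is only one of several possible compilation strategies.

```python
# Development-time model (hypothetical): direct subclass links and instance typing.
SUBCLASS_OF = {
    "Car": "RoadVehicle",
    "Truck": "RoadVehicle",
    "RoadVehicle": "Vehicle",
    "Ship": "Vehicle",
}
INSTANCE_OF = {"car-17": "Car", "truck-3": "Truck", "ferry-1": "Ship"}


def compile_extensions(subclass_of, instance_of):
    """Knowledge compilation step: precompute every class's full extension.

    Each instance is added to its asserted class and to all ancestors of
    that class, so no hierarchy traversal is needed at query time.
    Assumes single inheritance and no cycles in the hierarchy.
    """
    extensions = {}
    for individual, cls in instance_of.items():
        while cls is not None:
            extensions.setdefault(cls, set()).add(individual)
            cls = subclass_of.get(cls)  # None once the root is reached
    return {cls: frozenset(members) for cls, members in extensions.items()}


# Run time: queries are plain lookups against the compiled table.
COMPILED = compile_extensions(SUBCLASS_OF, INSTANCE_OF)


def instances_of(cls):
    return COMPILED.get(cls, frozenset())


print(sorted(instances_of("Vehicle")))      # ['car-17', 'ferry-1', 'truck-3']
print(sorted(instances_of("RoadVehicle")))  # ['car-17', 'truck-3']
```

The trade-off discussed here shows up directly: the compiled table is fast to query, but it must be rebuilt whenever the development-time model changes, and it can only answer the kinds of questions it was compiled for.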

Too little expressivity may make it cumbersome or impossible to represent an essential aspect of your problem space. Conversely, allowing extraneous expressivity for reasoning can severely degrade run-time performance. A vital task for any ontology implementation is to understand the level of expressivity required by the problem space, while also accounting for performance criteria. Moreover, reasoner and query engine performance are highly dependent upon the exact formulation of the input rules and queries. Alternative representations of the same axioms can have significant effects on reasoning time. One observation was that ontologies work best when not compromised by implementation tradeoffs.

This means that more work is required to build adequate support frameworks for such tasks, which are currently minimal. When it comes to building or deploying an ontology, the target community should be included in the development and evolution of the vocabularies; however, engineers turned ontologists often don't have the necessary background or skills. That said, it is critical to maintain a strong relationship with the domain experts to ensure the fidelity of the model.

The transition from implicit domain knowledge to explicit encoding requires community consensus, which in turn requires an organizational commitment to create the infrastructure needed to manage that consensus. At the same time, consensus is not always possible, as different subgroups working on different parts of the same system may have differing views. In these cases, explicit vocabularies (classifiers) are a must in a distributed system.

In those applications where the ontology will affect end users, there is broad consensus that the nitty-gritty details of the ontology should be hidden. At most, end users should be exposed to something like the Simple Knowledge Organization System (SKOS), while the more complicated constructs are deployed only on the back end. For example, in one successful project, ontologies were used as configuration templates, which user interface specialists then used to tailor views for their end users.
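Hiding the ontology's details behind a SKOS-like view can be sketched as a simple projection. The sketch assumes a hypothetical internal record per class carrying a label, a parent, and arbitrary back-end axioms, and keeps only what maps onto the SKOS notions of preferred label (`skos:prefLabel`) and broader concept (`skos:broader`). Those two SKOS property names are real; the record format, the `ex:` identifiers and the `skos_view` function are illustrative assumptions.

```python
# Hypothetical back-end ontology records: richer than end users need.
BACKEND = {
    "ex:CentrifugalPump": {
        "label": "Centrifugal pump",
        "parent": "ex:Pump",
        "axioms": ["hasPart some Impeller", "maxPressure <= 16 bar"],
    },
    "ex:Pump": {"label": "Pump", "parent": "ex:Equipment", "axioms": []},
    "ex:Equipment": {"label": "Equipment", "parent": None, "axioms": []},
}


def skos_view(records):
    """Project back-end records onto a minimal SKOS-style concept scheme.

    Only the preferred label and the broader concept survive; the logical
    axioms stay on the back end, where reasoners and applications use them.
    """
    view = {}
    for concept, record in records.items():
        entry = {"skos:prefLabel": record["label"]}
        if record["parent"] is not None:  # top concepts have no broader concept
            entry["skos:broader"] = record["parent"]
        view[concept] = entry
    return view


for concept, entry in skos_view(BACKEND).items():
    print(concept, entry)
```

Because the view is derived rather than maintained separately, back-end axioms can change without any change to what end users see, as long as labels and the broader/narrower structure stay the same.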

What makes a Quality Ontology?

It has also been observed that the proliferation of ontologies has not been accompanied by adequate tools or methodologies to gauge their quality. Quality dimensions/criteria/attributes and measures vary with the specific project at hand, and we currently have neither a clear understanding of how that variation works nor more than virtually no documentation of it. Experienced ontologists develop a sense of this, but it is implicit and not made accessible to others.

Are they fit for purpose? Any ontology project should not only pay attention to quality but also develop a quality policy: how would the organization measure the success of the ontology project? While no standard methodology currently exists, there are some efforts in the literature; a more systematic effort is needed. Concurrently, it is important to spread the understanding that ontologies need to be viewed as technical artifacts that need requirements and quality assurance.

Interdisciplinary Exchange

A primary step in any interdisciplinary engagement is to identify the relevant overlap between the two fields. Effectively determining which problems and parts of the discipline would benefit from ontological analysis involves understanding what information is not available to colleagues in other disciplines. Developing appropriate mechanisms and systems to aid in this task would be greatly valued.

For example, in the course of this summit, a recommended reading list was compiled by practitioners from different communities to help "outsiders" home in more quickly on the relevant literature. Participants from diverse communities sought to develop a baseline commonality in discourse to help focus discussions. Throughout such a process, it is essential to document and disseminate techniques successfully employed by ontologists, perhaps as part of repositories, pattern libraries or use case libraries. Using well-defined, narrow use cases and concrete examples helped tremendously in bridging the differing vocabularies of multiple communities. Similarly, such use cases can be effective in demonstrating the benefits of semantic approaches.

For Reference Only (Remove)

In general,

Expose users to SKOS semantics; use more complicated constructs only on back end if necessary.
Look for the 80-20 rule of semantic development
Use well defined and narrow use cases to demonstrate benefits of semantic approaches
Having explicit vocabularies (classifiers) is a must in a distributed system
The community should be included in the development and evolution of vocabularies
It is critical to capture and evolve domain knowledge in a form that the community is comfortable with
Transition from implicit domain knowledge to explicit encoding requires community consensus - and an organization to manage the consensus

Other Observations / Lessons learned

UML to OWL is a common requirement for legacy systems
Starting from scratch is rare
Ontology patterns are very helpful, and encourage model reuse
Semantic techniques work best when not compromised by implementation tradeoffs
Semantic methods are faster to implement and easier to maintain
Semantic approaches are particularly suited to systems with many complex constraints, rules and laws, and with frequent changes
Incremental implementation is possible through federation of datastores
Ontologies are not always applied to enable reasoners - sometimes just as a more rigorous data modeling approach
Engineers turned ontologists often don't have the necessary background/skills
Existing infrastructure supports traditional software development far better than large-scale ontology development
There are many ontologies of dubious quality
Service-oriented architectures allow separation of code and ontology updates
Reasoner and query engine performance is highly dependent upon the exact formulation of rules and queries
No single technology/tool currently provides the best solution across all large system use cases

Conclusion

Engineering, in particular systems engineering, can benefit in many ways from the use of ontology. To integrate ontology and ontological analysis more completely into the engineering community and its processes, the skills most needed are a combined understanding of a scientific or engineering discipline and knowledge of ontological analysis and ontology-based technologies. To realize this combination, existing paradigms and tools will need to be exploited as the basis for a transition that creates the infrastructure needed for quality ontology development and more general use.

In particular, the efforts by the Object Management Group (OMG) to provide a formal semantic underpinning for the Unified Modeling Language (UML) and its derivatives (e.g., SysML) are a step toward this goal. Moreover, organizations such as the International Council on Systems Engineering (INCOSE) are already helping to foster the uptake and growth of ontological analysis and ontology in their community.

Pragmatically, "big systems" and "big data", especially from a cost perspective, have little technological recourse but to exploit the benefits to be gained from the use of ontology and ontological analysis.

shortened url for the OntologySummit2012 Communique:
- http://bit.ly/OntologySummit2012_Communique
- or, http://goo.gl/rxBnp (QR code: http://goo.gl/rxBnp.qr)

Ontolog Forum
