Ontology Summit 2012: Session-10, Thursday 2012-03-15     (1)

Summit Theme: OntologySummit2012: "Ontology for Big Systems"     (1A)

Track 3 Title: Challenge: Ontology and Big Data     (1B)

Session Topic: Big Data Developing Challenges     (1C)

Session Chairs: Ms. MaryBrady (NIST) and Mr. ErnieLucier (NCO/NITRD) - intro-slides     (1D)

Panelists:     (1E)

  • Professor TimFinin (UMBC) - "Making the Semantic Web Easier to Use" - slides     (1F)
  • Dr. KyoungsookKim (NICT, JP) - "Use cases of cyber-physical data cloud computing" - slides     (1G)
  • Dr. MikeFolk (HDF Group) - "The HDF5 technology suite" - slides     (1H)
  • Dr. MarioPaolucci (LABSS/ISTC/CNR, Rome, Italy) - "FuturICT: Global Participatory Computing for Our Complex World" - slides     (1I)
  • Dr. UrsulaKattner (NIST) - "Data Needs for the Materials Genome Initiative (MGI) at NIST" - slides     (1J)
  • Dr. EdinMuharemagic (HPCC Systems; LexisNexis) - "HPCC Systems Machine Learning" - slides     (1K)

Attendees     (1M)


Session Topic: Meeting Big Data Challenges through Ontology - III     (1N1)

This is our 7th Ontology Summit, a joint initiative by NIST, Ontolog, NCOR, NCBO, IAOA & NCO_NITRD with the support of our co-sponsors. The theme adopted for this Ontology Summit is "Ontology for Big Systems." The event today is our 10th virtual session.     (1N2)

The principal goal of the summit is to bring together and foster collaboration among the ontology community, the systems community, and stakeholders of some "big systems." Together, the summit participants will exchange ideas on how ontological analysis and ontology engineering might make a difference when applied in these "big systems." We will aim to produce a series of recommendations describing how ontologies can create an impact, as well as illustrations of where these techniques have been, or could be, applied in domains such as bioinformatics, electronic health records, intelligence, the smart electrical grid, manufacturing and supply chains, earth and environmental sciences, e-science, cyber-physical systems and e-government. As is traditional with the Ontology Summit series, the results will be captured in the form of a communiqué, with expanded supporting material provided on the web.     (1N3)

The goal of "Meeting Big Data Challenges through Ontology" Track 3 is to identify issues that can be addressed using an ontology challenge. Challenges can take many forms and target many issues.     (1N4)

Potential issues to be addressed by challenges:     (1N5)

  • Enhance collaboration and accelerate agencies' adoption     (1N6)
  • Accelerate the adoption of ontological methods, maximize public awareness and the impact of research.     (1N7)
  • Increase the number of agencies using ontologies, i.e., earlier adoption     (1N8)
  • Where should our focus be to accelerate agencies' adoption of ontology capabilities?     (1N9)
  • How many scientists, physicists, engineers, programmers, big data administrators, etc. have experience with ontologies?     (1N10)
  • Is the growth of ontological implementations and technologies with Big Data constrained by the shortage of qualified personnel?     (1N11)
  • Inform, educate, and include the public in scientific research and discovery. Public involvement could be a critical component of our success     (1N12)
  • A mismatch between those with data and those with the skills to analyze the data     (1N13)
  • Are programmers able to optimize the use of unstructured or semi-structured data sets for scientists and engineers?     (1N14)
  • What are the talent and skill set issues impacting the use of ontologies?     (1N15)
  • The skills important to the growth of ontological technologies with Big Data include a combined understanding of a scientific or engineering discipline and knowledge of ontology-based technologies.     (1N16)
  • Programmers are not able to optimize the use of unstructured data for scientists and engineers     (1N17)
  • Scientists and engineers without ontology training may use brute-force programming; this can be inefficient, and scientists and engineers without such training may not be aware of the options and capabilities that ontology-based technologies offer     (1N18)
  • Strategic significance to the economy, e.g. enabling competitive products.     (1N19)
  • How long does it take to become productive in the ontology environment?     (1N20)
  • Can universities expand coursework in ontologies and integrate ontological methods into the requirements for science degrees? At the undergraduate level? At the graduate level?     (1N21)
  • Identify individuals who have both domain experience and an understanding of what it means to apply ontology technologies.     (1N22)
  • Increase the number of individuals capable of applying ontology technology     (1N23)
  • Ontology-based technology evolution for big data may be slow or non-existent     (1N24)
  • Advances in the use of ontology technology can be difficult or unattainable without an adequate number of properly trained personnel, including scientists, engineers, programmers, system administrators, technologists, and all others that make up the big data systems.     (1N25)
  • Expanding the markets for ontologies could make the field a more attractive career path. What is the growth rate of the ontology market? Further expansion could spark investment and make ontologies an even more vibrant, attractive market for young people to enter.     (1N26)
  • A Challenge may seed and transform the current status quo     (1N27)
  • Software dilemma analogy with ontology     (1N28)
  • Ontologies and software are perceived as a commodity, resulting in little or no investment in research; projects treat ontologies as one of their tasks     (1N29)

Potential challenge directions     (1N30)

  • 1. Increase the awareness of ontology technology among programmers/database managers     (1N31)
  • 2. Accelerate agencies' adoption of ontology capabilities     (1N32)
  • 3. Enable scientists and engineers to make maximum use of big data     (1N33)
  • 4. Enable scientists and engineers to understand the potential of ontology-based systems integration     (1N34)
  • 5. Enable ontologists to understand scientists' and engineers' needs     (1N35)
  • 6. Ameliorate any mismatch between those with data and those with the skills to analyze it     (1N36)
  • 7. People in the domains of science, engineering, software, computer science, etc. can benefit from a combined knowledge of their domain and application of ontology-based technologies. A combined understanding of these domains and ontology-based technologies may encourage the growth of technology.     (1N37)
  • 8. Improve critical areas of current practice     (1N38)

The first session of Track 3 (ConferenceCall_2012_02_09) was designed to help everyone understand the relationships between big data challenges and ontologies. At today's second session (ConferenceCall_2012_03_15), we hope to talk about solutions and benefits. At the OntologySummit2012_Symposium, we would like to present various approaches to implementing ontologies using challenges, along with a sample from the NITRD Big Data working group.     (1N39)

More details about this Summit at: OntologySummit2012 (home page for the summit)     (1N40)

Agenda     (1O)

Ontology Summit 2012 - Panel Session-10     (1O1)

  • Session Format: this is a virtual session conducted over an augmented conference call     (1O2)

    • Tim Finin - "Making the Semantic Web Easier to Use"
      • Abstract: The Semantic Web can help scientists and engineers in their big data activities by providing a Web-based data representation that ties data to semantic models, facilitates data sharing and linking, supports provenance annotations, and can exploit a large and growing collection of background knowledge on the Web. While the concepts and technologies are mature and supported by sound standards, their use within most application communities remains relatively low. This presentation will touch on some current research aimed at reducing the barriers to wider adoption and use. It will describe techniques enabling end users to generate Semantic Web data using familiar software tools, and new approaches to querying large collections of Semantic Web data.

    • Kyoungsook Kim - "Use cases of cyber-physical data cloud computing (Tsunami application)"     (1O6A)
      • Abstract: I will introduce a new application system based on a cyber-physical data cloud for fusion and analysis of physical and social sensing data to facilitate real-world aware information services.     (1O6A1)
    • Mike Folk - "The HDF5 technology suite"     (1O6B)
      • Abstract: HDF5 is a suite of technologies used widely in science and engineering for working with high volume, complex data. As such, HDF5 needs to support metadata as well as data. We will talk about how HDF5 can do this, and give an example.     (1O6B1)
    • Mario Paolucci - "FuturICT: Global Participatory Computing for Our Complex World"     (1O6C)
    • Ursula Kattner - "Data Needs for the Materials Genome Initiative (MGI) at NIST"     (1O6D)
      • Abstract: The Materials Genome Initiative (MGI) seeks to enable U.S. companies to discover, develop, manufacture, and deploy advanced materials faster and more cost efficiently. A highly capable Materials Innovation Infrastructure (MII) will enable integrated computational materials engineering by providing a powerful, widely accessible toolkit with computational methods and management and dissemination of digital data.     (1O6D1)
    • Edin Muharemagic - "HPCC Systems Machine Learning"     (1O6E)
  • 3. Q & A and open discussion [All: ~30 min.] -- please refer to process above     (1O7)
  • 4. Wrap-up / Announcements - (co-chairs)     (1O8)
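Tim Finin's abstract mentions letting end users generate Semantic Web data with familiar software tools, and the chat later refers to research on inferring the semantics of tables. As a rough illustration of the starting point only, the Python sketch below turns CSV rows into N-Triples: one subject IRI per row, one predicate per column. The `http://example.org/` namespace and the `row/` and `prop/` IRI patterns are hypothetical; nothing here infers what a column means or links it to an existing vocabulary, which is the hard part such research addresses.

```python
import csv
import io
from urllib.parse import quote

# Hypothetical base namespace, invented for this sketch.
BASE = "http://example.org/"


def literal(value: str) -> str:
    """Escape a string as an N-Triples plain literal."""
    escaped = (value.replace("\\", "\\\\")
                    .replace('"', '\\"')
                    .replace("\n", "\\n")
                    .replace("\r", "\\r"))
    return f'"{escaped}"'


def csv_to_ntriples(text: str, key_column: str) -> list[str]:
    """Naive mapping: one subject per row, one predicate per non-key column."""
    reader = csv.DictReader(io.StringIO(text))
    triples = []
    for row in reader:
        subject = f"<{BASE}row/{quote(row[key_column])}>"
        for column, value in row.items():
            if column == key_column or value == "":
                continue  # the key names the subject; empty cells produce no triple
            predicate = f"<{BASE}prop/{quote(column)}>"
            triples.append(f"{subject} {predicate} {literal(value)} .")
    return triples


table = "id,name,state\nUMBC,University of Maryland Baltimore County,MD\n"
for line in csv_to_ntriples(table, "id"):
    print(line)
```

Deriving predicates from column headers is only a baseline; mapping headers and cell values to terms in shared ontologies is what would make the output linkable with other data.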

Proceedings     (1P)

IM Chat Transcript captured during the session    (1P2)

see raw transcript here.     (1P2A)

(for better clarity, the version below is a re-organized and lightly edited chat-transcript.)     (1P2B)

Participants are welcome to make light edits to their own contributions as they see fit.     (1P2C)

-- begin in-session chat-transcript --     (1P2D)


DeborahMacPherson: Hi Everyone! Interesting discussion this all has been. I have several entire threads set aside to read again fully.     (1P2AAP)

Steve Ray: @Tim Finin: Very cool. Is there a way to point your GOR tool to other SPARQL endpoints?     (1P2AAR)

Leo Obrst: @Tim: concerning Varish Mulwad's research (inferring semantics of tables), was Formal Concept Analysis considered?     (1P2AAS)

Tim Finin: @leo --- No, not to my knowledge. Good idea. We'll look into it and think about its value for this problem.     (1P2AAU)

Joseph Tennis: This has been GREAT! Sorry I was late. And sadly, I'll have to leave early. Looking forward to participating more in the near future!     (1P2AAAB)

Nicola Guarino: Folks, unfortunately I have a car emergency and have to leave. Perhaps I'll manage to connect again in 40 minutes or so, not sure. Sorry to miss Mario's presentation.     (1P2AAAF)

Ernie Lucier: Is permission to use social networks required?     (1P2AAAI)

Harold Boley: @Dr. Kim, in Real-world Awareness Computing, could the Observation / Perception / (Communication) / Action sequence be formalized using Event / Condition / Action rules?     (1P2AAAJ)

Kyoungsook Kim: Currently, I am trying to use a rule-based language.     (1P2AAAM)

Harold Boley: Maybe the premises of datalog rules need to be partitioned into Event and Condition     (1P2AAAO)

Harold Boley: Events are 'sensed' as external observations. Conditions 'test' the internal knowledge     (1P2AAAQ)
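Harold Boley's suggestion, splitting rule premises into externally sensed events and conditions that test internal knowledge, can be sketched as a small Event/Condition/Action structure in Python. The rule, the event name, the magnitude threshold, and the knowledge-base layout below are invented for illustration (loosely themed on the tsunami use case) and are not taken from Dr. Kim's system.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ECARule:
    event: str                               # type of external observation that triggers the rule
    condition: Callable[[dict, dict], bool]  # tests the event payload against internal knowledge
    action: Callable[[dict, dict], str]      # what to do when the event occurs and the condition holds


def react(rules: list[ECARule], event_type: str, payload: dict, knowledge: dict) -> list[str]:
    """Fire every rule whose event type matches and whose condition holds."""
    fired = []
    for rule in rules:
        if rule.event == event_type and rule.condition(payload, knowledge):
            fired.append(rule.action(payload, knowledge))
    return fired


# Hypothetical rule: names and threshold are illustrative only.
warn_coast = ECARule(
    event="earthquake_observed",
    condition=lambda e, kb: e["magnitude"] >= 7.0 and e["region"] in kb["coastal_regions"],
    action=lambda e, kb: f"issue tsunami alert for {e['region']}",
)

kb = {"coastal_regions": {"Tohoku"}}
print(react([warn_coast], "earthquake_observed", {"magnitude": 9.0, "region": "Tohoku"}, kb))
```

Keeping the event match separate from the condition means incoming observations can be dispatched by type first, and only the matching rules consult the knowledge base.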

Bob Schloss: Just an observation -- many people are working on highly scalable triplestores, some with interesting partitioning and distribution and federation functions. We might sometime convene a panel with all of these people showing what they did. It is not just graph stores that are advancing -- the entire NoSQL movement is starting to develop various interesting strategies, in which some of the classic ACID properties are slightly relaxed.     (1P2AAAT)
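Bob Schloss's point about partitioning in scalable triplestores can be illustrated with the simplest scheme: hash each triple's subject to choose a shard. This is a generic sketch, not the design of any store mentioned in the session; the shard count, the SHA-1 hash, and the `:alice`-style terms are arbitrary choices.

```python
import hashlib
from collections import defaultdict

Triple = tuple[str, str, str]


def shard_for(subject: str, num_shards: int) -> int:
    """Stable shard index from a hash of the subject.

    hashlib is used instead of hash(), whose string hashing is salted per process.
    """
    digest = hashlib.sha1(subject.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def partition(triples: list[Triple], num_shards: int) -> dict[int, list[Triple]]:
    """Place each triple on the shard chosen by its subject."""
    shards: dict[int, list[Triple]] = defaultdict(list)
    for triple in triples:
        shards[shard_for(triple[0], num_shards)].append(triple)
    return dict(shards)


def describe(subject: str, shards: dict[int, list[Triple]], num_shards: int) -> list[Triple]:
    """All triples about one subject live on one shard, so only that shard is scanned."""
    local = shards.get(shard_for(subject, num_shards), [])
    return [t for t in local if t[0] == subject]


triples = [
    (":alice", ":knows", ":bob"),
    (":alice", ":name", '"Alice"'),
    (":bob", ":name", '"Bob"'),
]
shards = partition(triples, num_shards=4)
print(describe(":alice", shards, num_shards=4))
```

Partitioning by subject keeps all facts about one resource together, so lookups by subject touch a single shard; queries that follow links from an object to another subject may still have to cross shards, which is where distribution and federation strategies come in.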

Ali Hashemi: The pdf version of this presentation is not rendering appropriately for me. Is it just     (1P2AAAY)

Ali Hashemi: @Kyoungsook - I downloaded the file and opened it in Reader - it works ok there, I think my browser's pdf reader doesn't show it correctly.     (1P2AAAAB)

Amanda Vizedom: @MikeFolk - HDF5 is new to me, so I find I have some "what *is* HDF5?" questions below the level of your talk. I see that at, there are links for "What is HDF5?" "Questions (FAQ)," and "HDF5 Tutorial" links. Would you recommend these as the best source for an overview of the fundamentals of HDF5?     (1P2AAAAE)

DavidOrloff: Sorry need to run to a meeting - if anyone is looking for data to work with that is already tagged with ontologies (9 in use) look at and contact me DavidOrloff at dorloff[at] - thanks for letting me sit in - I will be back for future calls.     (1P2AAAAI)

Peter P. Yim: @MarioPaolucci - how does the project reconcile its desire to be "open" with its dependencies (ref. your slides) on commercial products (like Skype, Facebook, etc. ... which are usually non-open)?     (1P2AAAAM)

Mario Paolucci: @PeterYim: There are several strategies possible. We plan to build alternative data sources and provide access to them. We will not depend on commercial platforms (the names in the slide were there more as examples of changes brought about by technology), but we hope to create our own data sharing platform, where users themselves authorize access to their data. In a sense, we hope to convince people to reclaim access to their data.     (1P2AAAAP)

Mario Paolucci: @PeterYim: We think users would be happier to share data with a privacy-preserving, non-profit project, but it's a risky bet, I agree.     (1P2AAAAU)

Peter P. Yim: @Mario - thank you ... but then, the challenge comes in the form of how (and if at all) one can build out a user base of hundreds of millions of people (like the success some of these commercial social network platforms have achieved)     (1P2AAAAW)

Harold Boley: @MarioPaolucci, what would be the initial steps for moving from a Strongly Coupled System to a Weakly Coupled System?     (1P2AAAAZ)

Mario Paolucci: @Harold: I can only provide stylized examples; it depends on the specific problem. But of course you can hardly intervene in the self-organization, so what can be done is changing the terrain where things happen. The examples that come to my mind are the roundabout instead of the intersection; or, in a sand pile model, breaking up the table so that cascades remain limited.     (1P2AAAAAB)

Harold Boley: @Mario, You could 'overlay' the roundabout -- with 4 quarter-circle 'bypasses' -- over the intersection, at least to help those not in the center of the congestion.     (1P2AAAAAF)

Mario Paolucci: @Harold: Exactly. Also, adding new dimensions helps - either by digging a tunnel, or - better - providing cars with vertical mobility. Think of this as a metaphor - new dimensions are easier to create in virtual worlds, of course.     (1P2AAAAAH)

Mario Paolucci: oops. I forgot to put the links.     (1P2AAAAAK)

Ursula Kattner: The link for the Materials Genome Initiative is     (1P2AAAAAO)

DeborahMacPherson: Interested in discussing the differences between calculated and measured values referred to by the speaker just now     (1P2AAAAAR)

Ursula Kattner: @DeborahMacPherson: Measured values have a confidence resulting from the error of the measurement; calculated values have no such error. However, a confidence for these data is needed to properly judge them in the context of data that describe a material.     (1P2AAAAAT)
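Ursula Kattner's distinction can be captured in the data model itself: every property value carries its provenance and an uncertainty, and a calculated value cannot be recorded without one being assigned. The sketch below uses hypothetical names (`PropertyValue`, `measured`, `calculated`, `consistent`) and illustrative numbers; the agreement test compares the difference with k combined standard uncertainties, a common metrology convention, with k = 2 chosen arbitrarily here.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class PropertyValue:
    value: float
    unit: str
    provenance: str     # "measured" or "calculated"
    uncertainty: float  # standard uncertainty, in the same unit as value

    def __post_init__(self):
        if self.provenance not in ("measured", "calculated"):
            raise ValueError(f"unknown provenance: {self.provenance!r}")
        if self.uncertainty < 0:
            raise ValueError("uncertainty must be non-negative")


def measured(value: float, unit: str, measurement_error: float) -> PropertyValue:
    # The confidence comes directly from the error of the measurement.
    return PropertyValue(value, unit, "measured", measurement_error)


def calculated(value: float, unit: str, assigned_uncertainty: float) -> PropertyValue:
    # A calculation has no measurement error, so an uncertainty must be assigned
    # explicitly; how to estimate it is left open in this sketch.
    return PropertyValue(value, unit, "calculated", assigned_uncertainty)


def consistent(a: PropertyValue, b: PropertyValue, k: float = 2.0) -> bool:
    """True if the two values agree within k combined standard uncertainties."""
    if a.unit != b.unit:
        raise ValueError("units differ")
    return abs(a.value - b.value) <= k * math.hypot(a.uncertainty, b.uncertainty)


exp = measured(1728.0, "K", 5.0)      # illustrative numbers only
calc = calculated(1740.0, "K", 10.0)
print(consistent(exp, calc))
```

Making the uncertainty a required field, rather than an optional annotation, forces the question she raises to be answered at the point where a calculated value enters the data set.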

Mary Brady: Please, if you have questions, post them here...we'll be sure to engage the speakers in answering the questions over e-mail     (1P2AAAAAAA)

Matthew Hettinger: @Edin, What open source products are you using?     (1P2AAAAAAC)

Mary Brady: In particular, a number of technologies and use cases for BIG DATA have been presented this afternoon. Any thoughts on potential uses for Ontology?     (1P2AAAAAAE)

Mario Paolucci: @Mary: We have ontology components in all parts of the FuturICT architecture, of course. Nicola Guarino knows more about them. But we have a critical need for ontologies that allow the different components to communicate - think of aligning models and simulation results along disciplines (sociology, complex science) and along levels of detail (individual agents, organizations, groups, etc.) There should be a world of modeling component in which ontology is very     (1P2AAAAAAG)

Mary Brady: Any thoughts on the integration of ontology components with output from machine learning     (1P2AAAAAAAE)

Frank Olken: @MaryBrady I recall that some folks have suggested using ontologies to suggest concepts to     (1P2AAAAAAAG)

Mary Brady: @FrankOlken Yes, here at NIST we have used combination techniques between ontologies and machine learning. Simple queries can sometimes take days to complete.     (1P2AAAAAAAI)

Frank Olken: Nearly every machine learning algorithm is available under R.     (1P2AAAAAAAK)

Frank Olken: There are versions of R that run on clouds with Hadoop.     (1P2AAAAAAAL)

Frank Olken: There is recent work at IBM and Univ. of Wisconsin on parallel implementation of stochastic gradient descent for machine learning.     (1P2AAAAAAAM)
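Frank Olken's pointer to parallel stochastic gradient descent is not elaborated in the chat, so the sketch below does not reproduce the IBM or Wisconsin work. It shows a much simpler scheme, one-shot parameter averaging: the data are split into shards, each worker runs ordinary SGD on its shard, and the resulting models are averaged. The linear model, learning rate, epoch count, and synthetic data are all assumptions for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def sgd_linear(data, lr=0.1, epochs=500):
    """Ordinary SGD for y ~ w*x + b with squared-error loss, on one shard."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, y in data:
            err = (w * x + b) - y  # prediction error for this example
            w -= lr * err * x      # gradient of 0.5*err**2 with respect to w
            b -= lr * err          # gradient of 0.5*err**2 with respect to b
    return w, b


def parallel_sgd(data, num_workers=4, **sgd_kwargs):
    """One-shot parameter averaging: train one model per shard, then average."""
    shards = [data[i::num_workers] for i in range(num_workers)]
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        models = list(pool.map(lambda shard: sgd_linear(shard, **sgd_kwargs), shards))
    w = sum(m[0] for m in models) / len(models)
    b = sum(m[1] for m in models) / len(models)
    return w, b


# Synthetic, noise-free data from y = 2x + 1 (an assumption for the example).
data = [(i / 100, 2.0 * (i / 100) + 1.0) for i in range(100)]
w, b = parallel_sgd(data)
print(f"w = {w:.3f}, b = {b:.3f}")
```

Threads only show the structure; under CPython's global interpreter lock they give no real speedup for this pure-Python loop, and a real deployment would use processes or separate machines. Averaging recovers the generating parameters here because the data are noise-free and every shard converges to the same solution; with noisy data, one-shot averaging is an approximation.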

Doug Foxvog: @Mario You discuss using "Crowd sourcing" and "citizen science" for a platform for economic and political participation. People on different sides of various issues would have competing "science". How would you deal with this?     (1P2AAAAAAAP)

Nicola Guarino: @Doug: here is exactly one of the roles of ontologies in this project: exposing disagreements about different opinions...     (1P2AAAAAAAS)

Nicola Guarino: @Doug: the point is *understanding* the different models, not necessarily forcing them to align with each other     (1P2AAAAAAAU)

Doug Foxvog: @Nicola So long as the different theories/models are kept separate, I strongly agree. The problem I saw was with an "open" system which would allow people to modify theories that they didn't create.     (1P2AAAAAAAW)

Nicola Guarino: @Doug: You are right. Definitely people shouldn't be allowed to modify things at their ease... especially if the underlying assumptions are not shared...     (1P2AAAAAAAZ)

Peter P. Yim: @KyoungsookKim - how effective did the systems cited in your use cases turn out to be (in real life) ... were there metrics available?     (1P2AAAAAAAAC)

Amanda Vizedom: Dr. Kim, someone responded to my G+ posting about your presentation by mentioning evacuation response research such as some at Univ. of Minnesota (     (1P2AAAAAAAAE)

in itself valuable, would relate to your use case as a contribution to *one* of the areas of computation involved in the response. It seems to me that what makes your use case such a Grand Challenge type case is that it brings together a variety of such areas, including route-planning and information fusion across very heterogeneous sensor and information types and disaster surveillance over networks. Do you agree?     (1P2AAAAAAAAI)

Amanda Vizedom: @KyoungsookKim - I should add that I think it's really a very good Grand Challenge, for a few reasons, including that it is so well grounded in a real need *and* real, existing data environments, and success has such clear benefits.     (1P2AAAAAAAAN)

Ernie Lucier: @Dr. Kim, Is permission to use social networks required or a problem?     (1P2AAAAAAAAQ)

Kyoungsook Kim: When using social networks, we don't try to use personal information itself. We aggregate a group of messages and extract trend information or changing information.     (1P2AAAAAAAAR)
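Kyoungsook Kim's answer, aggregating groups of messages to extract trend information rather than using personal information, can be sketched as counting keyword mentions per time window and flagging sudden increases. The message layout (dicts with `timestamp` in seconds and `text`), the 10-minute window, and the doubling threshold are hypothetical choices, not parameters from the talk; the `user` field in the example data is present but never read.

```python
import re
from collections import Counter, defaultdict


def keyword_trends(messages, keywords, window_seconds=600):
    """Count keyword mentions per time window; only timestamp and text are read."""
    counts = defaultdict(Counter)
    for msg in messages:
        window = msg["timestamp"] // window_seconds
        words = set(re.findall(r"\w+", msg["text"].lower()))
        for kw in keywords:
            if kw in words:
                counts[window][kw] += 1
    return dict(counts)


def rising(counts, keyword, factor=2.0):
    """Windows where mentions at least `factor`-fold over the preceding window."""
    if not counts:
        return []
    flagged = []
    for w in range(min(counts) + 1, max(counts) + 1):
        before = counts.get(w - 1, Counter())[keyword]
        after = counts.get(w, Counter())[keyword]
        # A silent previous window counts as one mention, so a lone new mention is not flagged.
        if after >= factor * max(before, 1):
            flagged.append(w)
    return flagged


messages = [
    {"user": "u1", "timestamp": 0, "text": "quiet morning"},
    {"user": "u2", "timestamp": 30, "text": "Tsunami drill today"},
    {"user": "u3", "timestamp": 700, "text": "tsunami warning!"},
    {"user": "u4", "timestamp": 710, "text": "Tsunami coming, evacuate"},
    {"user": "u5", "timestamp": 720, "text": "evacuate now, tsunami"},
]
counts = keyword_trends(messages, ["tsunami", "evacuate"])
print(rising(counts, "tsunami"), rising(counts, "evacuate"))
```

Only per-window counts leave `keyword_trends`, so downstream analysis works on aggregates rather than on individual users' messages.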

  • Further Question & Remarks - please post them to the [ ontology-summit ] listserv     (1P2AAAAAAAAX)
    • all subscribers to the previous summit discussion, and all who responded to today's call will automatically be subscribed to the [ ontology-summit ] listserv     (1P2AAAAAAAAX1)
    • if you are already subscribed, post to <ontology-summit [at]>     (1P2AAAAAAAAX2)
    • (if you are not yet subscribed) you may subscribe yourself to the [ ontology-summit ] listserv, by sending a blank email to <ontology-summit-join [at]> from your subscribing email address, and then follow the instructions you receive back from the mailing list system.     (1P2AAAAAAAAX3)

Audio Recording of this Session     (1Q)

Additional Resources     (1R)

For the record ...     (1R5)

