Ontolog Forum
OpenOntologyRepository: "OOR for Big Data" Workshop-I - Tue 2012_08_14
Topic: "OOR for Big Data" - Brainstorm Session
Session Chair: MikeDean (OOR; Raytheon-BBN)
Archives
- Abstract
- Agenda
- Prepared slides
- Audio recording of the session ... [ 1:27:43 ; mp3 ; 10.04 MB ]
- Transcript of the online chat during the session
- Additional Resources
Conference Call Details
- Date: Tuesday, 14-Aug-2012
- Start Time: 8:30am PDT / 11:30am EDT / 5:30pm CEST / 15:30 UTC
- ref: World Clock
- Expected Call Duration: ~1.5 hours
- Dial-in:
- Phone (US): +1 (206) 402-0100 ... (long distance cost may apply)
- ... [ backup nbr: (415) 671-4335 ]
- Skype: joinconference ... (generally free-of-charge, when connecting from your computer)
- when prompted enter PIN: 141184#
- Shared-screen support (VNC session), if applicable, will be started 5 minutes before the call at: http://vnc2.cim3.net:5800/
- view-only password: "ontolog"
- if you plan to be logging into this shared-screen option (which the speaker may be navigating), and you are not familiar with the process, please try to call in 5 minutes before the start of the session so that we can work out the connection logistics. Help on this will generally not be available once the presentation starts.
- people behind corporate firewalls may have difficulty accessing this. If that is the case, please point your browser directly to the slides (where applicable) and run them locally. The speaker(s) will prompt you to advance the slides during the talk.
- In-session chat-room url: http://webconf.soaphub.org/conf/room/oor_20120814
- instructions: once you have access to the page, click on the "settings" button, and identify yourself (by modifying the Name field from "anonymous" to your real name, like "JaneDoe").
- You can indicate that you want to ask a question verbally by clicking on the "hand" button, and wait for the moderator to call on you; or, type and send your question into the chat window at the bottom of the screen.
- thanks to the soaphub.org folks, one can now use a jabber/xmpp client (e.g. gtalk) to join this chatroom. Just add the room as a buddy - (in our case here) oor_20120814@soaphub.org ... Handy for mobile devices!
- Discussions and Q & A:
- Nominally, when a presentation is in progress, the moderator will mute everyone, except for the speaker.
- To un-mute, press "*7" ... To mute, press "*6" (please mute your phone, especially if you are in a noisy surrounding, or if you are introducing noise, echoes, etc. into the conference line.)
- we will usually save all questions and discussions until after all presentations are through. You are encouraged to jot down questions in the chat area in the meantime (that way, they get documented; and you might even get some answers in the interim, through the chat.)
- During the Q&A / discussion segment (when everyone is muted), if you want to speak or have questions or remarks to make, please raise your hand (virtually) by clicking on the "hand button" (lower right) on the chat session page. You may speak when acknowledged by the session moderator (again, press "*7" on your phone to un-mute). Please test your voice and introduce yourself before proceeding with your remarks. (Please remember to click on the "hand button" again (to lower your hand) and press "*6" on your phone to mute yourself after you are done speaking.)
- RSVP to peter.yim@cim3.com appreciated, ... or simply add yourself to the "Expected Attendee" list below (if you are a member of the team.)
- This session, like all other Ontolog events, is open to the public. Information relating to this session is shared on this wiki page: http://ontolog.cim3.net/cgi-bin/wiki.pl?OOR/ConferenceCall_2012_08_14
- Please note that this session may be recorded, and if so, the audio archive is expected to be made available as open content, along with the proceedings of the call to our community membership and the public at-large under our prevailing open IPR policy.
Attendees
- Attended:
- Expecting:
-
- ... if you are coming to the meeting, please add your name above (plus your affiliation, if you aren't already a member of the community), or e-mail <peter.yim@cim3.com> so that we can reserve enough resources to support everyone's participation. ...
- Regrets:
- ...
Agenda Ideas
please insert any additional items below (along with your name for follow-up purposes)
- we need a few minutes before going into the subject discussion to tie down a few event dates
- ...
Abstract
Topic: "OOR for Big Data" - Brainstorm Session
This is a follow-up session to the presentation our team made on the case of "Leveraging OOR in Big Open Data" at the virtual panel session on ConferenceCall_2012_05_17.
We will use this workshop to strategize and brainstorm on how we should approach this very important initiative going forward.
The plan is to use this session for a fairly open discussion about the use of OOR for Big Data. There will not be formal presentations (we are still a bit early for that). The chair will try to seed some discussion topics.
Agenda
Topic: "OOR for Big Data" - Brainstorm Session
- Session Format: this is a virtual session conducted over an augmented conference call
- 1. Opening - chair ... [ slides ]
- 1.1 participants self-introduction (if the number of participants is fewer than 20) [max 0.5 min. each]
- 2. Framing the issues - chair
- 3. Q & A and open discussion [ All ] -- please refer to process above
- 4. Any other business (possibly move this up, and get it done with first)
- Tying down event dates for ref.
- OOR Architecture & API Workshop-XIII - Co-chairs: Ken Baclawski & Todd Schneider
- OOR Metadata Workshop-VIII - Chair: Michael Grüninger
- OOR Funding-I - Chair: Ken Baclawski - "rethink strategy" - 2012_08_21 or 28
- OOR Code Development-IX - Chair: Mike Dean - Tue 2012_09_16
- OOR Infrastructure-II - Chair: Peter P. Yim - ??
- 5. Summary / Next steps / Announcements - (chair)
Proceedings
Please refer to the above
IM Chat Transcript captured during the session
see raw transcript here.
(for better clarity, the version below is a re-organized and lightly edited chat-transcript.)
Participants are welcome to make light edits to their own contributions as they see fit.
-- begin in-session chat-transcript --
[08:22] Peter P. Yim: Welcome to the
"OOR for Big Data" Workshop-I - Tue 2012_08_14
Topic: "OOR for Big Data" Brainstorm Session
Session Chair: Mike Dean (OOR; Raytheon-BBN)
Session page: http://ontolog.cim3.net/cgi-bin/wiki.pl?OOR/ConferenceCall_2012_08_14
Mute control: *7 to un-mute ... *6 to mute
Can't find Skype Dial pad?
- for Windows Skype users: it's under the "Call" dropdown menu as "Show Dial pad"
- for Linux Skype users: please stay with (or downgrade to) Skype version 2.x for now
(as a Dial pad seems to be missing on Linux-based Skype v4.x for skype-calls.)
Proceedings:
[08:37] Peter P. Yim: == Mike Dean starting the session off with his intro slides ...
[08:41] Todd Schneider: [ref. Mike's slides #4 & 5] How do these supporting technologies fit into
overall IT/data architectures?
[08:47] Peter P. Yim: == Q&A and Open Discussion ...
[08:48] anonymous morphed into Elizabeth Florescu
[08:49] Terry Longstreth: Mike mentioned RDF from SQL productions; Terry asked if that included
capturing the logic from constraints, triggers (Mike added views).
[08:51] Mike Bennett: How do you manage the relationship between the logical data model which
corresponds to the (e.g. SQL) data source, and the ontology of the domain. How do we distinguish an
ontology which is an RDF model of a data model design, versus an ontology of the real things?
[08:53] Mike Bennett: That is, how to formalize the model theoretic relationship between the elements
in the RDF model, and the things which they represent (data elements v things).
[08:54] Peter P. Yim: our opportunity (and challenge, of course) could be in: taking where the Linked
Data people leave off (say, having extracted a vocabulary on the dataset) and come up with the
ontology and the value added services that OOR is poised to provide ... of course, we should look at
starting from the dataset as well (not just from the vocabulary)
[08:58] Mike Bennett: Suggestion: in extracting information from large datasets, is there a role for
an "Ontology of the data" and a clear way of distinguishing this from the ontology of the subject
matter itself?
[09:01] Michael Grüninger: @MikeBennett: Do you see a role for an "ontology of the data"? What kinds
of concepts and relations do you see in such an ontology? Would it be domain-independent?
[09:04] Mike Bennett: This is what I'm wondering. It's all too easy for someone to take a logical
model (which has been designed, rightly), convert it into RDF/OWL and say "Lo, an ontology". I
think there must be a role for it in data extraction, but also a clear difference to ontologies of
the subject matter. With perhaps an RDF/OWL mapping ontology between the two?
[09:06] Mike Bennett: @Michael PS I suspect that the basic concepts such a model would be built from,
i.e. the top level classes, might be the constructs that exist in the data modeling language, e.g.
data table, join table, UML class, UML attribute and so on? I don't know, I'm still ruminating on
this.
[09:03] Peter P. Yim: candidate data that we might get our hands on and communities we can collaborate
with are probably in the domains of: (i) government data; (ii) geospatial / geo-science; (iii)
biomedical (iv) standards
[09:05] Peter P. Yim: a mini-series is coming up shortly - see:
http://ontolog.cim3.net/cgi-bin/wiki.pl?EarthScienceOntolog ... the kick-off session for the series
is coming up next week (Thu 2012.08.23)
[09:06] Terry Longstreth: Big data characterized by Size, by Complexity, by SxC?
[09:08] Terry Longstreth: A characteristic of big systems is the fuzziness of the boundaries; does
the same hold for big data?
[09:10] Todd Schneider: Ken, when the phrase 'big data' is used I haven't seen qualification to the
number of sources. Perhaps there's an implicit assumption of a single source.
[09:11] Ken Baclawski: When we talk about Big Data and OOR, there are two different problems: a
relatively small number of large complex ontologies (as in BioPortal which has ontologies with
millions of concepts) or a very large number (on the order of millions) of relatively simple
ontologies (e.g., the fields of a CSV file).
[09:19] Todd Schneider: Ken, Okay there may be performance issues when the number of ontologies
becomes large.
[09:18] Terry Longstreth: Challenge for OOR would seem to be more towards complexity rather than size
of a Big Data environment
[09:20] Mike Bennett: Open linked data marketing issue: so all these governments that are putting out
RDF linked data on the basis of the benefits of semantics: are they aware of the available
ontologies (per OOR) so that they start to use existing ontologies as the conceptual model from
which to structure new and future open linked data outputs.
[09:20] Mike Bennett: This also requires an awareness of the role of an ontology AS a conceptual
model - many folks are simply not aware of basic top down modeling best practice.
[09:22] Michael Grüninger: Two different kinds of applications of OOR: 1) using ontologies within OOR
together with existing data sets e.g. integration, decision support. 2) using ontologies within OOR
to design new data sets or redesign existing data sets
[09:27] Mike Dean: As part of "moving to the cloud", many organizations are moving from relational
databases to map/reduce frameworks such as Hadoop. Can ontologies be used to assist with or guide
such migration?
[09:30] Todd Schneider: Have to go. Cheers.
[09:37] Ken Baclawski: Big Data also involves format conversion issues as well as semantics
(e.g., integer vs floating point, image formats).
[09:39] Mike Bennett: OOR: would not include metadata about data formats (how could it?) but there is
a role for some demonstration of description of how to use this stuff, as part of the usage of OOR
in that scenario.
[09:39] Ken Baclawski: Clarify these terms: data registry, data repository, ontology registry,
ontology repository.
[09:40] Mike Bennett: Also clarify how these different moving parts may be framed in terms of the
Zachman Framework or some similar formal development framework.
[09:40] Terry Longstreth: The loss of meaning in data conversions should be part of the ontology
[09:41] Terry Longstreth: ...perhaps a separate micro-theory about data formats and lossy vs.
lossless conversions
[09:44] Mike Bennett: Incompatible data sources: what ontologies are for.
[09:47] Mike Dean: http://icpsr.umich.edu is a large repository of social science data
[09:47] Ken Baclawski: I suggest as an action item: capture the use cases we just identified.
[09:48] Mike Dean: Most domains/projects seem to have their own data registries
[09:58] Terry Longstreth: Are we agreed that Big Data for OOR is primarily an issue of complexity of
the target relationships?
[10:00] Terry Longstreth: Apparently no consensus yet
[10:01] Mike Bennett: Or is it "Data that is Big" v "Big data architecture" i.e. Hadoop / Mapreduce.
Both seemed relevant according to today's conversation.
[10:04] Peter P. Yim: as long as we are not arguing about what "Big Data" is ... and we know our
objective is to tackle how OOR can "serve" Big Data, I think we are fine (at least for now; until we
need to tackle at finer granularity)
[10:12] Peter P. Yim: assuming we have consensus on putting a focus on "OOR for Big Data" (along with
what we have been doing so far with OOR), we might want to: (a) identify the data, use case(s) and
one or two community(ies) we want to work closely with, to take this effort to the next level, (b)
figure out what we need to be doing differently (from this point onwards) for OOR
[10:21] Peter P. Yim: Following event dates confirmed
(ref. http://ontolog.cim3.net/cgi-bin/wiki.pl?OOR/ConferenceCall_2012_08_07#nid3DJ5 ... and other events already scheduled)
(... incorporating consensus arrived at through email exchanges after the session as well)
- OOR Funding-I - Chair: Ken Baclawski - "rethink strategy" - Tue 2012_08_21
- No meeting on Tue 2012_08_28
- OOR regular monthly team meeting - Tue 2012.09.04
- OOR Architecture & API Workshop-XIII - Co-chairs: Ken Baclawski & Todd Schneider - "use cases" - Tue 2012_09_11
- OOR Content Workshop-IV - Co-chairs: Michael Grüninger and Mike Dean - "Capturing FOIS Ontology Content" - Tue 2012_09_18
- No meeting on Tue 2012_09_25
- No meeting on Tue 2012_10_02
- OOR regular monthly team meeting - Tue 2012.10.09
- OOR Metadata Workshop-VIII - Chair: Michael Grüninger - will do "Mapping" and/or "metadata in OOR for Big Data" - Tue 2012_10_16
- OOR Code Development-IX - Chair: Mike Dean - Tue 2012_10_23
- OOR Infrastructure-II - Chair: Peter P. Yim - tba (will schedule this later; possibly after
the next major release of the BioPortal vm appliance)
[10:04] Peter P. Yim: great session!
[10:05] Peter P. Yim: -- session ended: 10:03am PDT --
-- end of in-session chat-transcript --
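One point raised in the chat (Ken Baclawski at 09:37 on integer vs floating point format conversion, and Terry Longstreth at 09:41 on lossy vs. lossless conversions) can be illustrated with a small example. The following is only a minimal sketch added for illustration; it was not presented in the session, and the function name and sample values are hypothetical. It checks whether an integer survives a round trip through a 64-bit floating point value, which is one simple way to label a conversion as lossless or lossy.

```python
def is_lossless_int_to_float(n: int) -> bool:
    """Return True if integer n survives a round trip through a Python float.

    Python floats are IEEE 754 double precision, so only integers that fit
    in the 53-bit significand (or happen to be exactly representable)
    come back unchanged.
    """
    try:
        return int(float(n)) == n
    except OverflowError:
        # The integer is too large to be represented as a float at all.
        return False


# Hypothetical sample values: a small integer, the largest power of two
# at the edge of exact representation, and its successor.
samples = [42, 2**53, 2**53 + 1]
results = {n: is_lossless_int_to_float(n) for n in samples}

for n, ok in results.items():
    print(n, "lossless" if ok else "lossy")
```

A check like this says nothing about how such a fact would be recorded in an ontology; it only shows the kind of conversion property that a "micro-theory about data formats" might need to describe.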
- Further Questions & Remarks - please post them to the [ oor-forum ] listserv
- all subscribers to the previous summit discussion, and all who responded to today's call will automatically be subscribed to the [ oor-forum ] listserv
- if you are already subscribed, post to <oor-forum [at] ontolog.cim3.net>
- Also, if you are not yet a member of the Ontolog community (i.e. if you are not yet subscribed to the [ontolog-forum] list), you might consider becoming a member of the community. Please refer to membership details here.
- Next Meetings:
- Tying down event dates for ref.
- OOR Architecture & API Workshop-XIII - Co-chairs: Ken Baclawski & Todd Schneider - "use cases" - Tue 2012_09_11
- OOR Metadata Workshop-VIII - Chair: Michael Grüninger - will do "Mapping" and/or "metadata in OOR for Big Data" - Tue 2012_09_25
- OOR Funding-I - Chair: Ken Baclawski - "rethink strategy" - Tue 2012_08_21 (MikeBennett voiced his preference for 8/21; most people are ok with either date.)
- OOR Code Development-IX - Chair: Mike Dean - Tue 2012_09_16
- OOR Infrastructure-II - Chair: Peter P. Yim - tba
- No meeting on Tue 2012_08_28
Additional Resources
- The OpenOntologyRepository Initiative - http://www.oor.net
- Index page to the various OOR instances - http://oor.net
- Our "Leveraging OOR in Big Open Data" presentation at the virtual panel session on ConferenceCall_2012_05_17
- Ontology Summit 2012 - "Ontology for Big Systems"
- OntologySummit2012_Communique
- 2012_02_09 - Thursday: Ontology Summit 2012 session-05 - Track-3: "Meeting Big Data Challenges through Ontology - I & II" - Co-chairs: Ernie Lucier & Mary Brady - Panelists: Barry Smith, ChrisMusialek-JeanneHolm, BryanThompson-MikePersonick, James Kirby - ConferenceCall_2012_02_09
- 2012_02_16 - Thursday: Ontology Summit 2012 session-06 - Track-4: "Large-Scale Domain Applications - I: Energy, Government and Geography" - Co-chairs: Steve Ray & Trish Whetzel - Panelists: - Andrew Crapo, Krzysztof Janowicz, Bruce Bauman, Mills Davis - ConferenceCall_2012_02_16
- 2012_03_08 - Thursday: Ontology Summit 2012 session-09 - Track-4: "Large-Scale Domain Applications - II: Biomedical, earth & environmental science & engineering" - Co-chairs: Trish Whetzel & Steve Ray - Panelists: David Price, Mike Kellen, Damian Gessler, Blazej Bulka, Ilya Zaslavsky, Line Pouchard - ConferenceCall_2012_03_08
- 2012_03_15 - Thursday: Ontology Summit 2012 session-10 - Track-3: "Challenge: Ontology and Big Data - III" - Co-chairs: Mary Brady & Ernie Lucier - Panelists: Tim Finin, Kyoungsook Kim, Mike Folk, Mario Paolucci, Ursula Kattner, Edin Muharemagic - ConferenceCall_2012_03_15
- Ontology Summit 2011 - "Making the Case for Ontology"
- OntologySummit2009 - "Toward Ontology-based Standards"
- Ontology Summit 2008 - "Toward An Open Ontology Repository"
- OASIS Open Data Protocol (OData) TC - https://www.oasis-open.org/committees/tc_home.php?wg_abbrev=odata
For the record ...
How To Join (while the session is in progress)
- 1. Call in from a phone or from skype: http://ontolog.cim3.net/cgi-bin/wiki.pl?OOR/ConferenceCall_2012_08_14#nid3DSU
- 2. Open chat in a new browser window: http://webconf.soaphub.org/conf/room/oor_20120814
- 3. Download the prepared slides here: http://ontolog.cim3.net/file/work/OpenOntologyRepository/OOR-for-BigData/OOR-for-BigData--MikeDean_20120814.pdf
- or, 3.1 (access our shared-screen vnc server, if you are not behind a corporate firewall)