Ontolog Invited Speaker Presentation - Dr. Ramanathan V. Guha - Thu 2011.12.01

Session Chair: Dr. Steve Ray (CMU)

Invited Speakers: Dr. R V Guha (Google, schema.org)

Presentation Title: "schema.org"

Archive:
- [ Agenda & Proceedings ]
- [ Abstract ]
- there will not be any slides for this talk
- [ audio recording of the session ] [ 1:21:31 ; mp3 ; 9.33 MB ]
- [ Transcript of the online chat session ] during the panel discussion

Agenda & Proceedings

Session Format and Agenda:
- this will be a virtual session over a phone conference setting, augmented by in-session chat and shared computer screen support

Introduction of the invited speakers - session chair: Steve Ray
Presentation by our invited speakers - Ramanathan Guha (30~45 min.)
Q&A and Open discussion (30~45 min.) [Kindly identify yourself before speaking.]

Presentation Title: "Schema.org"

[ Dr. Ramanathan V. Guha ]

Abstract:

Schema.org provides a collection of schemas, i.e., HTML tags, that webmasters can use to mark up their pages in ways recognized by major search providers. Search engines including Bing, Google, Yahoo! and Yandex rely on this markup to improve the display of search results, making it easier for people to find the right web pages. Many sites are generated from structured data, which is often stored in databases. When this data is formatted into HTML, it becomes very difficult to recover the original structured data. Many applications, especially search engines, can benefit greatly from direct access to this structured data. On-page markup enables search engines to understand the information on web pages and provide richer search results in order to make it easier for users to find relevant information on the web. Markup can also enable new tools and applications that make use of the structure. A shared markup vocabulary makes it easier for webmasters to decide on a markup schema and get the maximum benefit for their efforts. So, in the spirit of sitemaps.org, search engines have come together to provide a shared collection of schemas that webmasters can use.
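
As a rough illustration of the on-page markup described above, the following minimal sketch shows how a consumer might recover structured data from a page fragment annotated with microdata attributes (itemscope, itemtype, itemprop). The HTML fragment, the class name MicrodataExtractor and the helper extract_item are hypothetical and written for this example only; the sketch handles a single flat item and is not how any search engine actually processes pages.

```python
from html.parser import HTMLParser

# Hypothetical page fragment marked up with the schema.org Movie type.
# Simplified on purpose: the director is plain text, not a nested Person item.
SAMPLE_HTML = """
<div itemscope itemtype="http://schema.org/Movie">
  <h1 itemprop="name">Avatar</h1>
  <span itemprop="genre">Science fiction</span>
  <span>Director: <span itemprop="director">James Cameron</span></span>
</div>
"""


class MicrodataExtractor(HTMLParser):
    """Collects itemprop text values from a single, flat microdata item."""

    def __init__(self):
        super().__init__()
        self.item_type = None      # value of the itemtype attribute, if any
        self.properties = {}       # itemprop name -> text content
        self._current_prop = None  # property whose text we are waiting for

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "itemtype" in attrs:
            self.item_type = attrs["itemtype"]
        if "itemprop" in attrs:
            self._current_prop = attrs["itemprop"]

    def handle_data(self, data):
        text = data.strip()
        # Only keep text that directly follows a tag carrying itemprop.
        if self._current_prop and text:
            self.properties[self._current_prop] = text
            self._current_prop = None

    def handle_endtag(self, tag):
        # Leaving an element ends any pending property.
        self._current_prop = None


def extract_item(html):
    """Return (item type URL, dict of properties) for the fragment."""
    parser = MicrodataExtractor()
    parser.feed(html)
    return parser.item_type, parser.properties


if __name__ == "__main__":
    item_type, props = extract_item(SAMPLE_HTML)
    print(item_type)
    for name, value in props.items():
        print(f"{name}: {value}")
```

The point of the sketch is only that the attribute names come from a shared vocabulary, so a consumer can read back the type and its properties without guessing at the page layout.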

This session will be structured as a Q&A session where Google Fellow Ramanathan Guha will provide a brief introduction to the Schema.org activity and then answer your questions regarding the relation between this work and the broader ontology world.

About the Speakers:

Speaker Bio (with credit to Wikipedia): Ramanathan V. Guha (born 1965) is an Indian computer scientist. He graduated with a B.Tech (Mechanical Engineering) from the Indian Institute of Technology Madras, an MS (Mechanical Engineering) from the University of California, Berkeley, and a Ph.D (Computer Science) from Stanford University. Since May 2005, he has been working at Google.

Guha was one of the early co-leaders of the Cyc Project, where he worked from 1987 through 1994 at Microelectronics and Computer Technology Corporation. He was responsible for the design and implementation of key parts of the Cyc system, including the CycL knowledge representation language, the upper ontological layers of the Cyc Knowledge Base and some parts of the original Cyc Natural Language understanding system. Leaving what became Cycorp, Guha founded Q Technology, which created a database schema mapping tool called Babelfish. In 1994, he moved to work at Apple Computer, reporting to Alan Kay, where he developed the Meta Content Framework (MCF) format. In 1997 he joined Netscape Corporation where, together with Tim Bray, he created a new version of MCF that used the XML language and which became the main technical precursor to W3C's Resource Description Framework (RDF) standard. Guha also contributed to the "smart browsing" features of Netscape 4.5 and was instrumental in Netscape's acquisition of the Open Directory Project. In March 1999, he created the first version of RSS as part of Netscape's personalized home page project. In 1999 he left Netscape and in May co-founded Epinions, where he worked until 2000. In late 2000 Guha founded Alpiri, which created TAP, a semantic web application and knowledge base. In 2002, he became a researcher at IBM Almaden Research Center. In 2005 Guha joined Google. He currently leads development of Google Custom Search and is one of the champions of the current Schema.org activity being promoted by Google, Microsoft Bing, Yahoo! and Yandex.

Proceedings

Transcript of the online chat during the session

See raw transcript here.

(for better clarity, the version below is a re-organized and lightly edited chat-transcript.)

Participants are welcome to make light edits to their own contributions as they see fit.

Steve Ray: Welcome, Guha. Glad you made it!

Guha: thanks

Guha: Dan Brickley will be joining me in talking

Steve Ray: OK. Noted. I will start with an introduction, then hand things over to you.

danbri just joined

danbri: thanks Guha

Peter P. Yim: -- session formally started 9:38am PST --

danbri: (re old Guha / Bray spec, see http://www.w3.org/Submission/1997/8/ )

danbri: -> http://www.w3.org/TR/WD-rdf-syntax-971002/

danbri: nitpick: "RDFa Lite" rather than "RDF Lite"; it's about the in-html notation

danbri: Working Draft out next week

danbri: discussion of http://en.wikipedia.org/wiki/ISO_8000

http://www.dataforge.com/wpblog/index.php/industry-news/iso-22745-standard-based-exchange-of-product-data/

Steve Ray: Peter Benson: ISO 22745 is a set of standard tags with many entries already.

Peter P. Yim: Guha: target audience for schema.org is the "webmasters"

danbri: example: http://schema.org/Movie

Doug Foxvog: schema.org could use classification for PhysicalObject. A common superclass Agent of Person & Organization would be useful.

danbri: http://www.rssboard.org/rss-0-9-0

Leo Obrst: S-expressions in Lisp.

Peter P. Yim: Steve Ray paraphrasing John Sowa's questions for Guha - ref: http://ontolog.cim3.net/forum/ontolog-forum/2011-11/msg00141.html

danbri: so RDF '97 was PICS-NG, which used s-expressions: http://www.w3.org/TR/NOTE-pics-ng-metadata

danbri: (then XML happened)

Kingsley Idehen: John's actual post: http://ontolog.cim3.net/forum/ontolog-forum/2011-11/msg00141.html

Joel Bender: (and then N3 happened)

Kingsley Idehen: Then Linked Data happened

danbri: (and then JSON happened...)

Doug Foxvog: XML is not restricted to triples. Why was/is RDF so restricted?

Kingsley Idehen: Yes, Linked Data brings it back home to simplicity

Joel Bender: (and now JSON-LD is happening ... maybe)

Kingsley Idehen: Yes, but Linked Data is agnostic re. EAV/SPO based 3-tuples

K Goodier: Keeping things simple and delivering value

Kingsley Idehen: and via HTTP we can negotiate representation

Frank Chum: Doug, I like RDF for its simplicity and not as restricted

Kingsley Idehen: Good example of this all working, via Linked Data simplicity: http://wiki.goodrelations-vocabulary.org/Microdata

Kingsley Idehen: Yes, we have to "hold our noses" re. large scale adoption. +1

Nicola Guarino: usual problem with skype, sorry

Kingsley Idehen: Here is a link to a note showing how Schema.org mapped to DBpedia leads to network effects: https://plus.google.com/112399767740508618350/posts/ck2yhgTWxtD

Kingsley Idehen: A specific page showing LOD Cloud instance data based on Schema.org cross links: http://lod.openlinksw.com/describe/?url=http%3A%2F%2Fschema.org%2FLandmarksOrHistoricalBuildings&urilookup=1

Steve Ray: @Nicola: OK, I'll try you again after Ali is done with his second question, if you raise your hand again.

Kingsley Idehen: Final page showing links between Schema.org and DBpedia (and other vocabularies which appear as you follow-your-nose through the Linked Data): http://lod.openlinksw.com/describe/?url=http%3A%2F%2Fschema.org%2FLandmarksOrHistoricalBuildings&p=1&lp=89&op=-1&last=&gp=1

danbri: on the 'do we need rdf' question, .... we see two trends: (1) people who use RDF, find frustration with the fiddly details of the spec (datatypes, etc.). Perhaps such things are just inherently annoying. There needs to be a rule, but the rule is arbitrary. (2) people who don't use RDF explicitly, often drift towards a data model that is very RDF-like, because RDF didn't appear from nowhere. Graph-shaped data is a very common pattern (cf. Kingsley on EAV). Hence all recent talk on 'social graph', 'interest graph', etc.

Kingsley Idehen: Methinks: Schema.org and Linked Data have a mutually beneficial relationship that in effect fans out to adding more semantic structure to links (actually relations) on the WWW. Schema.org delivers immediate and palpable value

Nicola Guarino: @Steve: sorry, I am not able to talk through skype, too bad

Peter P. Yim: @Nicola: please type out your question on the chat

Peter P. Yim: schema.org - as Dan Brickley puts it - characterized by a small working group, consensus, ability to move and make decisions quickly

Nicola Guarino: Here is the comment I wanted to make: The reason why super-simple ontologies like FOAF work is that the words are simple to understand. But there are words which everybody understands, and words that are ambiguous and difficult to define or explain (e.g., service, unemployed person). It is a fact that people doing markup don't care about the deep semantics of their tags. So if the goal is to get billions of pages marked up, that's fine. But what about USING these marked up pages for information integration, services mashup and so on, instead of just for search?

BOTTOM LINE: extensive tagging with little semantics may be very useful for search, but not for integration of information

Nicola Guarino: @Guha: but even for application-dependent vocabularies we sometimes need very crisp formal definitions....

danbri: (re starting points of Web: http://www.w3.org/History/1989/proposal-msw.html has seeds of RDF in there too)

Nicola Guarino: Deep semantics is needed (sometimes) also for application-dependent purposes, not just for universal purposes

Adrian Walker: To go beyond search applications, some degree of NLP is unavoidable?

Doug Foxvog: I suggest that small ontologies can build on larger existing ones. Those who use them do not need to use everything from the larger ontologies. Deep ontologies would have rules and reasoning structures that are immaterial to small systems that use parts of them.

Peter Benson: our experience with ISO 8000 is that you need sufficient data to meet a defined requirement - nothing more. As requirements grow so does the depth of data.

Nicola Guarino: Besides schema.org, why not invest in a MINIMAL formal vocabulary, clarifying for instance the various notions of PART or DEPENDENCE?

Stefano Bortoli: being too narrow in the definition of schemas might end up in a higher cost of maintenance of the application after all. This is a lesson we should have learned from software engineering at least. So, deep thinking and generalization to some extent is necessary. Simple and easy is good in the short term, but we risk creating asbestos that will be very hard to handle in the future

Nicola Guarino: @Stefano Bortoli: +1

Doug Foxvog: Contexts can separate ontologies into subsets. Guha is talking about the problems of "an ontology of everything". Cyc developed the idea of Microtheories (but I'm not sure if it was after he left). By placing rules and relationships in such contexts (or microtheories) one can avoid many of the problems of an "ontology of everything". This becomes an issue on the Semantic Web, where triples make it hard to place statements within specific contexts.

Vlad Tanasescu: Any pointers to this ACM article?

Gary Berg-Cross: What consideration has schema.org given to controlled natural languages? Some efforts have tried to make OWL and Common Logic easier to express.

danbri: @DougFoxvog: http://www-formal.stanford.edu/Guha/ has 'Contexts: A Formalization and Some Applications'...

Stefano Bortoli: @Doug I don't think that anyone is really aiming at the "philosophical ontology", not in the Semantic Web at least. Indeed, the first efforts were spent in automatic ontology mappings, rather than producing semantically annotated data. Contexts are particularly complex to manage in a context-less environment such as the Web. The least we can do is to try to be formal in defining concepts to reduce the risk of misunderstanding.

Gary Berg-Cross: One issue with Microtheories is when do you create a new one versus adapt an existing one.

Peter P. Yim: Guha: currently adoption is in the order of thousands of sites and billions of pages now

Steve Ray: Certainly some standards development efforts are importing existing external concepts or "ontologies" to a much greater degree today.

danbri: on re-use, one q is whether publishers/authors of instance data should bear the cost of that sharing/re-use. Mainstream RDF / SemWeb culture is to have instance data cite several different ontologies. Schema.org rather pre-packages things and offers the package as a single usable thing...

danbri: re rNews - see http://blog.schema.org/2011/09/extended-schemaorg-news-support.html for details

danbri: http://www.iptc.org/site/Home/Media_Releases/schema.org_adopts_IPTC's_rNews_for_news_markup

Roger Cutler: I don't think he said billions of pages. Thousands of sites & billions of pages means millions of pages per site, right?

danbri: (yup, we should make the various mappings to/from schema.org easier to find)

Doug Foxvog: @Gary -- You can create a new microtheory when describing a narrower field or are using multiple existing contexts, or when presenting information about a specific event or other individual.

danbri: http://wiki.creativecommons.org/LRMI/Specification_v0.5

Nicola Guarino: A couple of problems I find in the current taxonomic structure of schema.org: 1. A governmentOffice is both a place and an organization

Christopher Spottiswoode: What a privilege that was, to be able to listen in on that conversation, with all that experience! Thank you all.

Doug Foxvog: @Gary -- adapt an existing context when providing more info @ same level

Stefano Bortoli: thanks

Stefano Bortoli: bye

Peter P. Yim: Great session ... thank you Guha, Dan and everyone for coming!

Guha: Thank you everyone

danbri: Thanks all

Peter P. Yim: -- session ended : 11:00am PST --

-- end of chat session --

... More Questions
- For those who have further questions or remarks on the topic, please post them to the [ontolog-forum] so that everyone in the community can benefit from the discourse.
- if you are not a member of the Ontolog community (meaning to say you are not subscribed to the [ontolog-forum] list) yet, we cordially invite you to join us. See our "Membership" details at: http://ontolog.cim3.net/cgi-bin/wiki.pl?WikiHomePage#nid1J

Audio Recording of this Session

To download the audio recording of the session, click here
- the playback of the audio files requires the proper setup, and an MP3 compatible player on your computer.
Conference Date and Time: 1-Dec-2011 9:38 ~ 11:00 am Pacific Standard Time
Duration of Recording: 1 Hour 21.5 Minutes
Recording File Size: 9.33 MB (in mp3 format)

suggestion: it's best that you listen to the session while having the presentation opened in front of you. You'll be prompted to advance slides by the speaker.
Take a look, also, at the rich body of knowledge that this community has built together, over the years, by going through the archives of noteworthy past Ontolog events. (References on how to subscribe to our podcast can also be found there.)

For the record ...

How To Join (while the session is in progress)

1. Dial in with a phone and connect through skype: http://ontolog.cim3.net/cgi-bin/wiki.pl?ConferenceCall_2011_12_01#nid2ZLR
2. Open chat in a new browser window: http://webconf.soaphub.org/conf/room/ontolog_20111201
3. Download the speaker's presentation (slides) here: http://ontolog.cim3.net/cgi-bin/wiki.pl?ConferenceCall_2011_12_01#nid2ZLJ
- or, 3.1 access our shared-screen vnc server, if you are not behind a corporate firewall

Conference Call Details

Date: Thursday, 1-Dec-2011
Start Time: 9:30am PST / 12:30pm EST / 6:30pm CET / 17:30 UTC
- ref: World Clock
Expected Call Duration: ~1.5 hours

Dial-in:
- Phone (US): +1 (206) 402-0100 ... (long distance cost may apply)
  - ... [ backup nbr: (415) 671-4335 ]
- Skype: joinconference ... (generally free-of-charge, when connecting from your computer)
- when prompted enter PIN: 141184#

Shared-screen support (VNC session), if applicable, will be started 5 minutes before the call at: http://vnc2.cim3.net:5800/
- view-only password: "ontolog"
- if you plan to be logging into this shared-screen option (which the speaker may be navigating), and you are not familiar with the process, please try to call in 5 minutes before the start of the session so that we can work out the connection logistics. Help on this will generally not be available once the presentation starts.
- people behind corporate firewalls may have difficulty accessing this. If that is the case, please download the slides above (where applicable) and run them locally. The speaker(s) will prompt you to advance the slides during the talk.

In-session chat-room url: http://webconf.soaphub.org/conf/room/ontolog_20111201
- instructions: once you have access to the page, click on the "settings" button, and identify yourself (by modifying the Name field from "anonymous" to your real name, like "JaneDoe").
- You can indicate that you want to ask a question verbally by clicking on the "hand" button, and wait for the moderator to call on you; or, type and send your question into the chat window at the bottom of the screen.
- thanks to the soaphub.org folks, one can now use a jabber/xmpp client (e.g. gtalk) to join this chatroom. Just add the room as a buddy - (in our case here) ontolog_20111201@soaphub.org ... Handy for mobile devices!

Discussions and Q & A:
- Nominally, when a presentation is in progress, the moderator will mute everyone, except for the speaker.
- To un-mute, press "*7" ... To mute, press "*6" (please mute your phone, especially if you are in a noisy surrounding, or if you are introducing noise, echoes, etc. into the conference line.)
- we will usually save all questions and discussions till after all presentations are through. You are encouraged to jot down questions onto the chat-area in the mean time (that way, they get documented; and you might even get some answers in the interim, through the chat.)
- During the Q&A / discussion segment (when everyone is muted), if you want to speak or have questions or remarks to make, please raise your hand (virtually) by clicking on the "hand button" (lower right) on the chat session page. You may speak when acknowledged by the session moderator (again, press "*7" on your phone to un-mute). Test your voice and introduce yourself first before proceeding with your remarks, please. (Please remember to click on the "hand button" again (to lower your hand) and press "*6" on your phone to mute yourself after you are done speaking.)

Please review our Virtual Session Tips and Ground Rules - see: VirtualSpeakerSessionTips

RSVP to peter.yim@cim3.com appreciated, ... or simply just by adding yourself to the "Expected Attendee" list below (if you are a member of the team.)

This session, like all other Ontolog events, is open to the public. Information relating to this session is shared on this wiki page: http://ontolog.cim3.net/cgi-bin/wiki.pl?ConferenceCall_2011_12_01

Please note that this session may be recorded, and if so, the audio archive is expected to be made available as open content, along with the proceedings of the call to our community membership and the public at-large under our prevailing open IPR policy.

Attendees

Attended:
- Steve Ray (chair)
- R V Guha "Guha" (invited speaker)
- Dan Brickley "danbri" (discussant)
- Peter P. Yim
- Randy Kerber
- Bobbin Teegarden
- Frank Chum
- Leo Obrst
- Elizabeth Florescu
- Bob Schloss
- Doug Foxvog
- David Hau
- Bob Smith
- Gerald Radack
- Stefano Bocconi
- Vlad Tanasescu (The University of Edinburgh)
- Kurt Kirkham (Sallie Mae)
- Katherine Goodier
- Joel Bender
- Gary Berg-Cross
- James Sorace (HHS)
- Nicola Guarino
- Christopher Spottiswoode
- Peter Benson
- Melissa Hildebrand (Scheib) (ECCMA)
- Frank Alvidrez
- Adrian Walker
- Andreas Harth
- Lora Aroyo (VU, NL)
- Roger Cutler (Chevron)
- Ram D. Sriram
- Kingsley Idehen
- Yefim Zhuk
- Alessander Botti Benevides
- Ali Hashemi
- Arnaud J Le Hors
- Brian Davis
- Cirrus Shakeri
- Duane Nickull
- Kavitha Srinivas
- Mike Ward
- MyCoyne
- shenley
- Stefano Bortoli
- Ted Bashor
- Yu Lin

Expecting:
- (please add yourself to the list if you are a member of this community, or, rsvp to <peter.yim@cim3.com>)

Regrets:
- John F. Sowa (cannot attend, but has questions that he will ask via the session chair)
- Christoph Lange (traveling)
- Todd Schneider
- Martin Hepp
- Frank Olken (time conflict)
- Chris Welty
- ...
