Ontolog Invited Speaker Presentation - Dr. Ramanathan V. Guha - Thu 2011.12.01     (1)

Schema.org provides a collection of schemas, i.e., HTML tags, that webmasters can use to mark up their pages in ways recognized by major search providers. Search engines including Bing, Google, Yahoo! and Yandex rely on this markup to improve the display of search results, making it easier for people to find the right web pages. Many sites are generated from structured data, which is often stored in databases. When this data is formatted into HTML, it becomes very difficult to recover the original structured data. Many applications, especially search engines, can benefit greatly from direct access to this structured data. On-page markup enables search engines to understand the information on web pages and provide richer search results in order to make it easier for users to find relevant information on the web. Markup can also enable new tools and applications that make use of the structure. A shared markup vocabulary makes it easier for webmasters to decide on a markup schema and get the maximum benefit for their efforts. So, in the spirit of sitemaps.org, search engines have come together to provide a shared collection of schemas that webmasters can use.     (1E8)
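As a rough illustration of the point above, that on-page markup lets a consumer recover structured data from HTML, here is a minimal sketch in Python. It is not taken from the session. The page fragment, the `MicrodataParser` class and the `extract_microdata` helper are hypothetical names introduced for this example, and the flat extraction ignores nested items, so it is far simpler than what a search engine would actually do.

```python
from html.parser import HTMLParser

# Hypothetical page fragment using HTML microdata attributes
# (itemscope, itemtype, itemprop) with a schema.org type.
PAGE = """
<div itemscope itemtype="http://schema.org/Movie">
  <h1 itemprop="name">Avatar</h1>
  <span itemprop="genre">Science fiction</span>
  <span>Not marked up, so ignored</span>
</div>
"""


class MicrodataParser(HTMLParser):
    """Collects itemtype values and the text of itemprop-tagged elements.

    Assumption: items are flat (no nested itemscope), which keeps the
    sketch short but is not a full microdata implementation.
    """

    def __init__(self):
        super().__init__()
        self.item_types = []
        self.properties = {}
        self._current_prop = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "itemtype" in attrs:
            self.item_types.append(attrs["itemtype"])
        # Remember which property (if any) the upcoming text belongs to.
        self._current_prop = attrs.get("itemprop")

    def handle_data(self, data):
        text = data.strip()
        if self._current_prop and text:
            self.properties[self._current_prop] = text
            self._current_prop = None

    def handle_endtag(self, tag):
        self._current_prop = None


def extract_microdata(html):
    """Return (list of item types, dict of property name -> text)."""
    parser = MicrodataParser()
    parser.feed(html)
    parser.close()
    return parser.item_types, parser.properties


types, props = extract_microdata(PAGE)
print(types)
print(props)
```

The unmarked `<span>` is skipped because it carries no `itemprop`, which is the practical difference between plain HTML and marked-up HTML for a consuming application.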
This session will be structured as a Q&A session where Google Fellow Ramanathan Guha will provide a brief introduction to the activity and then answer your questions regarding the relation between this work and the broader ontology world.     (1E9)
Speaker Bio (with credit to Wikipedia): Ramanathan V. Guha (born 1965) is an Indian computer scientist. He graduated with a B.Tech (Mechanical Engineering) from the Indian Institute of Technology Madras, an MS (Mechanical Engineering) from the University of California, Berkeley, and a Ph.D. (Computer Science) from Stanford University. Since May 2005, he has been working at Google.     (1E11)
Guha was one of the early co-leaders of the Cyc Project, where he worked from 1987 through 1994 at Microelectronics and Computer Technology Corporation. He was responsible for the design and implementation of key parts of the Cyc system, including the CycL knowledge representation language, the upper ontological layers of the Cyc Knowledge Base and some parts of the original Cyc Natural Language understanding system. Leaving what became Cycorp, Guha founded Q Technology, which created a database schema mapping tool called Babelfish. In 1994, he moved to work at Apple Computer, reporting to Alan Kay, where he developed the Meta Content Framework (MCF) format. In 1997 he joined Netscape Corporation where, together with Tim Bray, he created a new version of MCF that used the XML language and which became the main technical precursor to W3C's Resource Description Framework (RDF) standard. Guha also contributed to the "smart browsing" features of Netscape 4.5 and was instrumental in Netscape's acquisition of the Open Directory Project. In March 1999, he created the first version of RSS as part of Netscape's personalized home page project. In 1999 he left Netscape and in May co-founded Epinions where he worked until 2000. Guha founded Alpiri in late 2000, which created TAP, a semantic web application and knowledge base. In 2002, he became a researcher at IBM Almaden Research Center. In 2005 Guha joined Google. He currently leads development of Google Custom Search and is one of the champions of the current schema.org activity being promoted by Google, Microsoft Bing, Yahoo! and Yandex.     (1E12)

Proceedings     (1F)

See raw transcript here.     (1F2)

(for better clarity, the version below is a re-organized and lightly edited chat-transcript.)     (1F3)

Participants are welcome to make light edits to their own contributions as they see fit.     (1F4)

Steve Ray: Welcome, Guha. Glad you made it!     (1F5)

Guha: Dan Brickley will be joining me in talking     (1F7)

Steve Ray: OK. Noted. I will start with an introduction, then hand things over to you.     (1F8)

danbri just joined     (1F9)

Peter P. Yim: -- session formally started 9:38am PST --     (1F11)

danbri: nitpick: "RDFa Lite" rather than "RDF Lite"; it's about the in-html notation     (1F14)

danbri: Working Draft out next week     (1F15)

Steve Ray: Peter Benson: ISO 22745 is a set of standard tags with many entries already.     (1F18)

Peter P. Yim: Guha: target audience for schema.org is the "webmasters"     (1F19)

Doug Foxvog: could use classification for PhysicalObject. A common superclass Agent of Person & Organization would be useful.     (1F21)

Leo Obrst: S-expressions in Lisp.     (1F23)

danbri: so RDF '97 was PICS-NG, which used s-expressions:     (1F25)

danbri: (then XML happened)     (1F26)

Joel Bender: (and then N3 happened)     (1F28)

Kingsley Idehen: Then Linked Data happened     (1F29)

danbri: (and then JSON happened...)     (1F30)

Doug Foxvog: XML is not restricted to triples. Why was/is RDF so restricted?     (1F31)

Kingsley Idehen: Yes, Linked Data brings it back home to simplicity     (1F32)

Joel Bender: (and now JSON-LD is happening ... maybe)     (1F33)

Kingsley Idehen: Yes, but Linked Data is agnostic re. EAV/SPO based 3-tuples     (1F34)

K Goodier: Keeping things simple and delivering value     (1F35)

Kingsley Idehen: and via HTTP we can negotiate representation     (1F36)

Frank Chum: Doug, I like RDF for its simplicity and not as restricted     (1F37)

Kingsley Idehen: Good example of this all working, via Linked Data simplicity:     (1F38)

Kingsley Idehen: Yes, we have to "hold our noses" re. large scale adoption. +1     (1F39)

Nicola Guarino: usual problem with skype, sorry     (1F40)

Kingsley Idehen: Here is a link to a note showing how schema.org mapped to DBpedia leads to network effects:     (1F41)

Steve Ray: @Nicola: OK, I'll try you again after Ali is done with his second question, if you raise your hand again.     (1F43)

Kingsley Idehen: Final page showing links between schema.org and DBpedia (and other vocabularies which appear as you follow-your-nose through the Linked Data):     (1F44)

danbri: on the 'do we need rdf' question, .... we see two trends: (1) people who use RDF, find frustration with the fiddly details of the spec (datatypes, etc.). Perhaps such things are just inherently annoying. There needs to be a rule, but the rule is arbitrary. (2) people who don't use RDF explicitly, often drift towards a data model that is very RDF-like, because RDF didn't appear from nowhere. Graph-shaped data is a very common pattern (cf. Kingsley on EAV). Hence all recent talk on 'social graph', 'interest graph', etc.     (1F45)

Kingsley Idehen: Methinks: schema.org and Linked Data have a mutually beneficial relationship that in effect fans out to adding more semantic structure to links (actually relations) on the WWW. schema.org delivers immediate and palpable value     (1F46)

Nicola Guarino: @Steve: sorry, I am not able to talk through skype, too bad     (1F47)

Peter P. Yim: @Nicola: please type out your question on the chat     (1F48)

Peter P. Yim: schema.org - as Dan Brickley puts it - is characterized by a small working group, consensus, ability to move and make decisions quickly     (1F49)

Nicola Guarino: Here is the comment I wanted to make: The reason why super-simple ontologies like FOAF work is that the words are simple to understand. But there are words which everybody understands, and words that are ambiguous and difficult to define or explain (e.g., service, unemployed person). It is a fact that people doing markup don't care about the deep semantics of their tags. So if the goal is to get billions of pages marked up, that's fine. But what about USING these marked up pages for information integration, services mashup and so on, instead of just for search?     (1F50)

BOTTOM LINE: extensive tagging with little semantics may be very useful for search, but not for integration of information     (1F51)

Nicola Guarino: @Guha: but even for application-dependent vocabularies we sometimes need very crisp formal definitions....     (1F52)

danbri: (re starting points of Web: has seeds of RDF in there too)     (1F53)

Nicola Guarino: Deep semantics is needed (sometimes) also for application-dependent purposes, not just for universal purposes     (1F54)

Adrian Walker: To go beyond search applications, some degree of NLP is unavoidable?     (1F55)

Doug Foxvog: I suggest that small ontologies can build on larger existing ones. Those who use them do not need to use everything from the larger ontologies. Deep ontologies would have rules and reasoning structures that are immaterial to small systems that use parts of them.     (1F56)

Peter Benson: our experience with ISO 8000 is that you need sufficient data to meet a defined requirement - nothing more. As requirements grow so does the depth of data.     (1F57)

Nicola Guarino: Besides, why not invest in a MINIMAL formal vocabulary, clarifying for instance the various notions of PART or DEPENDENCE?     (1F58)

Stefano Bortoli: being too narrow in the definition of schemas might end up in a higher cost of maintenance of the application after all. This is a lesson we should have learned from software engineering at least. So, deep thinking and generalization to some extent is necessary. Simple and easy is good in the short term, but we risk creating asbestos that will be very hard to handle in the future     (1F59)

Nicola Guarino: @Stefano Bortoli: +1     (1F60)

Doug Foxvog: Contexts can separate ontologies into subsets. Guha is talking about the problems of "an ontology of everything". Cyc developed the idea of Microtheories (but I'm not sure if it was after he left). By placing rules and relationships in such contexts (or microtheories) one can avoid many of the problems of an "ontology of everything". This becomes an issue on the Semantic Web, where triples make it hard to place statements within specific contexts.     (1F61)

Vlad Tanasescu: Any pointers to this ACM article?     (1F62)

GaryBergCross: What consideration has schema.org given to controlled natural languages? Some efforts have tried to make OWL and Common Logic easier to express.     (1F63)

danbri: @DougFoxvog: [ has 'Contexts: A Formalization and Some Applications'...     (1F64)

Stefano Bortoli: @Doug I don't think that anyone is really aiming at the "philosophical ontology", not in the Semantic Web at least. Indeed, the first efforts were spent in automatic ontology mappings, rather than producing semantically annotated data. Contexts are particularly complex to manage in a context-less environment such as the Web. The least we can do is to try to be formal in defining concepts to reduce the risk of misunderstanding.     (1F65)

GaryBergCross: One issue with Microtheories is when do you create a new one versus adapt an existing one.     (1F66)

Peter P. Yim: Guha: adoption is currently on the order of thousands of sites and billions of pages     (1F67)

Steve Ray: Certainly some standards development efforts are importing existing external concepts or "ontologies" to a much greater degree today.     (1F68)

danbri: on re-use, one q is whether publishers/authors of instance data should bear the cost of that sharing/re-use. Mainstream RDF / SemWeb culture is to have instance data cite several different ontologies. schema.org rather pre-packages things and offers the package as a single usable thing...     (1F69)

Roger Cutler: I don't think he said billions of pages. Thousands of sites & billions of pages means millions of pages per site, right?     (1F72)

danbri: (yup, we should make the various mappings to/from schema.org easier to find)     (1F73)

Doug Foxvog: @Gary -- You can create a new microtheory when describing a narrower field or are using multiple existing contexts, or when presenting information about a specific event or other individual.     (1F74)

Nicola Guarino: A couple of problems I find in the current taxonomic structure of schema.org: 1. A governmentOffice is both a place and an organization     (1F76)

Christopher Spottiswoode: What a privilege that was, to be able to listen in on that conversation, with all that experience! Thank you all.     (1F77)

Doug Foxvog: @Gary -- adapt an existing context when providing more info @ same level     (1F78)

Stefano Bortoli: thanks     (1F79)

Stefano Bortoli: bye     (1F80)

Peter P. Yim: Great session ... thank you Guha, Dan and everyone for coming!     (1F81)

Guha: Thank you everyone     (1F82)

Peter P. Yim: -- session ended : 11:00am PST --     (1F84)

-- end of chat session --     (1F85)

Audio Recording of this Session     (1G)

  • suggestion: it's best that you listen to the session while having the presentation open in front of you. You'll be prompted to advance slides by the speaker.     (1G5)
  • Take a look, also, at the rich body of knowledge that this community has built together, over the years, by going through the archives of noteworthy past Ontolog events. (References on how to subscribe to our podcast can also be found there.)     (1G6)

For the record ...     (1G7)

Attendees     (1I)

