Fachbereich Wirtschaftswissenschaft - Freie Universität Berlin
Prof. Dr. Uwe H. Suhl - Institut für Produktion, Wirtschaftsinformatik und OR

   
Semantic Web - Google Base Upload Agent

The Project

As an experiment, we are currently working on a Web crawler which collects RDF data from the public Web and uploads this data into Google Base.

Motivation

Google has started a new service called Google Base (see Announcement). Google Base is a public database into which anybody can upload any kind of structured information. Uploaded information can be searched through a web interface. We think that Google Base might turn out to be an important step in the development of the Web from a medium for publishing unstructured text into a medium for publishing structured information. But Google Base also raises some questions concerning the Semantic Web, an effort that aims to extend the current Web with structured information. Semantic Web data can be accessed and used by everybody. By publishing information only on Google Base and not on the Web, you effectively donate your information to a single private company, which becomes a gatekeeper that can decide what is done with it.

Thus, we think it is preferable to publish structured information directly on the Web using Semantic Web technologies and to have a crawler which collects this public information and pushes it into Google Base. This setup combines the advantages of both architectures: Everybody can access your information without any gatekeeper and information can still be searched easily using the Google search interface.

Current Status

We are currently experimenting with uploading FOAF profiles into Google Base. We search for profiles using the FOAF bulletin board as a starting point, and crawl rdfs:seeAlso links. We crawl only "hand-crafted" profiles and ignore the large social network sites like LiveJournal, tribe.net and TypePad. We currently don't perform any "smushing" on the profiles, so duplicates are possible.
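The crawl described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the agent's actual code: it only understands RDF/XML in which rdfs:seeAlso carries an rdf:resource attribute, the host skip list is a hypothetical stand-in for however the large sites are really filtered, and `fetch` is an assumed caller-supplied function that returns a document's text for a URL. The upload to Google Base is not shown.

```python
import xml.etree.ElementTree as ET
from collections import deque
from urllib.parse import urljoin, urlparse

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS_NS = "http://www.w3.org/2000/01/rdf-schema#"

# Hypothetical skip list for the large social network sites named in the text.
SKIPPED_HOSTS = ("livejournal.com", "tribe.net", "typepad.com")


def is_skipped(url):
    """Return True if the URL's host is a skipped site or one of its subdomains."""
    host = urlparse(url).hostname or ""
    return any(host == h or host.endswith("." + h) for h in SKIPPED_HOSTS)


def see_also_links(rdf_xml, base_url):
    """Extract absolute rdfs:seeAlso targets from an RDF/XML document."""
    root = ET.fromstring(rdf_xml)
    links = []
    for element in root.iter("{%s}seeAlso" % RDFS_NS):
        target = element.get("{%s}resource" % RDF_NS)
        if target:
            links.append(urljoin(base_url, target))
    return links


def crawl(seeds, fetch, max_docs=100):
    """Breadth-first crawl from the seed URLs, following rdfs:seeAlso links.

    Returns the URLs of documents that were fetched and parsed successfully.
    """
    queue = deque(seeds)
    seen = set(seeds)
    collected = []
    while queue and len(collected) < max_docs:
        url = queue.popleft()
        if is_skipped(url):
            continue
        try:
            links = see_also_links(fetch(url), url)
        except (OSError, LookupError, ET.ParseError):
            # Unreachable, unknown, or malformed documents are skipped.
            continue
        collected.append(url)
        for link in links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return collected
```

For a real crawl, `fetch` could wrap `urllib.request.urlopen`, and a full RDF parser would be needed to handle other serializations and nested descriptions. Because the sketch tracks only URLs, the same person described in two files would still appear twice, which matches the lack of "smushing" noted above.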

  • 11/28/2005: Uploaded first set of FOAF profiles, which worked fine.
  • 11/29/2005: Uploaded a larger set consisting of 400 FOAF profiles. Got a message from Google Base that we have "exceeded the activity limit for this account on the beta version of Google Base". The items are displayed correctly in the dashboard, but are published on Google Base without item titles, which is very disruptive when searching. Strange; this looks like a bug in their system.
  • 11/30/2005: Published a second, smaller set of profiles (100). Processing now takes an awfully long time. It looks like the system has some scalability problems or they are working on their servers.
  • 12/1/2005: All our items have the status "Not available" in the dashboard today. The items from the 11/30 upload haven't been published, but the items from the 11/29 upload are still public. Strange.
  • 12/3/2005: Now all the items that we uploaded are available and render correctly :-)

 

Feedback

We are very interested in your opinion about this experiment. Please send feedback to Chris Bizer and Richard Cyganiak and cc the Semantic Web mailing list if your comment is of general interest.

Opt-Out

If you don't want us to upload your RDF files into Google Base any more, please send an email to Chris Bizer and we will remove your file.

 

 

 

Lehrstuhl für Wirtschaftsinformatik: Chris Bizer
Last updated: 5 December 2005