RDFa is a new technique for embedding metadata in any XML document using a small number of attributes. Its primary use is in XHTML documents, where it allows metadata to be added in such a way that an ordinary home page can also serve as a FoaF file, an RSS feed, or even a list of items for sale.
This paper provides an introduction to RDFa, and shows how it can make publishing metadata as easy as publishing any other type of information.
Although most people have heard of RDF, it is often associated with its rather obtuse serialisation, RDF/XML. RDF itself is nothing more than a very general way of describing data, but as with many very general things, this gives it both power and the potential to confuse.
The basic idea of RDF is to reduce all collections of data to 'nuggets' of information called triples. A triple is nothing more than 'some item' (the subject) having a property (the predicate) of 'some value' (the object). It might be:
Mark has an address of London, UK
or:
XTech 2006 has a venue of Amsterdam
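In code, a triple can be modelled as a simple three-part record. The Python sketch below is illustrative only. It uses plain strings throughout, whereas real RDF identifies subjects and properties with URIs. The class, field and function names are assumptions made for the example.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One RDF 'nugget': some item, a property, and some value."""
    subject: str
    predicate: str
    obj: str


# The two example statements from the text; plain strings stand in for URIs.
statements = [
    Triple("Mark", "address", "London, UK"),
    Triple("XTech 2006", "venue", "Amsterdam"),
]


def describe(t: Triple) -> str:
    """Render a triple in the 'item has a property of value' style used above."""
    return f"{t.subject} has a property '{t.predicate}' of {t.obj}"


for t in statements:
    print(describe(t))
```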
By breaking things down to such fundamental building blocks, RDF can be useful for anything from knowledge management to database definitions, to marking up metadata about web pages.
RDF has evolved over the years, and a number of powerful layers have been built on top of it. For example, reasoning software can use RDF statements to infer new statements. To illustrate, suppose a system knows the following facts:
XTech 2006 has a venue of Amsterdam
XTech 2006 has a start date of May 16th
XTech 2006 has an end date of May 19th
If the system also discovers that:
XTech 2006 has a speaker of Mark
then, assuming a speaker is present for the event, it could deduce that on May 18th, which falls between the start and end dates:
Mark has a location of Amsterdam
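The deduction above can be sketched as a single hand-written rule applied to a set of triples. This is a toy that rests on several assumptions. The predicate names (startDate, speaker and so on) are invented for illustration, and the year is taken from the event's name. The rule that a speaker is at the venue on every day of the event is likewise an assumption, and real reasoning software would not hard-code it in Python.

```python
from datetime import date

# Known facts as (subject, predicate, object) triples; predicate names are illustrative.
facts = {
    ("XTech 2006", "venue", "Amsterdam"),
    ("XTech 2006", "startDate", date(2006, 5, 16)),
    ("XTech 2006", "endDate", date(2006, 5, 19)),
    ("XTech 2006", "speaker", "Mark"),
}


def value(subject, predicate):
    """Return the object of a matching triple, or None if there is none."""
    for s, p, o in facts:
        if s == subject and p == predicate:
            return o
    return None


def location_on(person, day):
    """Hypothetical rule: a speaker at an event is at its venue on any day
    between the event's start and end dates (inclusive)."""
    for s, p, o in facts:
        if p == "speaker" and o == person:
            start, end = value(s, "startDate"), value(s, "endDate")
            if start is not None and end is not None and start <= day <= end:
                return value(s, "venue")
    return None


print(location_on("Mark", date(2006, 5, 18)))  # Amsterdam
```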
The major problem people have been trying to tackle is that although RDF itself is fairly simple, the syntax usually used to express it is not: RDF is usually 'carried' as RDF/XML, which is renowned for its difficulty.
This matters because a great deal of the web's metadata resides in HTML pages on ordinary websites. Company addresses, the weather in Tokyo, the price of a second-hand car: every day millions of pieces of metadata are placed on the web that software cannot use, because they are not formatted in any standardised way. Without some mechanism for extracting this information, the dream of the Semantic Web will remain just that: a dream.
What's needed is not to throw away RDF, but to find an easier way of encoding it that allows the output of HTML authors to be placed at the centre of the Semantic Web.
The growth of interest in so-called microformats shows that there is a real need for, and interest in, such a solution. The technique is to agree that certain patterns of HTML usage can be 'codified' to represent some agreed-upon metadata. For example, the following HTML mark-up:
<div class="vcard">
<a class="url fn" href="http://tantek.com/">
Tantek Çelik
</a>
<div class="org">Technorati</div>
</div>
uses CSS class names to mark pieces of the mark-up as metadata, so that together they represent a vCard. This approach is essentially the one used by GRDDL, which uses XSLT to take an XHTML document and extract pieces of metadata from it.
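The sketch below shows how a consumer might read such mark-up. It is a toy extractor built on Python's standard html.parser module and is not a complete hCard parser. It knows only the three class names used in the example (url, fn and org), and these are hard-coded. The class name HCardSketch is hypothetical.

```python
from html.parser import HTMLParser

SNIPPET = """<div class="vcard">
  <a class="url fn" href="http://tantek.com/">Tantek Çelik</a>
  <div class="org">Technorati</div>
</div>"""


class HCardSketch(HTMLParser):
    """Toy extractor for a handful of hard-coded hCard class names."""

    TEXT_CLASSES = {"fn", "org"}  # classes whose value is the element's text

    def __init__(self):
        super().__init__()
        self.properties = {}
        self._stack = []  # one set of class names per currently open element

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = set((attrs.get("class") or "").split())
        # 'url' takes its value from the href attribute, not the text.
        if "url" in classes and attrs.get("href"):
            self.properties["url"] = attrs["href"]
        self._stack.append(classes)

    def handle_endtag(self, tag):
        if self._stack:
            self._stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if not text or not self._stack:
            return
        # Assign the text to any text-valued class on the innermost element.
        for name in self._stack[-1] & self.TEXT_CLASSES:
            self.properties[name] = self.properties.get(name, "") + text


parser = HCardSketch()
parser.feed(SNIPPET)
parser.close()
print(parser.properties)
```

Because the class names are baked into the extractor, supporting another format means writing another list of names and rules.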
But the problem with both solutions is that they don't scale, and so don't allow the marked-up data to become part of the Semantic Web. Both require each existing metadata format (vCard, iCalendar, and so on) to have a 'partner' or 'mirror' definition created to guide document production, usually with the same name prefixed by an 'h', as in hCard and hCalendar.
Despite these problems, the goal of initiatives like microformats is important, since it sets out the possibility of carrying metadata in an ordinary HTML document. But ultimately, without addressing the problem of scale, we are not much closer to our goal of building a Semantic Web that parallels the visible web we've used for years.
The approach taken by RDFa is that ultimately any RDF structure should be representable. This means that instead of having to 'codify' each format to describe how it must be marked up, we simply provide a set of rules that explain how any RDF can be marked up, and then any RDF 'language' can be used.
This means that a library, for example, can still make use of the complex taxonomies and schemas that it relies on, whilst a web author can mark up their home page using a vocabulary such as Friend-of-a-Friend (FoaF).
Jane has lots of friends, family and work colleagues with whom she would like to stay in touch during her busy schedule. She would like to set up a home page for herself, where people who know her can find useful contact information, such as her phone number or work email.
Jane's first stop is to create a page that contains information about her that can be read by anyone using a web browser. She begins with some details for people who might be trying to contact her at work:
<html>
<head>
<title>Jane Doe's Home Page</title>
</head>
<body>
<p>
Hello. This is Jane Doe's home page.
</p>
<h2>Work</h2>
<p>
If you want to contact me at work, you can
either <a href="mailto:jane.doe@example.org">email
me</a>, or call +1 777 888 9999.
</p>
</body>
</html>
Jane can now give her friends the address of her home page, which is http://jo-lambda.example.org/.
One of Jane's friends, John, tells Jane that the address book software he uses can keep itself up to date with Jane's details automatically. All Jane needs to do is add some tags to her home page to help the system understand her data. The tags that John's address book understands come from a special list, often called a vocabulary, specifically for describing relationships between people. The particular vocabulary she is going to use is called 'Friend-of-a-Friend', or FoaF.
The first thing Jane needs to do is add a namespace declaration to the html element at the top of her page, which binds the prefix foaf to the FoaF vocabulary:
<html xmlns:foaf="http://xmlns.com/foaf/0.1/">
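Jane then looks through the FoaF vocabulary, and sees that the pieces of information in her page (name, phone number and email address) all have special names within FoaF. She therefore adds those names to her document, using the following rules:
1. if the value of a property is already in the href attribute of an a element, then the rel attribute can be added to the element, and its value is set to the name of the property she wants to add;
2. if the value is not already separated from the surrounding text, a simple wrapper element such as span is placed around it;
3. the element containing the value is then given a property attribute, set to the name of the property.
Let's look at each of those rules.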
Jane has provided a link in her home-page to her email address, which is jane.doe@example.org:
.
.
.
If you want to contact me at work, you can
either <a href="mailto:jane.doe@example.org">email
me</a>, or call +1 777 888 9999.
.
.
.
However, to ensure that John's address book software understands this, Jane can add a rel attribute naming the FoaF mailbox property, foaf:mbox:
.
.
.
If you want to contact me at work, you can
either <a rel="foaf:mbox" href="mailto:jane.doe@example.org">email
me</a>, or call +1 777 888 9999.
.
.
.
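Note that using a QName to name the property makes it clear and unambiguous. Because the foaf prefix is bound to http://xmlns.com/foaf/0.1/, foaf:mbox stands for http://xmlns.com/foaf/0.1/mbox, and no matter where it appears, in whatever document, it will mean the FoaF mbox property.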
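Expanding a prefixed name can be sketched in a few lines. The NAMESPACES mapping below mirrors the xmlns:foaf declaration Jane added. The function name expand_qname is hypothetical.

```python
# Prefix bindings, mirroring xmlns:foaf="http://xmlns.com/foaf/0.1/" on the html element.
NAMESPACES = {"foaf": "http://xmlns.com/foaf/0.1/"}


def expand_qname(qname: str, namespaces: dict[str, str]) -> str:
    """Expand a prefixed name such as 'foaf:mbox' into a full URI.

    Raises ValueError if the name has no prefix or the prefix is not declared.
    """
    prefix, sep, local = qname.partition(":")
    if not sep or prefix not in namespaces:
        raise ValueError(f"undeclared or missing prefix in {qname!r}")
    return namespaces[prefix] + local


print(expand_qname("foaf:mbox", NAMESPACES))   # http://xmlns.com/foaf/0.1/mbox
print(expand_qname("foaf:phone", NAMESPACES))  # http://xmlns.com/foaf/0.1/phone
```

Whichever document the name appears in, the result is the same URI, provided the prefix is bound to the same namespace.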
In addition to her email address, Jane also wants to add her name and phone number. These values are not yet separated from the surrounding text, so, as per rule 2, Jane adds some simple wrapper elements:
<p>
Hello. This is <span>Jane Doe</span>'s home page.
</p>
<h2>Work</h2>
<p>
If you want to contact me at work, you can either
<a rel="foaf:mbox" href="mailto:jane.doe@example.org">email me</a>,
or call <span>+1 777 888 9999</span>.
</p>
Now that the text is inside span elements, it is easy to add the FoaF properties for name and phone number using the RDFa property attribute:
<p>
Hello. This is
<span property="foaf:name">Jane Doe</span>'s home page.
</p>
<h2>Work</h2>
<p>
If you want to contact me at work, you can either
<a rel="foaf:mbox" href="mailto:jane.doe@example.org">email me</a>,
or call
<span property="foaf:phone">+1 777 888 9999</span>.
</p>
The completed document looks like this:
<html xmlns:foaf="http://xmlns.com/foaf/0.1/">
<head>
<title>Jane Doe's Home Page</title>
</head>
<body>
<p>
Hello. This is
<span property="foaf:name">Jane Doe</span>'s home page.
</p>
<h2>Work</h2>
<p>
If you want to contact me at work, you can either
<a rel="foaf:mbox" href="mailto:jane.doe@example.org">email me</a>,
or call
<span property="foaf:phone">+1 777 888 9999</span>.
</p>
</body>
</html>
Now all John needs to do is give the address of Jane's home page to his contact software. It will then be able to extract the following information about Jane, with the home page simply being the address the software was given:
foaf:name = "Jane Doe"
foaf:mbox = "mailto:jane.doe@example.org"
foaf:phone = "+1 777 888 9999"
foaf:homepage = "http://jo-lambda.example.org/"
More formally, the markup Jane added to her XHTML defines a set of RDF triples. Each triple effectively represents one property of her data.
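The extraction John's software performs can be sketched with Python's standard html.parser module. This is a toy and not a conforming RDFa processor. It handles only the two patterns used in this example: rel with href, and property with text content. For illustration only, it assumes that every triple is about the page itself, identified by the address John gives his software. The class name RDFaSketch is hypothetical.

```python
from html.parser import HTMLParser

PAGE_URL = "http://jo-lambda.example.org/"  # assumed subject: the page's own address

DOCUMENT = """<html xmlns:foaf="http://xmlns.com/foaf/0.1/">
<head><title>Jane Doe's Home Page</title></head>
<body>
<p>Hello. This is <span property="foaf:name">Jane Doe</span>'s home page.</p>
<h2>Work</h2>
<p>If you want to contact me at work, you can either
<a rel="foaf:mbox" href="mailto:jane.doe@example.org">email me</a>,
or call <span property="foaf:phone">+1 777 888 9999</span>.</p>
</body>
</html>"""


class RDFaSketch(HTMLParser):
    """Toy extractor for rel+href and property+text; not a full RDFa processor."""

    def __init__(self, subject):
        super().__init__()
        self.subject = subject
        self.namespaces = {}
        self.triples = []
        self._stack = []  # (tag, property QName or None, collected text parts)

    def _expand(self, qname):
        prefix, sep, local = qname.partition(":")
        if sep and prefix in self.namespaces:
            return self.namespaces[prefix] + local
        return qname  # leave unrecognised names untouched

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # Record prefix declarations such as xmlns:foaf="...".
        for name, value in attrs.items():
            if name.startswith("xmlns:") and value:
                self.namespaces[name[len("xmlns:"):]] = value
        # Rule 1: rel on an element with href, so the href is the value.
        if attrs.get("rel") and attrs.get("href"):
            self.triples.append((self.subject, self._expand(attrs["rel"]), attrs["href"]))
        # Rule 3: property, so the element's text content is the value.
        self._stack.append((tag, attrs.get("property"), []))

    def handle_data(self, data):
        # Text belongs to every open element; only those with a property keep it.
        for _, prop, parts in self._stack:
            if prop:
                parts.append(data)

    def handle_endtag(self, tag):
        if not self._stack:
            return
        _, prop, parts = self._stack.pop()
        if prop:
            text = " ".join("".join(parts).split())  # normalise whitespace
            self.triples.append((self.subject, self._expand(prop), text))


parser = RDFaSketch(PAGE_URL)
parser.feed(DOCUMENT)
parser.close()
for triple in parser.triples:
    print(triple)
```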
The final mark-up shows how Jane's document can double as both an ordinary home-page and a collection of FoaF data. This has a twofold advantage: