By Jon: First published in Online Currents 2003 – 18(7):25
Another day, another set of initials. This time the initials are RDF, which stands for Resource Description Framework. RDF is closely associated with XML (eXtensible Markup Language), which in turn is a subset of SGML (Standard Generalised Markup Language). But what does it all mean?
RDF, like XML, was developed under the auspices of the World Wide Web Consortium. It is designed to provide for the expression of semantic information – information about things. At the core of RDF is the notion of a resource description. A resource is something: it could be a document, a book, a company or a person, or any other object or concept of interest. A description is a set of information which represents the resource. The information is selected to be of value to users or searchers; thus your resource description would probably include information about your position title and phone number but not whether you can waggle your ears.
Information is given in a resource description via defined properties. Only certain data count as properties, and these are specified through a list of valid property-types, called a schema. Property-types should be logically and practically appropriate to the type of resource. Thus ‘weight’ would be a valid property type for describing a vehicle, and ‘CEO’ would be a valid property type for describing a company, but not vice versa. Schemas are stored in namespaces – see below for these.
The properties in a resource description are assigned values, of which the simplest are just strings of text – ‘Harry Smith’, ‘1200 kg’ and so on. The value – property-type – resource triad is typically represented in plain English as ‘value is the property-type of resource’ or ‘the property-type of the resource is value’; e.g. ‘Harry Smith is the CEO of Snibbo Enterprises’; ‘The weight of the Toyota Camry is 1200 kg’. These triples are assumed to be logically independent assertions which are mutually compatible: RDF as such does not attempt to check on the logical consistency of what is asserted, although this is a possibility within the system.
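The triad described above can be sketched in a few lines of Python. This is only an illustration: the `Triple` class, its field names and the `to_english` helper are assumptions made for this sketch and are not part of RDF itself.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One assertion: 'the property_type of resource is value'."""
    resource: str
    property_type: str
    value: str


# The two plain-English examples from the text, held as independent assertions.
triples = [
    Triple("Snibbo Enterprises", "CEO", "Harry Smith"),
    Triple("Toyota Camry", "weight", "1200 kg"),
]


def to_english(t: Triple) -> str:
    """Render a triple in the 'property-type of resource is value' form."""
    return f"The {t.property_type} of {t.resource} is {t.value}"


for t in triples:
    print(to_english(t))
```

Nothing in this structure checks whether two assertions contradict one another, which mirrors the point that RDF as such does not test logical consistency.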
Values can be links to resource descriptions: thus ‘Harry Smith’ may have its own collection of values – 38 years old, male, born in Toronto – which may in turn be called on by the person who wants to find out about Snibbo Enterprises. In plain English we can think of these as adjectives or subordinate clauses: ‘Male, Toronto-born Harry Smith, 38, is the CEO of Snibbo Enterprises’. In RDF this involves a cross-referential structure, in this case from a collection of information about companies to a collection of information about people. This information can also be shown diagrammatically as a network of connections between nodes: a sample is shown below:
An RDF graph (from a paper by Eric Miller)
This lookup system hinges on individuals having unique identifiers: in other words, we need a way to distinguish 38-year-old male, Toronto-born Harry Smith the CEO from 54-year-old Mogadishu-born Harry Smith the filing clerk, and this is best done by giving each individual a unique ID or serial number within that particular data collection. If we give Harry the CEO the serial number 00001 and Harry the clerk the serial number 00234, then we can write the connection between Snibbo and Harry Smith the CEO as: ‘the CEO of Snibbo Enterprises is the person represented by record 00001 in the Employee database’.
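The role of the serial numbers can be sketched with two small Python dictionaries. The dictionary layout and the `describe_ceo` helper are hypothetical; only the names, ages, birthplaces and serial numbers come from the text.

```python
# Hypothetical employee collection keyed by serial number.
employees = {
    "00001": {"name": "Harry Smith", "age": 38, "born": "Toronto",
              "position": "CEO"},
    "00234": {"name": "Harry Smith", "age": 54, "born": "Mogadishu",
              "position": "filing clerk"},
}

# The company record stores the identifier, not the ambiguous name.
companies = {
    "Snibbo Enterprises": {"CEO": "00001"},
}


def describe_ceo(company: str) -> dict:
    """Follow the CEO link from a company into the employee collection."""
    ceo_id = companies[company]["CEO"]
    return employees[ceo_id]


ceo = describe_ceo("Snibbo Enterprises")
print(f"{ceo['name']}, {ceo['age']}, born in {ceo['born']}")
```

Because the link is made through the identifier, the two Harry Smiths never get confused even though their names are identical.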
Once individual resource descriptions are assigned unique identifiers, it becomes possible for a person (or a search engine) to thread between different data collections, constructing meaningful connections: thus a search that started with the song title ‘Valotte’ might retrieve the information that the songwriter’s father was shot outside the building in New York that featured in a film (Rosemary’s Baby) starring Mia Farrow. This is the concept underlying the notion of a ‘semantic web’, where the connections that we make linguistically can be codified into a form that allows computer searching and analysis.
This structure is not unique to RDF: it is found in any relational database system. Where RDF goes beyond this is in providing a universal syntax for representing this information and in providing ways to verify this syntax.
RDF syntax is a simplified form of XML in which no identifier (resource description) has more than one property, and all identifiers in a collection have the same property set. As in XML, each identifier must be declared before use, and in RDF this is done by providing a URI (Uniform Resource Identifier) which identifies that resource. URIs are obviously related to the URLs used to access Web sites, but RDF does not put any constraints on them other than that they are not duplicated: thus a URI could in principle refer to a library shelf list of physical books, a company’s employee database, or any other unique way of locating a resource. If the resource of interest happens to be a Web page, the URI is identical with its URL.
RDF information is tagged with its own prefix and uses the ‘Description’ property-type to identify the URI, as in the examples that follow:
The property-types (if any) for that resource and their values are shown on indented lines underneath the resource, as follows:
Policy and Procedures
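A description of this shape can be sketched in RDF/XML and read back with Python's standard library. Only the value ‘Policy and Procedures’ comes from the text above; the resource URI `http://www.snibbo.example/policy.html` and the choice of the Dublin Core `title` property are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/dc/elements/1.1/"

# The resource URI below is a made-up placeholder for illustration.
doc = f"""<rdf:RDF xmlns:rdf="{RDF_NS}" xmlns:dc="{DC_NS}">
  <rdf:Description rdf:about="http://www.snibbo.example/policy.html">
    <dc:title>Policy and Procedures</dc:title>
  </rdf:Description>
</rdf:RDF>"""

root = ET.fromstring(doc)
for description in root.findall(f"{{{RDF_NS}}}Description"):
    # The 'about' attribute carries the URI of the resource being described.
    resource = description.get(f"{{{RDF_NS}}}about")
    for prop in description:
        # prop.tag is '{namespace}local-name'; prop.text is the value.
        print(resource, prop.tag, prop.text)
```

The indented child element plays the part of the property-type, and its text content is the value.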
Property-types cannot be created from scratch; they need to be drawn from an existing collection. These collections are referred to as vocabularies or namespaces, and are a component of standard XML. They can be local collections stored on site, or an external, globally available collection of property-types like the Dublin Core (DC) collection of metadata. They can also be ‘mixed and matched’ from several different collections, both local and external.
A hospital librarian cataloguing books, for instance, could use the Dublin Core metadata set for basic information like title, author and publisher, and a local namespace for local information like cost code, geographical location and the department by which the book was requested.
Where all, or nearly all, of the property-types come from the same namespace this can be defined as the default, and a prefix for these property-types can be omitted. In all other cases, property-types need to have a prefix identifying the namespace from which they are drawn. This prefix is defined at the top of the document following the URI declaration for the namespace, as shown below:
Policy and Procedures
As in XML, each resource and property in RDF has to have a closing tag.
In this example three namespaces are declared: the standard RDF namespace which contains the specifications for the framework; the DC (Dublin Core) namespace, and a local namespace specific to Snibbo Enterprises. The contents of namespaces can themselves be written in RDF, making them computer-readable and allowing computers to collect and analyse ‘semantic’ information about resources.
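A document that declares three namespaces in this way might look like the sketch below, which also pulls each property back out together with the namespace it was drawn from. The local namespace URI `http://www.snibbo.example/schema#`, the `snibbo` prefix, the `costCode` property and its value are all hypothetical; the RDF and Dublin Core namespace URIs are the published ones.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/dc/elements/1.1/"
SNIBBO_NS = "http://www.snibbo.example/schema#"  # hypothetical local namespace

# Prefixes are declared once at the top, then used on each property-type.
doc = f"""<rdf:RDF xmlns:rdf="{RDF_NS}"
         xmlns:dc="{DC_NS}"
         xmlns:snibbo="{SNIBBO_NS}">
  <rdf:Description rdf:about="http://www.snibbo.example/policy.html">
    <dc:title>Policy and Procedures</dc:title>
    <snibbo:costCode>A123</snibbo:costCode>
  </rdf:Description>
</rdf:RDF>"""


def split_tag(tag):
    """Split ElementTree's '{namespace}name' form into (namespace, name)."""
    namespace, _, name = tag[1:].partition("}")
    return namespace, name


def to_rows(xml_text):
    """Return (resource, namespace, property-type, value) for each property."""
    root = ET.fromstring(xml_text)
    rows = []
    for description in root.findall(f"{{{RDF_NS}}}Description"):
        resource = description.get(f"{{{RDF_NS}}}about")
        for prop in description:
            namespace, name = split_tag(prop.tag)
            rows.append((resource, namespace, name, prop.text))
    return rows


for row in to_rows(doc):
    print(row)
```

The parser resolves each prefix to its full namespace URI, so two property-types with the same local name but different namespaces remain distinct.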
By providing access to a namespace, RDF allows documents to be checked for correctness: are all the property-types present and correctly spelt? Are there any additional property-types which are not supported?
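The kind of check described here can be approximated with a simple lookup. This is a rough sketch, not how any particular RDF validator works: the `ALLOWED` table is a hand-written stand-in for a real schema and lists only a few of the Dublin Core elements, and the misspelt `titel` property is planted deliberately.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/dc/elements/1.1/"

# Property-types this sketch accepts, per namespace (a partial, assumed list).
ALLOWED = {DC_NS: {"title", "creator", "publisher", "date"}}

doc = f"""<rdf:RDF xmlns:rdf="{RDF_NS}" xmlns:dc="{DC_NS}">
  <rdf:Description rdf:about="http://www.snibbo.example/policy.html">
    <dc:title>Policy and Procedures</dc:title>
    <dc:titel>Misspelt on purpose</dc:titel>
  </rdf:Description>
</rdf:RDF>"""


def unsupported_properties(xml_text, allowed):
    """List property tags that are not in the allowed vocabulary.

    Assumes every property element is namespace-qualified.
    """
    root = ET.fromstring(xml_text)
    problems = []
    for description in root.findall(f"{{{RDF_NS}}}Description"):
        for prop in description:
            namespace, _, name = prop.tag[1:].partition("}")
            if name not in allowed.get(namespace, set()):
                problems.append(prop.tag)
    return problems


print(unsupported_properties(doc, ALLOWED))
```

A fuller check would read the list of valid property-types from the namespace itself rather than from a table typed in by hand.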
Much of the work involved in generalising RDF will require setting up universally recognised URIs for objects of wide general interest – e.g. authors, vehicles, copyright holders, heads of state and recipes – such that these ‘ontologies’ can be referenced from a wide variety of different documents from different sources. Ultimately, for instance, a bibliographical database might take the form of RDF statements equivalent to ‘Person ABC:199394 wrote document DEF:338728’, where the resource descriptions of the individuals concerned might be stored in logically and geographically separate locations. A stated goal of RDF and the ‘semantic web’ is to have one and only one description globally available for each resource, although what this might mean in practice has yet to be worked out.
Because RDF is a subset of XML, standard XML verification systems should work for RDF resource descriptions. This includes the XML validation add-ins available for Microsoft Internet Explorer version 6 (see msdn.microsoft.com/library/default.asp?url=/downloads/list/xmlgeneral.asp) as well as on-line XML verification sites.
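As a minimal illustration of this point, any general-purpose XML parser can at least confirm that an RDF document is well-formed, for example that every tag is closed. The sketch below uses Python's standard library; the `is_well_formed` helper is a name invented here, and it tests well-formedness only, not whether the property-types are valid for their namespace.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def is_well_formed(xml_text):
    """Return True if the text parses as XML, False otherwise."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


# A complete document, and one whose elements are never closed.
good = f'<rdf:RDF xmlns:rdf="{RDF_NS}"></rdf:RDF>'
bad = f'<rdf:RDF xmlns:rdf="{RDF_NS}"><rdf:Description>'

print(is_well_formed(good), is_well_formed(bad))
```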
However, a range of specifically RDF-related software is beginning to appear. Much of it can be found through the O’Reilly site at www.xml.com, although some of the links there appear to be out of action at the current date (July 2003). Formal specifications for RDF can be found at http://www.w3.org/RDF.
- Profium (www.profium.com) provides an RDF parser which converts standard XML to RDF ‘triples’. An on-line demonstration is available.
- IsaViz (www.w3.org/2001/11/IsaViz) is a free visual authoring tool for RDF, currently in alpha development.
- A validating RDF parser is available through the FORTH Institute of Computer Science at http://188.8.131.52:9090/RDF/VRP/index.html.
More detailed introductions to RDF can also be found on the Web:
- Eric Miller has written An Introduction to the Resource Description Framework in D-Lib online magazine for May 1998: see www.dlib.org/dlib/may98/miller/05miller.html.
- An article of similar vintage by Tim Bray, What is RDF? has been updated and can be found at www.xml.com/pub/a/2001/01/24/rdf.html.
And for an interesting (and largely negative) view of the ‘semantic web’ pretensions of RDF, see The Social Meaning of RDF by Kendall Grant Clark at www.xml.com/pub/a/2003/03/05/social.html, and his follow-up article ‘Social Meaning and the Cult of Tim’ (i.e. Berners-Lee) at www.xml.com/pub/a/2003/07/23/deviant.html.