Ontologies


On 15 December 2003 the W3C announced the advancement of the OWL Web Ontology Language (OWL) to Proposed Recommendation. Even before this announcement I was planning to use this language to develop the ontology which is the focus of this work. I initially intended to document the development of the language exhaustively, but even a cursory examination of the relevant literature made the incestuous nature of semantic web development clear. I have therefore adjusted my approach slightly and will instead produce a "potted history" showing how the ideas associated with the semantic web were refined. These recent developments are hardly surprising given the growing interest in such things; it seems only natural that the best elements of one language are assimilated by others. Briefly, though, OWL incorporates lessons learned from the design and application of DAML+OIL. DAML+OIL was developed jointly by the US Defense Advanced Research Projects Agency (DARPA), which was working on the DARPA Agent Markup Language (DAML), and the European Union's Information Society Technologies Program, which was working on the Ontology Interchange Language (OIL). DAML+OIL combines both languages with the best aspects of SHOE to create a new language for ontologies (Hendler, 2000), while using vocabulary borrowed from RDF.

OWL sits at the top of the growing stack of W3C standards associated with the semantic web. This stack has XML as its base, with XML Schema, RDF and RDF Schema built upon that base. McGuinness and van Harmelen (2003) describe the present state of affairs as follows:

This project will not use all of the facilities available in OWL, and it might be argued that equivalent functionality could be achieved with existing, tried and tested technologies. The development of the ontology is nonetheless worthwhile, for exactly the reasons listed above by Noy and McGuinness. Indeed, many of the technologies associated with the Semantic web as a concept are still somewhat ethereal, in that there is a dearth of software "agents" able to use such data, but that is not to say that their development is not worthwhile. An illustration associated with the Semantic web (Figure 3), known as the (in)famous layer cake, has somewhat confused roots because it has been copied frequently by many different authors. It does, however, serve to show the technologies required for a semantic web; the top two or three layers of the diagram remain somewhat esoteric, as researchers have not yet decided how they might be implemented.

The development of the semantic web raises all manner of what are, to my mind, philosophical questions. I hope to touch on these in greater depth in the conclusion, but it is worth considering how technologies develop in reality rather than in the ideals of others. Figure 3 seems to show a natural progression of technologies, but it remains unclear whether the end towards which these technologies are driving will prove all that useful, despite the protestations of Tim Berners-Lee. The diagram does, however, give researchers a framework within which to work and base their efforts by providing something of a roadmap; with the final ratification of OWL on the horizon it might be appropriate to acknowledge that we are halfway to the semantic web. This situation reminds me of the so-called "dodo bird verdict" from my own professional sphere of activity. Simply put, the phrase was popularised by Luborsky et al in 1975, when they researched the efficacy of different types of psychotherapy and found no difference between them in terms of successful outcomes for the user. Perhaps this is labouring the point, but it is something I plan to return to later in this dissertation, because the question of whether things come to pass by design or by accident is interesting, particularly in the context of the author's own feelings regarding the "doing" of computer science.

Let us look a little further at the development of the technologies associated with the Semantic web, and in turn OWL, by looking at the first layer of the cake.


XML Schema


We have already noted that for an XML document to be valid it needs to conform to a DTD. To some extent DTDs provide semantic content, in that they restrict the content of XML documents to those elements which are agreed before the document's creation. XML Schema (XMLS) expresses much the same information as a DTD (and adds richer data types), but it does so in XML itself, whereas a DTD is written in a separate syntax of its own. At this point it might be useful to examine an XML document with an internal DTD:

      
          <?xml version="1.0"?>
          <!DOCTYPE note [
            <!ELEMENT note (to+,from,heading,body)>
            <!ELEMENT to      (#PCDATA)>
            <!ELEMENT from    (#PCDATA)>
            <!ELEMENT heading (#PCDATA)>
            <!ELEMENT body    (#PCDATA)>
          ]>
          <note>
            <to>Tove</to>
            <from>Jani</from>
            <heading>Reminder</heading>
            <body>Don't forget me this weekend</body>
          </note>
      
    

This document's !DOCTYPE declares that the document is of type note. The !ELEMENTs associated with note are to, from, heading and body, and every document purporting to be of type note (should the DTD be made external) must contain one or more to elements followed by exactly one each of from, heading and body, in that order. Each of the to, from, heading and body !ELEMENTs is of type #PCDATA, meaning parsed character data. This XML document contains a description of its own format, but the DTD can also be external. The following two files illustrate this: the first would be saved as note.dtd, while the second could have an arbitrary name (as, indeed, could the original XML file above):

      
          <!-- note.dtd -->
          <!ELEMENT note (to+,from,heading,body)>
          <!ELEMENT to      (#PCDATA)>
          <!ELEMENT from    (#PCDATA)>
          <!ELEMENT heading (#PCDATA)>
          <!ELEMENT body    (#PCDATA)>

          <?xml version="1.0"?>
          <!-- the document instance: any file name will do -->
          <!DOCTYPE note SYSTEM "note.dtd">
          <note>
            <to>Tove</to>
            <from>Jani</from>
            <heading>Reminder</heading>
            <body>Don't forget me this weekend</body>
          </note>
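A DTD content model such as (to+,from,heading,body) behaves much like a regular expression over the sequence of child element names. The following Python sketch makes that reading explicit for the note example; it is an illustration only, not a DTD validator (Python's standard ElementTree parser does not validate), and the helper name matches_content_model is hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# The note document from the DTD examples; the DOCTYPE is left out because
# ElementTree does not validate against DTDs anyway.
NOTE = """<note>
  <to>Tove</to>
  <from>Jani</from>
  <heading>Reminder</heading>
  <body>Don't forget me this weekend</body>
</note>"""

# (to+,from,heading,body) rewritten as a regular expression over the child
# element names, each name followed by a single space.
CONTENT_MODEL = re.compile(r"(?:to )+from heading body ")


def matches_content_model(xml_text: str) -> bool:
    """Return True if the root element's children follow (to+,from,heading,body)."""
    root = ET.fromstring(xml_text)
    names = "".join(child.tag + " " for child in root)
    return CONTENT_MODEL.fullmatch(names) is not None


print(matches_content_model(NOTE))  # True
print(matches_content_model(NOTE.replace("<to>Tove</to>", "")))  # False
```

Adding a second to element keeps the document valid under this model, whereas dropping the only to, or the heading, does not.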
      
    

Many different types of data can be used in a valid XML document, but the language used to write a DTD is not XML, and the rules for writing a well-formed DTD can be baffling, since a DTD can restrict the number of times an !ELEMENT may occur as well as defining other properties of a given XML document. DTDs were a compromise between the old and the new. SGML uses DTDs, and it was thought that SGML developers would have an easier time if they could keep certain aspects of their preferred language while making the transition to XML; it did, however, mean that developers new to XML were forced to learn two languages. The DTD described above would be written in XMLS as:

      
          <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
            <xsd:annotation>
              <xsd:documentation xml:lang="en">
                Note schema for dissertation.
              </xsd:documentation>
            </xsd:annotation>
            <xsd:element name="note" type="note"/>
            <xsd:complexType name="note">
              <xsd:sequence>
                <xsd:element name="to" type="xsd:string" minOccurs="1"
                  maxOccurs="unbounded"/>
                <xsd:element name="from" type="xsd:string"/>
                <xsd:element name="heading" type="xsd:string"/>
                <xsd:element name="body" type="xsd:string"/>
              </xsd:sequence>
            </xsd:complexType>
          </xsd:schema>
      
    

This is somewhat simplistic, as the XMLS could properly be formulated in other ways. A DTD can only specify that an element occurs exactly once, zero or one times (?), zero or more times (*), or one or more times (+), whereas XMLS allows the author to set the minimum and maximum number of times an element may occur in a given document (both examples show that the note can be addressed to one or more recipients). In a DTD, a constraint such as "between two and four recipients" has to be spelled out as a nested sequence such as to,to,(to,to?)?; in XMLS it is simply minOccurs="2" maxOccurs="4".
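The difference can be sketched with a small, hypothetical Python helper, dtd_particle, which spells out an XMLS-style minOccurs/maxOccurs range as the equivalent DTD particle (None stands in for maxOccurs="unbounded", an assumption of this sketch). The optional copies are nested because XML 1.0 requires content models to be deterministic, so a flat to,to,to?,to? would be an error.

```python
from typing import Optional


def dtd_particle(name: str, min_occurs: int, max_occurs: Optional[int]) -> str:
    """Write an XMLS minOccurs/maxOccurs range as an equivalent DTD particle.

    max_occurs=None stands for maxOccurs="unbounded".
    """
    parts = [name] * min_occurs
    if max_occurs is None:
        if min_occurs == 0:
            return name + "*"
        parts[-1] += "+"  # the last required copy may repeat
        return ",".join(parts)
    extra = max_occurs - min_occurs
    if extra > 0:
        # Nest the optional copies, e.g. (to,to?)? rather than to?,to?,
        # so that the resulting content model stays deterministic.
        optional = name + "?"
        for _ in range(extra - 1):
            optional = "(" + name + "," + optional + ")?"
        parts.append(optional)
    return ",".join(parts)


print(dtd_particle("to", 1, None))  # to+
print(dtd_particle("to", 2, 4))     # to,to,(to,to?)?
```

The growth of the nested form as the range widens is precisely the bookkeeping that XMLS hides behind two attributes.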

We can see, then, that DTDs and XMLS fulfil much the same role, in that they dictate and restrict the type of data that can be enclosed in a given XML document (Campbell, 2001). It has been pointed out that all valid XML documents must conform to a DTD, as that is a requirement of the XML specification; while the relative merits of XMLS and DTDs are debated, it might be wise to use both techniques so that they complement each other in terms of the constraints they offer (Grosso, 2001).


RDF


The Resource Description Framework (RDF) is recognised as a language for creating ontologies, but it is less rich than OWL. It can describe the metadata associated with a given XML document. Metadata may sound like a problematic concept, but whenever we use a phonebook or a library index we are using metadata, that is, data about data.

RDF as a technology relies upon Uniform Resource Identifiers (URIs). URIs stand in a relationship to Uniform Resource Locators (URLs) somewhat like that of XML to HTML: URLs came first but are now considered a subset of URIs, much as HTML preceded XML and has since been reformulated in XML as XHTML. Whereas a URL is limited to locating resources on the internet, a URI is not. Basically, URIs "are short strings that identify resources in the web: documents, images, downloadable files, services, electronic mailboxes, and other resources" (Connolly, 2003), but they are not limited to the web.

Connolly's discussion of the URI is particularly interesting in that he hints at the neuro-physiology of humankind and its ability to make use of information about spaces. This brings to mind the tool-using abilities of human users, something we take for granted but which is still far off for software agents (for an expanded discussion of this difference between human and machine abilities in all types of processing, see Grand, 2004). A URI is merely a pointer to a thing within a space, be it actual or virtual. Berners-Lee et al (1998) acknowledge the somewhat confused relationship between the URI and the URL, but go on to clarify Connolly by noting that a URI does not have to point to an electronically retrievable resource: "human beings, corporations, and bound books in a library can also be considered resources" (Berners-Lee et al, 1998, p2). URIs can draw on the Unicode character set, with its 95,221+ characters, to allow for the multiplicity of languages in which information is marked up throughout the globe (Malik, 2003).
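Python's standard urllib.parse module can illustrate the point that a URI need not locate anything retrievable: the same generic parser handles a web page, an electronic mailbox and a book named by its ISBN, and urldefrag separates off a fragment identifier of the kind RDF commonly uses. The mailbox and ISBN URIs are illustrative assumptions; the hotel and train URIs are taken from the examples in this chapter.

```python
from urllib.parse import urldefrag, urlparse

# Illustrative URIs: only the first locates something retrievable on the web.
EXAMPLES = [
    "http://www.huddersfieldhotels.com/george.html",  # a web page
    "mailto:someone@example.org",                     # an electronic mailbox
    "urn:isbn:0451450523",                            # a book, named rather than located
]


def describe(uri):
    """Split a URI into (scheme, authority, path) using the generic syntax."""
    parts = urlparse(uri)
    return parts.scheme, parts.netloc, parts.path


for uri in EXAMPLES:
    print(describe(uri))

# A fragment identifier picks out part of a resource; urldefrag separates it.
base, fragment = urldefrag("http://www.example.org/train#VS018")
print(base, fragment)  # http://www.example.org/train VS018
```

Only the http URI has an authority (a host to contact); the other two identify their resources without saying where to fetch them.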

RDF has at its core a simple logical form: every RDF statement has a subject, a predicate and an object. As Champin (2001) puts it: "We will say that <subject> has a property <predicate> valued by <object>". Perhaps the most interesting explanation of RDF that I have come across is from XULPlanet (2003), which notes that RDF is a model for describing graphs of information. This concept is common in descriptions of RDF as a technology, and at this point I would ask the reader to examine Figure 4.

In Figure 4 we graphically describe details associated with the XML note that we have been using. It says that Tove has a friend called Jani and that both Tove and Jani are planning to go to a hotel. In a simplified triple notation we could record this graph by writing the following:

      
          <Tove,hasFriend,Jani>
          <Tove,goingTo,Hotel>
          <Jani,goingTo,Hotel>
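Because every statement is a (subject, predicate, object) triple, the graph in Figure 4 can be held as nothing more than a set of tuples. The Python sketch below, with hypothetical helper names, shows how an agent might query such a set (who is going to the hotel?) and view it as a labelled directed graph.

```python
from collections import defaultdict

# The three statements from Figure 4 as (subject, predicate, object) tuples.
triples = {
    ("Tove", "hasFriend", "Jani"),
    ("Tove", "goingTo", "Hotel"),
    ("Jani", "goingTo", "Hotel"),
}


def subjects(graph, predicate, obj):
    """Return every subject s for which (s, predicate, obj) is in the graph."""
    return sorted(s for (s, p, o) in graph if p == predicate and o == obj)


def as_adjacency(graph):
    """View the triples as a labelled directed graph: node -> [(edge, node)]."""
    adjacency = defaultdict(list)
    for s, p, o in sorted(graph):
        adjacency[s].append((p, o))
    return dict(adjacency)


print(subjects(triples, "goingTo", "Hotel"))  # ['Jani', 'Tove']
print(as_adjacency(triples))
```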
      
    

In RDF, however, each resource and property is identified by a URI, often with a fragment identifier (a further string appended to the URI after a hash, "#") pointing to a specific part of the resource. So:

      
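          <?xml version="1.0"?>
          <rdf:RDF
            xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
            xmlns:ex="http://some.uri/terms#">
            <!-- ex: is a placeholder vocabulary for hasFriend and goingTo -->
            <rdf:Description rdf:about="http://some.uri/person/Tove">
              <ex:hasFriend rdf:resource="http://some.uri/person/Jani"/>
            </rdf:Description>
            <rdf:Description rdf:about="http://some.uri/person/Tove">
              <ex:goingTo
                rdf:resource="http://www.huddersfieldhotels.com/george.html"/>
            </rdf:Description>
            <rdf:Description rdf:about="http://some.uri/person/Jani">
              <ex:goingTo
                rdf:resource="http://www.huddersfieldhotels.com/george.html"/>
            </rdf:Description>
          </rdf:RDF>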
      
    

At this point it might be interesting to look at WordNet, accessible at http://xmlns.com/, which goes some way towards introducing "a space for XML names" and thus comes close to addressing the issues raised by Conen and Klapsing (2000) in their paper A Logical Interpretation of RDF. Amongst other things, they point out that plain RDF has no formally defined semantics. That is to say, while RDF provides a formalism for metadata annotation and a way to write it down in XML, it gives no special meaning to vocabulary such as hasFriend or goingTo, so each such predicate can only be interpreted as an arbitrary binary relation. This situation is addressed by RDF Schema, which allows the definition of vocabulary terms and the relations between them.
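Since RDF/XML is ordinary XML, the triples can be recovered from it with a general-purpose XML parser; what comes back is a set of URIs with no meaning attached beyond their structure, which is exactly the gap that RDF Schema sets out to fill. The following Python sketch, using the standard ElementTree module, handles only the tiny subset of RDF/XML needed for the Tove and Jani example (rdf:Description with rdf:about and rdf:resource); the ex: vocabulary URI is a placeholder assumption, and a real RDF toolkit handles far more of the syntax.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Cut-down RDF/XML for the Tove and Jani example; the ex: namespace is a
# placeholder, not a published vocabulary.
DOC = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:ex="http://some.uri/terms#">
  <rdf:Description rdf:about="http://some.uri/person/Tove">
    <ex:hasFriend rdf:resource="http://some.uri/person/Jani"/>
  </rdf:Description>
  <rdf:Description rdf:about="http://some.uri/person/Jani">
    <ex:goingTo rdf:resource="http://www.huddersfieldhotels.com/george.html"/>
  </rdf:Description>
</rdf:RDF>"""


def triples_from_rdfxml(text):
    """Extract (subject, predicate, object) triples from very simple RDF/XML.

    Only rdf:Description elements whose properties use rdf:resource are handled.
    """
    root = ET.fromstring(text)
    found = []
    for desc in root.findall(f"{{{RDF_NS}}}Description"):
        subject = desc.get(f"{{{RDF_NS}}}about")
        for prop in desc:
            # ElementTree reports tags as "{namespace}local"; joining the two
            # parts gives the predicate's full URI.
            namespace, _, local = prop.tag[1:].partition("}")
            obj = prop.get(f"{{{RDF_NS}}}resource")
            found.append((subject, namespace + local, obj))
    return found


for triple in triples_from_rdfxml(DOC):
    print(triple)
```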


RDF Schema


"RDF's vocabulary description language, RDF Schema, is a semantic extension of RDF. It provides mechanisms for describing groups of related resources and the relationships between these resources". (Brickley & Guha (ed.), 2003). This quote from the W3C goes some way to describe RDFS by pointing out that it gives special meaning to terms in regular RDF such as hasFriend or goingTo which could be interpreted arbitrarily without the use of a schema. This means that RDFS allows the insertion of meaning when using certain RDF predicates and resources which specify how such terms should be interpreted. An example of the need for RDFS might be the examination of train times; one train operator might store data about its schedule in RDF thus:

      
          <?xml version="1.0"?>
            <rdf:RDF
              xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
              xmlns:f="http://www.example.org/train#">
              <rdf:Description
                rdf:about="http://www.example.org/train#VS018">
                <f:number>VS018</f:number>
                <f:origin>Cambridge</f:origin>
                <f:destination>Kings Cross</f:destination>
                <f:departure>20:45 05/05/2003</f:departure>
                <f:arrival>21:35 05/05/2003</f:arrival>
              </rdf:Description>
            </rdf:RDF>
      
    

But for an agent, whether human or machine, to understand RDF data from another operator, it would be best served if both operators' documents conformed to one specific RDFS.
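One mechanism RDFS offers for this is rdfs:subPropertyOf: if two operators each declare their own property to be a sub-property of a shared one, an agent can infer the shared statement from either operator's data. The Python sketch below applies that single entailment rule by hand; the operators' prefixes (a:, b:, common:) and the second train number GN101 are hypothetical.

```python
RDFS_SUBPROPERTY = "rdfs:subPropertyOf"

# Two operators publish the same kind of fact under different property names.
operator_a = {("VS018", "a:destination", "Kings Cross")}
operator_b = {("GN101", "b:arrivesAt", "Kings Cross")}

# A shared schema maps each local property onto a common one.
schema = {
    ("a:destination", RDFS_SUBPROPERTY, "common:destination"),
    ("b:arrivesAt", RDFS_SUBPROPERTY, "common:destination"),
}


def apply_subproperty_rule(data, schema):
    """RDFS rule: from (s, p, o) and (p rdfs:subPropertyOf q), infer (s, q, o).

    Only one level is applied; full RDFS entailment would also take the
    transitive closure of rdfs:subPropertyOf.
    """
    supers = {}
    for p, rel, q in schema:
        if rel == RDFS_SUBPROPERTY:
            supers.setdefault(p, set()).add(q)
    inferred = set(data)
    for s, p, o in data:
        for q in supers.get(p, ()):
            inferred.add((s, q, o))
    return inferred


merged = apply_subproperty_rule(operator_a | operator_b, schema)
print(sorted(s for s, p, o in merged if p == "common:destination"))
# ['GN101', 'VS018']
```

An agent that only understands common:destination can now answer questions about both operators' trains without knowing either local vocabulary.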

As the reader may observe, the above code fragment has a header made up of statements beginning with "xmlns:". These declare external namespaces or, if you prefer, specify the origin of the various vocabularies used in the document. This allows previously created semantic vocabularies such as the Dublin Core (discussed further below) to be reused.

One issue I faced when examining and explaining RDF and RDFS was their relevance to a human reading a document. After reading Hillmann (2003) this became less of an issue, as it became obvious that metadata in the form of RDF, or the more advanced RDFS, could be either embedded in the data source itself or kept separate from it, a little like DTDs in that respect. XML and XML Schema are both viewable by humans and can be formatted with technologies such as Cascading Style Sheets (CSS) to look not dissimilar to standard HTML; this simplifies not only their perusal but also their creation, at least in my mind, but the process does not seem quite so clear cut for RDF and RDFS. Through experimentation, however (more specifically, by embedding metadata within a standard XHTML document, that is, a document conforming to the W3C's XHTML, a reformulation of HTML in XML), I found that neither Microsoft's Internet Explorer nor the Mozilla Foundation's Mozilla browser rendered the metadata in a human-readable form. This supports claims such as that of Herman (2003), who says of the semantic web: "It extends the current Web (and does not replace it)".

For instance, should this document be made available online in HTML format, its metadata could be included in the head section thus:

      
          <link rel="schema.DC" href="http://purl.org/dc/elements/1.1/">
          <meta name="DC.Title" content = "Ontology of an NHS Trust">
          <meta name="DC.Creator" content = "Dominic Richard Myers">
          <meta name="DC.Type" content = "Dissertation">
          <meta name="DC.Date" content = "2004">
          <meta name="DC.Format" content = "text/html">
          <meta name="DC.Language" content = "en">
          <meta name="DC.Description"
            content = "An online version of my dissertation just so that
            I can keep it up to date and review what it is all about,
            not to mention allowing me to try out the stuff I'm learning
            about.">
      
    

In an XML document the metadata might be encoded in RDF thus:

      
          <?xml version="1.0"?>
          <rdf:RDF
            xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
            xmlns:dc="http://purl.org/dc/elements/1.1/">
            <rdf:Description
              rdf:about="http://camshag.co.uk/svg/dissertation/dissertation.html">
              <dc:title>Ontology of an NHS Trust</dc:title>
              <dc:creator>Dominic Richard Myers</dc:creator>
              <dc:type>Dissertation</dc:type>
              <dc:date>2004</dc:date>
              <dc:format>text/html</dc:format>
              <dc:language>en-GB</dc:language>
              <dc:description>An online version of my dissertation
                just so that I can keep it up to date and review what it
                is all about, not to mention allowing me to try out the
                stuff I'm learning about.</dc:description>
            </rdf:Description>
          </rdf:RDF>
      
    

This RDF formulation of the metadata could be kept separate from the document itself, and could even form part of a document which describes all of the dissertation submissions for a particular year, for instance. In the above example dc represents the Dublin Core (as mentioned above), which is used to capture metadata about a document. There are 15 main elements in the Dublin Core (please refer to Figure 5), which are primarily used to describe a document, its author, title and other details.
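Because the HTML meta tags and the RDF encode the same Dublin Core statements, one form can be derived from the other. The following Python sketch, an illustration rather than an established tool, reads a cut-down copy of the RDF above with the standard ElementTree module, checks each property against the 15 Dublin Core elements, and emits the corresponding meta tags; the function name dc_to_meta is hypothetical.

```python
import html
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/dc/elements/1.1/"

# The 15 elements of the Dublin Core Metadata Element Set.
DC_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}

DOC = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description
      rdf:about="http://camshag.co.uk/svg/dissertation/dissertation.html">
    <dc:title>Ontology of an NHS Trust</dc:title>
    <dc:creator>Dominic Richard Myers</dc:creator>
    <dc:date>2004</dc:date>
  </rdf:Description>
</rdf:RDF>"""


def dc_to_meta(text):
    """Turn Dublin Core properties found in RDF/XML into HTML meta tags."""
    root = ET.fromstring(text)
    tags = []
    for desc in root.iter(f"{{{RDF_NS}}}Description"):
        for prop in desc:
            namespace, _, name = prop.tag[1:].partition("}")
            if namespace != DC_NS or name not in DC_ELEMENTS:
                raise ValueError(f"not a Dublin Core element: {prop.tag}")
            content = html.escape(prop.text or "", quote=True)
            tags.append(f'<meta name="DC.{name.capitalize()}" content="{content}">')
    return tags


for tag in dc_to_meta(DOC):
    print(tag)
```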


OWL


Aleman-Meza (2003), along with Gil & Ratnakar (2003), explains that RDFS does not offer enough constraints for a true semantic web; as we have already noted, this necessitated the creation of richer languages. Malik (2003) describes OWL thus: "OWL is the XML Schema for RDF; OWL allows the definition of new vocabularies and ontologies that are written in RDF" (p28). This leaves some confusion as to whether OWL is a replacement for, or a refinement of, RDFS. An OWL document is structurally very similar to an RDFS document, so in this context OWL would seem to be a refinement of RDFS; this position is certainly borne out by Khan (2003), who calls OWL "nothing but enhanced RDF" (p12).

There are three flavours of OWL, of increasing expressive power (Horrocks & Patel-Schneider, 2003): OWL Lite, OWL DL and OWL Full. They trade expressiveness against decidability and computational completeness. OWL DL, perhaps the most useful of the three, is so named because of its correspondence to Description Logic (DL); at this point it would be useful to quote from Lambrix (no date):

"Description logics are knowledge representation languages tailored for expressing knowledge about concepts and concept hierarchies. They are usually given a Tarski style declarative semantics, which allows them to be seen as sub-languages of predicate logic. They are considered an important formalism unifying and giving a logical basis to the well known traditions of frame-based systems, semantic networks and KL-ONE-like languages, object-oriented representations, semantic data models, and type systems. The basic building blocks are concepts, roles and individuals. Concepts describe the common properties of a collection of individuals and can be considered as unary predicates which are interpreted as sets of objects. Roles are interpreted as binary relations between objects. Each description logic defines also a number of language constructs (such as intersection, union, role quantification, etc.) that can be used to define new concepts and roles. The main reasoning tasks are classification and satisfiability, subsumption and instance checking. Subsumption represents the is-a relation. Classification is the computation of a concept hierarchy based on subsumption. A whole family of knowledge representation systems have been built using these languages and for most of them complexity results for the main reasoning tasks are known. Description logic systems have been used for building a variety of applications including conceptual modeling, information integration, query mechanisms, view maintenance, software management systems, planning systems, configuration systems, and natural language understanding."

That is to say that individuals are instances of concepts and that roles represent the relationships between objects.
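The Tarski-style semantics in the quotation can be made concrete with a single interpretation in which concepts are sets of individuals and roles are sets of pairs. The Python sketch below uses hypothetical flight data and helper names; note that it only checks containment in this one interpretation, whereas genuine subsumption must hold in every interpretation, which is what a DL reasoner establishes.

```python
# One interpretation: a domain of individuals, with concepts as subsets of
# the domain and roles as sets of (individual, individual) pairs.
domain = {"flight1", "flight2", "manchester", "paris"}
concepts = {
    "Flight": {"flight1", "flight2"},
    "InternationalFlight": {"flight2"},
    "Airport": {"manchester", "paris"},
}
roles = {"arrivesAt": {("flight1", "manchester"), ("flight2", "paris")}}


def some(role, concept):
    """Extension of the existential restriction: things with a role-filler in concept."""
    return {x for (x, y) in roles[role] if y in concepts[concept]}


def negation(concept):
    """Extension of (not concept): the rest of the domain."""
    return domain - concepts[concept]


def subsumed_here(sub, sup):
    """True if sub's extension lies inside sup's in THIS interpretation only."""
    return concepts[sub] <= concepts[sup]


print(subsumed_here("InternationalFlight", "Flight"))  # True
print(sorted(some("arrivesAt", "Airport")))            # ['flight1', 'flight2']
```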

An OWL document has as its header a list of namespace declarations which specify the origin of the various vocabularies used. For instance, when we looked at RDFS, the Dublin Core was referenced as a source of vocabulary for document metadata. The example used by Smith et al (2003) is as follows:

      
          <rdf:RDF 
            xmlns     ="http://www.w3.org/2001/sw/WebOnt/guide-src/wine#" 
            xmlns:vin ="http://www.w3.org/2001/sw/WebOnt/guide-src/wine#"       
            xmlns:food="http://www.w3.org/2001/sw/WebOnt/guide-src/food#"    
            xmlns:owl ="http://www.w3.org/2002/07/owl#"
            xmlns:rdf ="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
            xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
            xmlns:xsd ="http://www.w3.org/2000/10/XMLSchema#">
      
    

The first declaration has no prefix: it declares the default namespace, meaning that unprefixed qualified names refer to the current ontology. Because the first two declarations are associated with the same URI, qualified names prefixed with vin also refer to the current ontology. The third refers to another ontology, food, while the fourth means that names prefixed with owl should be understood as drawn from the namespace http://www.w3.org/2002/07/owl#. It might be overly dramatic to say that OWL's debt to preceding technologies is acknowledged in the final three namespace declarations, but this is almost the case, in that OWL makes use of constructs from all three vocabularies (RDF, RDF Schema and XML Schema).
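Expanding a qualified name is simply a matter of concatenating the namespace URI bound to its prefix with the local name. The Python sketch below uses the declarations from the wine example (omitting xsd); the helper name expand is hypothetical, and in real XML the default namespace applies to element names only, not to attributes.

```python
# Prefix bindings copied from the wine ontology header ("" is the default).
NAMESPACES = {
    "":     "http://www.w3.org/2001/sw/WebOnt/guide-src/wine#",
    "vin":  "http://www.w3.org/2001/sw/WebOnt/guide-src/wine#",
    "food": "http://www.w3.org/2001/sw/WebOnt/guide-src/food#",
    "owl":  "http://www.w3.org/2002/07/owl#",
    "rdf":  "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
}


def expand(qname):
    """Expand a qualified name such as 'vin:Wine' into a full URI.

    A name without a colon falls back to the default (unprefixed) namespace.
    """
    prefix, sep, local = qname.partition(":")
    if not sep:
        prefix, local = "", qname
    return NAMESPACES[prefix] + local


print(expand("Wine") == expand("vin:Wine"))  # True
print(expand("owl:Class"))  # http://www.w3.org/2002/07/owl#Class
```

The first result shows concretely why unprefixed names and vin-prefixed names refer to the same ontology.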

Following the namespace declaration is the ontology header where critical housekeeping information such as version control and comment tags might be included. The following is again taken from the wine and food ontology:

      
          <owl:Ontology rdf:about="">
            <rdfs:comment>An example OWL ontology</rdfs:comment>
            <owl:priorVersion>
              <owl:Ontology rdf:about="http://www.w3.org/TR/2003/WD-owl-guide-20030331/wine"/>
            </owl:priorVersion>
            <owl:imports rdf:resource="http://www.w3.org/TR/2003/CR-owl-guide-20030818/food"/>
            <rdfs:comment>Derived from the DAML Wine ontology at
              http://ontolingua.stanford.edu/doc/chimaera/ontologies/wines.daml
              Substantially changed, in particular the
              Region based relations.
            </rdfs:comment>
            <rdfs:label>Wine Ontology</rdfs:label>
          </owl:Ontology>
      
    

The empty rdf:about value refers to the base URI of the document itself, signifying that this ontology header describes the present document.

The central elements of an OWL ontology are classes, a mechanism which OWL takes over from RDF Schema (hence the rdfs:subClassOf element in the example) and extends. Classes can be divided into subclasses, so that it is possible to define a class and then define further subclasses of it. The following examples concern flights and are taken from Malik (2003):

      
          <owl:Class rdf:ID="Flight"/>

          <owl:Class rdf:ID="InternationalFlight">
            <rdfs:subClassOf rdf:resource="#Flight" />
          </owl:Class>
      
    

This shows that the class Flight has a subclass called InternationalFlight, which by definition inherits its properties: every constraint placed on Flight also applies to InternationalFlight. Properties are used to make assertions about classes and their instances. All OWL classes are subclasses of the OWL class Thing, and the main classes defined in a given OWL ontology represent the roots of the taxonomic tree for a given domain. It might be more formal to declare these top-level classes explicitly as rdfs:subClassOf owl:Thing, but the specification does not require this.

The rdfs:subClassOf relation means that if class X is a subclass of Y, then all instances of X are also instances of Y. It is also transitive: if X is a subclass of Y and Y is a subclass of Z, then X is a subclass of Z.
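The transitivity of rdfs:subClassOf, and the way instance membership follows from it, can be sketched in a few lines of Python by repeatedly composing subclass pairs until nothing new appears. The class names come from Malik's flight example; the explicit link from Flight to owl:Thing and the helper names are assumptions of this sketch.

```python
# Direct rdfs:subClassOf assertions as (subclass, superclass) pairs.
SUBCLASS_OF = {
    ("InternationalFlight", "Flight"),
    ("Flight", "owl:Thing"),
}


def transitive_closure(pairs):
    """Add (x, z) whenever (x, y) and (y, z) are present, until nothing changes."""
    closure = set(pairs)
    while True:
        new = {(x, z) for (x, y) in closure for (y2, z) in closure if y == y2}
        if new <= closure:
            return closure
        closure |= new


def classes_of(instance_types, closure):
    """All classes an individual belongs to, given its directly asserted types."""
    result = set(instance_types)
    for cls in instance_types:
        result |= {sup for (sub, sup) in closure if sub == cls}
    return result


closure = transitive_closure(SUBCLASS_OF)
print(("InternationalFlight", "owl:Thing") in closure)  # True
print(sorted(classes_of({"InternationalFlight"}, closure)))
# ['Flight', 'InternationalFlight', 'owl:Thing']
```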

We can introduce specific examples, or instances, of a class by declaring them to be of that class, thus:

      
          <Region rdf:ID="CentralCoastRegion"/>
      
    

This is exactly the same as saying:

      
          <owl:Thing rdf:ID="CentralCoastRegion"/>

          <owl:Thing rdf:about="#CentralCoastRegion">
            <rdf:type rdf:resource="#Region"/>
          </owl:Thing>
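The equivalence can be checked by reading both forms into (subject, rdf:type, class) statements. In the Python sketch below, a typed node element such as <Region .../> contributes its own tag as a type. Strictly, the second form also states that CentralCoastRegion is an owl:Thing, which the first form leaves implicit; once the OWL rule that every individual is an owl:Thing is applied, the two sets coincide. The base URI http://example.org/wine# and the naive resolution of rdf:ID and rdf:about against it are assumptions of this sketch.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
OWL_NS = "http://www.w3.org/2002/07/owl#"
BASE = "http://example.org/wine#"  # assumed base URI and default namespace
RDF_TYPE = RDF_NS + "type"

HEADER = f'<rdf:RDF xmlns="{BASE}" xmlns:rdf="{RDF_NS}" xmlns:owl="{OWL_NS}">'
ABBREVIATED = HEADER + '<Region rdf:ID="CentralCoastRegion"/></rdf:RDF>'
EXPLICIT = HEADER + """
  <owl:Thing rdf:ID="CentralCoastRegion"/>
  <owl:Thing rdf:about="#CentralCoastRegion">
    <rdf:type rdf:resource="#Region"/>
  </owl:Thing>
</rdf:RDF>"""


def type_triples(text):
    """Collect (subject, rdf:type, class) statements from simple RDF/XML."""
    triples = set()
    for node in ET.fromstring(text):
        ident = node.get(f"{{{RDF_NS}}}ID") or node.get(f"{{{RDF_NS}}}about", "").lstrip("#")
        subject = BASE + ident  # naive resolution against the assumed base
        # A typed node element, e.g. <Region .../>, asserts rdf:type Region.
        triples.add((subject, RDF_TYPE, node.tag[1:].replace("}", "")))
        # An explicit <rdf:type rdf:resource="#Region"/> child asserts the same.
        for child in node.findall(f"{{{RDF_NS}}}type"):
            resource = child.get(f"{{{RDF_NS}}}resource").lstrip("#")
            triples.add((subject, RDF_TYPE, BASE + resource))
    return triples


def with_thing(triples):
    """Apply the OWL rule that every individual is an instance of owl:Thing."""
    return triples | {(s, RDF_TYPE, OWL_NS + "Thing") for (s, _, _) in triples}


print(with_thing(type_triples(ABBREVIATED)) == with_thing(type_triples(EXPLICIT)))  # True
```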
      
    

The terminology used in different approaches to ontologies can be confusing; for example, DL uses the term "concept" where OWL DL uses "class". Noy and McGuinness (2000, p3) show the differences as follows:

Phrases used in this dissertation    Alternative phrases
Class                                Concept
Properties                           Roles, Slots
Role restrictions                    Facets