A Brief Guide to the Canonical Text Service

h2. What is CTS?

Canonical Text Services identify and retrieve passages of text cited by canonical reference.

Citations are expressed as [CTS URNs|guide:ctsurns]. Text passages are structured in XML that can be validated against some schema or DTD.

Where CTS URNs define a permanent notation for citing texts, independent of any technology, Canonical Text Services provide a network service that can equate XML documents with the work referred to by a CTS URN, and can retrieve a well-formed XML fragment for a passage referred to in a CTS URN.

h2. The CTS architecture and design goals

The Canonical Text Services protocol defines interaction between a client and a server program using the HTTP protocol: clients submit requests, with parameters included as HTTP GET parameters; the CTS response is structured in XML validating against the CTS reply schemas. While a user could therefore interact directly with a CTS by pointing a web browser at URLs formed according to the CTS specification, the purpose of the service is to provide services to software that recognizes CTS URNs.

The vocabulary of requests (highlights summarized below) allows a client to discover metadata about the collection of texts served by a specific CTS instance, as well as to retrieve passages of text.

The server’s metadata catalog, called a “text inventory,” identifies a means (such as a Relax NG schema) for validating the XML realization of a document, and describes how the canonical citation scheme of the CTS URN maps on to the XML representation.

Version 3 of CTS introduced three important changes. First, in CTS 3, documents may validate against any standard method chosen by the service’s administrator, such as Relax NG schemas, XML schemas, or DTDs. As part of this change, CTS 3 now supports XML namespaces. Second, different parts of a document may be cited using different citation schemes. (E.g., a preface might be cited differently from the main body of a work.) Third, an optional extension, which implementations may choose to support or ignore, deals with the topological relation of URNs. (For more information, see [URN topology|guide:urntopology].)

h2. Interacting with a CTS: the principal requests

Programs (and the programmers who write them) can interact with a CTS using any of the nine defined requests. The request name is always included in an HTTP parameter named @request@; for all requests except the metadata request @GetCapabilities@, a CTS URN is always included in an HTTP parameter named @urn@. Consider this possible series of exchanges between a client program interested in hexameter poetry, and a CTS at the address @http://machine/service@.
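The request convention just described (a @request@ parameter, plus a @urn@ parameter for everything except @GetCapabilities@) can be sketched in a few lines of Python. This is only an illustration: the function name @cts_request_url@ is hypothetical, and the base address is the example address used in this guide.

```python
from urllib.parse import urlencode

# Example service address used throughout this guide.
BASE_URL = "http://machine/service"


def cts_request_url(request, urn=None, base_url=BASE_URL, **extra):
    """Build the URL for a CTS request sent as HTTP GET parameters.

    `cts_request_url` is a hypothetical helper, not part of the CTS protocol.
    """
    params = {"request": request}
    if urn is not None:
        params["urn"] = urn
    params.update(extra)
    # safe=":" leaves the colons of a CTS URN readable instead of "%3A".
    return base_url + "?" + urlencode(params, safe=":")


print(cts_request_url("GetCapabilities"))
print(cts_request_url("GetValidReff", "urn:cts:greekLit:tlg0013.tlg011"))
```

Because the reply is XML, a client would normally hand the resulting URL to an HTTP library and parse the response rather than display it.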

h3. @GetCapabilities@: What texts does the service include, and how do I cite them?


The @GetCapabilities@ request takes no further parameters. The reply includes the complete TextInventory, or metadata catalog, for the service. From this information, a client can determine everything the service has to offer: what texts are online, what their citation scheme looks like, and whether the service supports optional features such as URN topology. (For more on the information included in a TextInventory, see below.)

The following entry for an edition of the Homeric Hymn to Athena includes the information that the Homeric Hymns are text group @tlg0013@ in the @greekLit@ CTS namespace, and that @tlg011@ is the short Homeric Hymn to Athena. (We could therefore identify this work succinctly with the CTS URN @urn:cts:greekLit:tlg0013.tlg011@.) It further tells us that the Hymn to Athena is cited by poetic line, and that citation values for poetic lines are encoded on the @@n@ attribute of the TEI schema’s @l@ element. But how do we determine what line numbers are valid references? For that, we can use the @GetValidReff@ request.

<groupname xml:lang="eng">Homeric Hymns <work xml:lang="grc-c" projid="greekLit:tlg011"> <title xml:lang="eng">Hymn to Athena
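To make the structure of a URN such as @urn:cts:greekLit:tlg0013.tlg011@ concrete, here is a minimal Python sketch that splits one into its namespace, text group, work, and passage reference. The names @CtsUrn@ and @parse_cts_urn@ are hypothetical, and the sketch assumes only the URN shapes shown in this guide; it is not a full implementation of the CTS URN specification.

```python
from typing import NamedTuple, Optional


class CtsUrn(NamedTuple):
    namespace: str           # e.g. "greekLit"
    text_group: str          # e.g. "tlg0013"
    work: Optional[str]      # e.g. "tlg011"; None if only a group is named
    passage: Optional[str]   # e.g. "1"; None when a whole work is cited


def parse_cts_urn(urn):
    """Split a CTS URN into its parts (simplified, hypothetical helper)."""
    parts = urn.split(":")
    if len(parts) not in (4, 5) or parts[0] != "urn" or parts[1] != "cts":
        raise ValueError(f"not a CTS URN: {urn!r}")
    namespace, work_component = parts[2], parts[3]
    passage = parts[4] if len(parts) == 5 and parts[4] else None
    ids = work_component.split(".")
    # Identifiers after the work (edition, translation, exemplar) are
    # ignored in this sketch.
    work = ids[1] if len(ids) > 1 else None
    return CtsUrn(namespace, ids[0], work, passage)


print(parse_cts_urn("urn:cts:greekLit:tlg0013.tlg011"))
print(parse_cts_urn("urn:cts:greekLit:tlg0013.tlg011:1"))
```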

h3. @GetValidReff@: What citation values are valid?


The @urn@ parameter to this request identifies the Homeric Hymn to Athena. The body of the reply includes a complete list of every CTS URN that is valid for this very short poem, in the order in which they appear in the text, and could look like this:

{note}
urn:cts:greekLit:tlg0013.tlg011:1
urn:cts:greekLit:tlg0013.tlg011:2
urn:cts:greekLit:tlg0013.tlg011:3
urn:cts:greekLit:tlg0013.tlg011:4
urn:cts:greekLit:tlg0013.tlg011:5
{note}

Optionally, @GetValidReff@ requests may include a @level@ parameter, defining the depth of the citation scheme to consider. For a work with a single level of citation, such as a poem cited by lines, that option is irrelevant, but if we wanted to discover valid references for books of the Iliad (rather than lines) included in a CTS, we could submit a @GetValidReff@ request with a value of 1 for the @level@ parameter. If our @GetCapabilities@ reply tells us that the Iliad is work @tlg001@ in text group @tlg0012@ in the @greekLit@ namespace, the request would carry the URN @urn:cts:greekLit:tlg0012.tlg001@ in its @urn@ parameter and the value 1 in its @level@ parameter.


The reply would include only 24 URNs (one for each book of the Iliad), resolved only to the first level (books) of the citation hierarchy, not the second level of individual lines.

If we subsequently wanted to discover what line numbers are valid within book 10 of the Iliad, we could submit a request with a @urn@ limited to that book (@urn:cts:greekLit:tlg0012.tlg001:10@).
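The two @GetValidReff@ requests described above (books of the Iliad at level 1, then the lines of book 10) could be assembled as follows. This is a sketch under stated assumptions: @get_valid_reff_url@ is a hypothetical helper, and the base address is the example address used in this guide.

```python
from urllib.parse import urlencode

BASE_URL = "http://machine/service"        # example address from this guide
ILIAD = "urn:cts:greekLit:tlg0012.tlg001"  # text group tlg0012, work tlg001


def get_valid_reff_url(urn, level=None):
    """Build a GetValidReff request URL, optionally limited to a citation level."""
    params = {"request": "GetValidReff", "urn": urn}
    if level is not None:
        params["level"] = level
    # safe=":" keeps the URN's colons unescaped for readability.
    return BASE_URL + "?" + urlencode(params, safe=":")


# Valid references resolved only to the first level (books).
books_url = get_valid_reff_url(ILIAD, level=1)
# Valid references within book 10 only.
book10_url = get_valid_reff_url(ILIAD + ":10")

print(books_url)
print(book10_url)
```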


h3. @GetPassage@: What is the text of this passage?

Applications might choose to batch process and store metadata about texts, and even lists of valid reference values, but the heart of the interaction between a CTS and client programs is retrieving passages of text for a given URN. The body of the reply contains a well-formed XML fragment with the requested passage of text framed by all its parent elements. For a request for line 1 of the Homeric Hymn to Athena (@urn:cts:greekLit:tlg0013.tlg011:1@), the body of a reply could look like this if the text were marked up in TEI-conformant XML:

{note} Παλλάδ’ Ἀθηναίην ἐρυσίπτολιν ἄρχομ’ ἀείδειν {note}

h3. @GetPrevNextUrn@: What is the following (or preceding) passage?


The string making up the reference component of a URN is arbitrary (e.g., it is perfectly legitimate for a line labelled “320” to precede a line labelled “319”), but URNs have an inherent order: the document order of the text units they refer to. While applications can parse the results of a @GetValidReff@ to determine what URNs precede or follow a given URN, it is also possible to request this information directly. For a request about line 2 of the Homeric Hymn to Athena, the body of the reply would give the preceding and following URNs:

{note}
urn:cts:greekLit:tlg0013.tlg011:1
urn:cts:greekLit:tlg0013.tlg011:3
{note}
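The client-side alternative mentioned above, working out neighbours from a @GetValidReff@ result, amounts to a lookup in a list kept in document order. A minimal sketch, assuming the list of URNs has already been extracted from the reply (the function name @prev_next@ is hypothetical):

```python
def prev_next(valid_reff, urn):
    """Return (previous, next) URNs of `urn` in document order.

    `valid_reff` is a list of URNs in the order the service returned them;
    either value is None at the start or end of the text.
    """
    i = valid_reff.index(urn)  # raises ValueError if the URN is not valid
    prev_urn = valid_reff[i - 1] if i > 0 else None
    next_urn = valid_reff[i + 1] if i + 1 < len(valid_reff) else None
    return prev_urn, next_urn


HYMN = "urn:cts:greekLit:tlg0013.tlg011"
valid_reff = [f"{HYMN}:{n}" for n in range(1, 6)]  # the five lines listed earlier

print(prev_next(valid_reff, f"{HYMN}:2"))
```

Note that the lookup relies on list position only, never on comparing the reference strings, since those are arbitrary labels.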

h3. @GetPassagePlus@: Can we simplify this exchange?

Applications supporting navigation of a text regularly need to submit @GetPassage@ and @GetPrevNextUrn@ in tandem. To simplify this (and cut in half the number of client/server round trips needed to navigate a text), the @GetPassagePlus@ request works exactly like the @GetPassage@ request, except that its reply packages both the XML of the requested passage and the @prevnext@ element of a @GetPrevNextUrn@ reply.

h2. Managing a CTS: the TextInventory

A CTS implementation might manage the service’s metadata in any way it chooses. It might store the data in a database with a form-based user interface, for example. But the metadata is presented to client applications as XML validating against the CTS TextInventory schema, so we will survey the main components of the TextInventory as they appear serialized to XML.

The TextInventory includes three main parts:

* a list of standard citation schemes
* a list of the individual TextGroups, Works, Editions, Translations, and Exemplars of documents known to the server
* a list of organizational units called Collections

The list of groups, works, etc., is a hierarchical organization used to identify works uniquely, according to some familiar, well-established convention. The collections, on the other hand, allow the administrator of a CTS to group sets of works together for any purpose.

Of these three sections, the most important is the list of groups and works. It is organized as follows.

h3. The Text Inventory: Groups and Works

The list of works contains a list of…

Each Work includes one or more titles, and, optionally, may be instantiated in…

*…zero or more Edition elements and/or Translation elements

Editions and translations are specific versions of a notional work, which may be represented by multiple physical copies. Each has an identifier unique within the Work. The TextInventory may list bibliographic information here, since the Canonical Text Services protocol allows editors to work with information about texts that are online and texts that are not. Further, an Edition or Translation may optionally contain…

*…zero or more Exemplar elements.

Exemplars are specific physical copies of an Edition or Translation. Each has an identifier unique within its containing Edition or Translation. Documenting individual exemplars can be particularly important for early print editions, but would also allow an epigraphic editor the option of treating multiple copies of a single inscription as exemplars of an edition.

If the server can deliver an electronic version at the level of the Edition element, the Translation element, or one of their Exemplars, that element will contain…

*…one Online element

The Online element contains information about the citation scheme of that electronic text. (See details below.) It also includes information that a server implementation could use to translate the abstract reference into terms used for local retrieval, such as a filename or database lookup.

So, for example, a TextInventory entry for the Homeric Hymn to Athena could contain the following information:

{note}
TextGroup: tlg0013 (Homeric Hymns)
Work: tlg011 (Hymn to Athena)
Edition: chs01 (CHS electronic edition)
Online: local document reference = @tlg0013/tlg0013.tlg011.chs02.xml@
Translation: chs02 (English translation by H. Evelyn-White, now in the public domain)
{note}
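The hierarchy described above (TextGroup, Work, Edition or Translation, Exemplar, Online) can be pictured as nested records. The following Python sketch is only a mental model: the class and field names are hypothetical and do not reproduce the CTS TextInventory schema, and a Work is given a single title for brevity even though it may have several.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Online:
    doc_ref: str  # local document reference, e.g. a file name


@dataclass
class Exemplar:
    id: str  # unique within its Edition or Translation
    online: Optional[Online] = None


@dataclass
class Version:  # stands for either an Edition or a Translation
    id: str     # unique within the Work
    kind: str   # "edition" or "translation"
    label: str
    online: Optional[Online] = None  # present only if the text can be served
    exemplars: List[Exemplar] = field(default_factory=list)


@dataclass
class Work:
    id: str
    title: str
    versions: List[Version] = field(default_factory=list)


@dataclass
class TextGroup:
    id: str
    name: str
    works: List[Work] = field(default_factory=list)


# The Hymn to Athena entry from the example above.
hymns = TextGroup("tlg0013", "Homeric Hymns", [
    Work("tlg011", "Hymn to Athena", [
        Version("chs01", "edition", "CHS electronic edition",
                Online("tlg0013/tlg0013.tlg011.chs02.xml")),
        Version("chs02", "translation", "English translation by H. Evelyn-White"),
    ]),
])

# Versions the server can actually deliver are those with an Online record.
online_ids = [v.id for w in hymns.works for v in w.versions if v.online]
print(online_ids)
```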

Each Online element, whether it belongs to an edition, translation, or exemplar, contains three elements: the first identifies how the XML document can be validated; the second identifies the citation scheme, using an identifier from the list of citation schemes used in this service; and the third contains a recursive list of @citation@ elements mapping each level of the citation scheme to part of the XML document.

Our example of the Homeric Hymn to Athena is cited by a single level, the poetic line, and could be documented like this:


An @online@ element for the two-tiered citation of the Iliad illustrates the usage of the @citation@ element’s @scope@ and @xpath@ attributes. Each provides a template for XPath expressions, in which question marks (@?@) can be replaced by the value of one level of a citation. The @xpath@ attribute identifies an XML unit corresponding to a level of the citation scheme; the @scope@ attribute identifies a context in the document where this XPath applies. (The two are distinct because a document’s markup might include other elements between levels of the citation scheme.)
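To show how such templates could be used, the sketch below fills the question marks of a @scope@ and an @xpath@ template with the values of a two-level reference. Everything specific here is an assumption made for illustration: the template strings are invented for a TEI-like document without namespaces, the reference levels are assumed to be separated by a period (as in @10.5@), and the helper names are hypothetical.

```python
# Hypothetical citation mapping for a text cited by book and line.
# These template strings are illustrative, not taken from a real TextInventory.
LEVELS = [
    {"label": "book", "scope": "/TEI/text/body", "xpath": "/div[@n='?']"},
    {"label": "line", "scope": "/TEI/text/body/div[@n='?']", "xpath": "/l[@n='?']"},
]


def fill(template, values):
    """Replace each '?' in the template, left to right, with the next value."""
    parts = template.split("?")
    if len(parts) - 1 != len(values):
        raise ValueError("template and values do not match")
    filled = parts[0]
    for value, rest in zip(values, parts[1:]):
        filled += value + rest
    return filled


def passage_xpath(levels, reference):
    """Build an XPath for a reference such as '10' (book) or '10.5' (line)."""
    values = reference.split(".")  # assumption: levels are separated by '.'
    if not 1 <= len(values) <= len(levels):
        raise ValueError(f"reference depth not supported: {reference!r}")
    level = levels[len(values) - 1]
    # The scope takes the values of the enclosing levels; the xpath takes
    # the value of the level actually being cited.
    return fill(level["scope"], values[:-1]) + fill(level["xpath"], values[-1:])


print(passage_xpath(LEVELS, "10"))
print(passage_xpath(LEVELS, "10.5"))
```

Keeping @scope@ and @xpath@ as separate templates means the context can skip over intervening markup while the final step still names exactly the citable unit.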


h2. Further information

More detailed information about version 3 of CTS is currently in preparation; links will be posted here when it is made available from the project’s SourceForge site.