A Brief Guide to the Canonical Text Service
What is CTS?
Canonical Text Services identify and retrieve passages of text cited by canonical reference.
Citations are expressed as [CTS URNs|guide:ctsurns]. Text passages are structured in XML that can be validated against some schema or DTD.
Where CTS URNs define a permanent notation for citing texts, independent of any technology, Canonical Text Services provide a network service that can equate XML documents with the work referred to by a CTS URN, and can retrieve a well-formed XML fragment for a passage referred to in a CTS URN.
The CTS architecture and design goals
The Canonical Text Services protocol defines interaction between a client and a server program over HTTP: clients submit requests, with parameters included as HTTP GET parameters; the CTS response is structured in XML validating against the CTS reply schemas. While a user could therefore interact directly with a CTS by pointing a web browser at URLs formed according to the CTS specification, the service is designed primarily for software that recognizes CTS URNs.
The vocabulary of requests (highlights summarized below) allows a client to discover metadata about the collection of texts served by a specific CTS instance, as well as to retrieve passages of text.
The server’s metadata catalog, called a “text inventory,” identifies a means (such as a Relax NG schema) for validating the XML realization of a document, and describes how the canonical citation scheme of the CTS URN maps onto the XML representation.
Version 3 of CTS introduced three important changes. First, in CTS 3, documents may validate against any standard method chosen by the service’s administrator, such as Relax NG schemas, XML schemas, or DTDs. As part of this change, CTS 3 now supports XML namespaces. Second, different parts of a document may be cited using different citation schemes. (E.g., a preface might be cited differently from the main body of a work.) Third, an optional extension, which implementations may support or ignore, deals with the topological relation of URNs. (For more information, see [URN topology|guide:urntopology].)
h2. Interacting with a CTS: the principal requests
Programs (and the programmers who write them) can interact with a CTS using any of the nine defined requests. The request name is always included in an HTTP parameter named @request@; for all requests except the metadata request @GetCapabilities@, a CTS URN is always included in an HTTP parameter named @urn@. Consider this possible series of exchanges between a client program interested in hexameter poetry, and a CTS at the address @http://machine/service@.
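The request convention just described (a @request@ parameter, plus a @urn@ parameter for everything except @GetCapabilities@) can be sketched in a few lines of Python. This is only an illustration: the helper name @build_cts_url@ is hypothetical and not part of the CTS specification, and leaving the URN's colons unescaped is a readability choice rather than a protocol requirement.

```python
from urllib.parse import urlencode


def build_cts_url(base_url, request, urn=None, **extra):
    """Return a CTS request URL with its parameters as HTTP GET parameters."""
    params = {"request": request}
    if urn is not None:
        # Every request except GetCapabilities carries a CTS URN.
        params["urn"] = urn
    # Optional parameters, such as level for GetValidReff.
    params.update(extra)
    # safe=":" leaves the colons of the URN unescaped so the URL stays readable.
    return base_url + "?" + urlencode(params, safe=":")


print(build_cts_url("http://machine/service", "GetCapabilities"))
print(build_cts_url("http://machine/service", "GetValidReff",
                    "urn:cts:greekLit:tlg0012.tlg001", level=1))
```

A client would pass the resulting string to whatever HTTP library it already uses and parse the XML reply.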
h3. @GetCapabilities@: What texts does the service include, and how do I cite them?
@http://machine/service?request=GetCapabilities@
The @GetCapabilities@ request takes no further parameters. The reply includes the complete TextInventory, or metadata catalog, for the service. From this information, a client can determine everything the service has to offer: what texts are online, what their citation scheme looks like, whether the service supports optional features such as URN topology. (For more on the information included in a TextInventory, see below.) The following entry for an edition of the Homeric Hymn to Athena includes the information that the Homeric Hymns are text group @tlg0013@ in the @greekLit@ CTS namespace, and that @tlg011@ is the short Homeric Hymn to Athena. (We could therefore identify this work succinctly with the CTS URN @urn:cts:greekLit:tlg0013.tlg011@.) It further tells us that the Hymn to Athena is cited by poetic line, and that citation values for poetic lines are encoded on the @@n@ attribute of the TEI schema’s @l@ element. But how do we determine what line numbers are valid references? For that, we can use the @GetValidReff@ request.
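The way the URN above packs a namespace, a text group, and a work into one string can be made concrete with a small parser. This is a minimal sketch covering only the URN shapes that appear in this guide (a work, optionally followed by a passage reference); the names @CtsUrn@ and @parse_cts_urn@ are hypothetical, and any further dot-separated components after the work identifier are simply ignored here.

```python
from typing import NamedTuple, Optional


class CtsUrn(NamedTuple):
    namespace: str
    text_group: str
    work: Optional[str]
    passage: Optional[str]


def parse_cts_urn(urn):
    """Split a CTS URN into namespace, text group, work, and passage."""
    parts = urn.split(":")
    if len(parts) not in (4, 5) or parts[0] != "urn" or parts[1] != "cts":
        raise ValueError("not a CTS URN: " + urn)
    # The work component is dot-separated: text group first, then work.
    work_parts = parts[3].split(".")
    work = work_parts[1] if len(work_parts) > 1 else None
    # The passage reference, if any, is the fifth colon-separated field.
    passage = parts[4] if len(parts) == 5 else None
    return CtsUrn(parts[2], work_parts[0], work, passage)


print(parse_cts_urn("urn:cts:greekLit:tlg0013.tlg011"))
print(parse_cts_urn("urn:cts:greekLit:tlg0012.tlg001:10"))
```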
h3. @GetValidReff@: What citation values are valid?
@http://machine/service?request=GetValidReff&urn=urn:cts:greekLit:tlg0013.tlg011@
The @urn@ parameter to this request identifies the Homeric Hymn to Athena. The body of the reply includes a complete list of every CTS URN that is valid for this very short poem, in the order in which they appear in the text, and could look like this:
{note}
Optionally, @GetValidReff@ requests may include a @level@ parameter, defining the depth of the citation scheme to consider. For a work with a single level of citation, such as a poem cited by lines, that option is irrelevant, but if we wanted to discover valid references for books of the Iliad (rather than lines) included in a CTS, we could submit a @GetValidReff@ request with a value of 1 for the @level@ parameter. If our @GetCapabilities@ reply tells us that the Iliad is work @tlg001@ in text group @tlg0012@ in the @greekLit@ namespace, the request would be:
@http://machine/service?request=GetValidReff&urn=urn:cts:greekLit:tlg0012.tlg001&level=1@
The reply would include only 24 URNs (one for each book of the Iliad), resolved only to the first level (books) of the citation hierarchy, not the second level of individual lines.
If we subsequently wanted to discover what line numbers are valid within book 10 of the Iliad, we could submit a @urn@ limited to that book:
@http://machine/service?request=GetValidReff&urn=urn:cts:greekLit:tlg0012.tlg001:10@
h3. @GetPassage@: What is the text of a passage?
@http://machine/service?request=GetPassage&urn=urn:cts:greekLit:tlg0013.tlg011:1@
Applications might choose to batch process and store metadata about texts, and even lists of valid reference values, but the heart of the interaction between a CTS and client programs is retrieving passages of text for a given URN. The body of the reply contains a well-formed XML fragment with the requested passage of text framed by all its parent elements. The sample request above asks for line 1 of the Homeric Hymn to Athena; the body of a reply could look like this if the text were marked up in TEI-conformant XML:
{note}
h3. @GetPrevNextUrn@: What is the following (or preceding) passage?
@http://machine/service?request=GetPrevNextUrn&urn=urn:cts:greekLit:tlg0013.tlg011:2@
The string making up the reference component of a URN is arbitrary (e.g., it is perfectly legitimate for a line labelled “320” to precede a line labelled “319”), but URNs have an inherent order: the document order of the text units they refer to. While applications can parse the results of a @GetValidReff@ to determine what URNs precede or follow a given URN, it is also possible to request this information directly. The example asks for the URNs preceding and following line 2 of the Homeric Hymn to Athena. The body of the reply would be:
{note}
h3. @GetPassagePlus@: Can we simplify this exchange?
Applications supporting navigation of a text regularly need to submit @GetPassage@ and @GetPrevNextUrn@ in tandem. To simplify this (and cut in half the number of client/server round trips needed to navigate a text), the @GetPassagePlus@ request works exactly like the @GetPassage@ request, except that it packages in the reply both the XML of the requested passage, and the @prevnext@ element of a @GetPrevNextUrn@ request.
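For clients that have already cached a @GetValidReff@ reply, the neighbors of a passage can also be worked out locally, since the list is in document order. The sketch below assumes the URNs have already been extracted from the reply into a Python list; the function name @prev_next@ and the three sample references are illustrative only and are not an actual server reply.

```python
def prev_next(valid_urns, urn):
    """Return the (preceding, following) URNs of urn in document order.

    valid_urns is assumed to be in the order the server returned it.
    None marks the start or end of the text.
    """
    position = valid_urns.index(urn)  # raises ValueError for an unknown URN
    previous = valid_urns[position - 1] if position > 0 else None
    following = valid_urns[position + 1] if position + 1 < len(valid_urns) else None
    return previous, following


# Illustrative references only, not a real GetValidReff reply.
refs = [
    "urn:cts:greekLit:tlg0013.tlg011:1",
    "urn:cts:greekLit:tlg0013.tlg011:2",
    "urn:cts:greekLit:tlg0013.tlg011:3",
]
print(prev_next(refs, "urn:cts:greekLit:tlg0013.tlg011:2"))
```

Because the order comes from the list rather than from the reference strings, this works even when labels are not numerically sorted.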
h2. Managing a CTS: the TextInventory
A CTS implementation might manage the service’s metadata in any way it chooses. It might store the data in a database with a form-based user interface, for example. But the metadata is presented to client applications as XML validating against the CTS TextInventory schema, so we will survey the main components of the TextInventory as they appear serialized to XML.
The TextInventory includes three main parts:
* a list of standard citation schemes
* a list of the individual TextGroups, Works, Editions, Translations, and Exemplars of documents known to the server
* a list of organization units called Collections
The list of groups, works, etc., is a hierarchical organization used to identify works uniquely, according to some familiar, well established convention.
The collections on the other hand allow the administrator of a CTS to group sets of works together for any purpose.
Of these three sections, the most important is the list of groups and works. It is organized as described below.
h3. The Text Inventory: Groups and Works
The list of works contains a list of…
…one or more TextGroup elements (e.g. “Homer,” “Aristotle”, “inscriptions from a given site”).
TextGroups are traditional, convenient groupings of texts such as “authors” for literary works, or corpus collections for epigraphic or papyrological texts. Each TextGroup has a unique identifier, one or more titles (allowing titles in different languages), and consists of…
*…one or more Work elements (e.g. “Iliad,” “Ἀθηναίων Πολιτεία”)
Works are notional entities, each with an identifier unique within this TextGroup. Each work includes one or more titles, and, optionally, may be instantiated in…
*…zero or more Edition elements and/or Translation elements
Editions and translations are specific versions of a notional work that may be represented by multiple physical copies. Each has an identifier unique within the Work. The TextInventory may here list bibliographic information, since the Canonical Text Services protocol allows editors to work with information about texts that are online and texts that are not. Further, an Edition or Translation may optionally contain…
*…zero or more Exemplar elements.
Exemplars are specific physical copies of an Edition or Translation. Each has an identifier unique within its containing Edition or Translation. Documenting individual exemplars can be particularly important for early print editions, but would also allow an epigraphic editor the option of treating multiple copies of a single inscription as exemplars of an edition.
If the server can deliver an electronic version at the level of the Edition element, the Translation element or one of their Exemplars, that element will contain…
*…one Online element
The Online element contains information about the citation scheme of that electronic text. (See details below.) It also includes information that a server implementation could use to translate the abstract reference into terms used for local retrieval, such as a filename or database lookup.
So, for example, a TextInventory entry for the Homeric Hymn to Athena could contain the following information:
{note}
TextGroup: tlg0013 (Homeric Hymns)
Work: tlg011 (Hymn to Athena)
Edition: chs01 (CHS electronic edition)
Online: local document reference = @tlg0013/tlg0013.tlg011.chs02.xml@
Translation: chs02 (English translation by H. Evelyn-White, now in the public domain)
{note}
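The nesting of groups, works, and versions can also be pictured as a small data model. The sketch below is one possible in-memory shape, not the TextInventory schema itself: all class and field names are hypothetical, Exemplars are left out for brevity, and attaching the Online entry to the edition simply follows the order of the listing above, which is an assumption.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Online:
    local_ref: str  # e.g. a filename or database key used for local retrieval


@dataclass
class Version:
    id: str
    label: str
    kind: str  # "edition" or "translation"
    online: Optional[Online] = None  # None: known to the server but not online


@dataclass
class Work:
    id: str
    title: str
    versions: List[Version] = field(default_factory=list)


@dataclass
class TextGroup:
    id: str
    title: str
    works: List[Work] = field(default_factory=list)


def online_versions(group):
    """Yield (work id, version id) for every version the server can deliver."""
    for work in group.works:
        for version in work.versions:
            if version.online is not None:
                yield work.id, version.id


# Values copied from the sample entry; the placement of Online is an assumption.
hymns = TextGroup("tlg0013", "Homeric Hymns", [
    Work("tlg011", "Hymn to Athena", [
        Version("chs01", "CHS electronic edition", "edition",
                Online("tlg0013/tlg0013.tlg011.chs02.xml")),
        Version("chs02", "English translation by H. Evelyn-White", "translation"),
    ]),
])
print(list(online_versions(hymns)))
```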
Each Online element, whether it belongs to an edition, translation, or exemplar, contains three elements: one identifies how the XML document can be validated, a second identifies the citation scheme with an identifier from the list of citation schemes used in this service, and a third element contains a recursive list of @citation@ elements mapping each level of the citation scheme to part of the XML document.
Our example of the Homeric Hymn to Athena cites by a single level, the poetic line, and could be documented like this:
An @online@ element for the two-tiered citation of the Iliad illustrates the usage of the @citation@ element’s @scope@ and @xpath@ attributes. Each provides a template for XPath expressions, in which question marks (@?@) can be replaced by the value of one level of a citation. The @xpath@ attribute identifies an XML unit corresponding to a level of the citation scheme; the @scope@ attribute identifies a context in the document where this xpath applies. (The two are distinct because a document might include markup between levels of the citation scheme.)
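The question-mark substitution can be sketched as a simple string operation. The example assumes that a full expression is the @scope@ template followed by the @xpath@ template, with question marks filled left to right by the citation values; the function name @fill_citation_template@ and the two template strings are hypothetical and are not taken from an actual TextInventory.

```python
def fill_citation_template(template, values):
    """Replace each '?' in an XPath template with the next citation value."""
    pieces = template.split("?")
    if len(pieces) - 1 != len(values):
        raise ValueError("template expects %d value(s), got %d"
                         % (len(pieces) - 1, len(values)))
    result = pieces[0]
    # Interleave the citation values with the fixed parts of the template.
    for value, piece in zip(values, pieces[1:]):
        result += value + piece
    return result


# Hypothetical templates for a two-level (book, line) scheme in TEI markup.
LINE_SCOPE = "/tei:TEI/tei:text/tei:body/tei:div[@n='?']"
LINE_XPATH = "/tei:l[@n='?']"

# Book 10, line 1: one value per level of the citation scheme.
print(fill_citation_template(LINE_SCOPE + LINE_XPATH, ["10", "1"]))
```

A server implementation could evaluate the resulting expression against the document to locate the cited unit.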
{note}
h2. Further information
More detailed information about version 3 of CTS is currently in preparation; links will be posted here when it is made available from the project’s SourceForge site.