A Gentle Introduction to CTS & CITE URNs

Overview

URN stands for “Uniform Resource Name”. A URN is an identifier that follows a standard defined by the Internet Engineering Task Force’s Network Working Group. A URN identifies an object uniquely and according to specified semantics.

Unlike a URL, which identifies where a resource might be found on the internet, a URN identifies the resource itself, independent of its location. So a URN is like a bibliographic citation, while a URL is like a call-number in a library.

The Homer Multitext has built its digital library architecture based on the principle of canonical citation by URNs. The digital archive of the HMT represents millions of “objects”; these range from the concrete to the abstract, from large Byzantine manuscripts, down through the individual folios of each manuscript, the texts that appear in those manuscripts, the Greek words in those texts, personal names, digital photographs, regions-of-interest on photographs, and so forth. Every object we study and publish has a URN.

Our URNs are divided into two types. The first are CTS-URNs. These identify texts. (CTS stands for Canonical Text Services, the protocol of requests and responses that allows us to work with texts over networks.) The second are CITE-URNs. These uniquely identify objects in a set of objects of similar kind: images, manuscripts, entries in commentaries. (CITE stands for Collections, Indices, Texts, and Extensions, the over-arching name for our collection of protocols.)

The Text, Image, and Collection Services that the HMT has developed let us resolve these URNs easily. That is, we can send a URN to a service that is reachable at a URL, and get back the text, image, or data-object that the URN identifies.
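As a rough illustration of "sending a URN to a service", the sketch below builds a request URL for a text service. It assumes the service accepts `request=GetPassage` and `urn=` query parameters, as in the CTS protocol; the endpoint address `http://example.org/cts/api` and the function name `get_passage_url` are hypothetical placeholders, not an HMT address or API.

```python
from urllib.parse import urlencode

# Hypothetical endpoint: substitute the address of a real text service.
CTS_ENDPOINT = "http://example.org/cts/api"


def get_passage_url(urn: str, endpoint: str = CTS_ENDPOINT) -> str:
    """Build a request URL that asks a text service for the passage named by a CTS-URN.

    The parameter names ("request", "urn") are an assumption based on the
    CTS protocol; check the documentation of the service you actually use.
    """
    query = urlencode({"request": "GetPassage", "urn": urn})
    return f"{endpoint}?{query}"


url = get_passage_url("urn:cts:greekLit:tlg0012.tlg001.msA:1.26")
print(url)
```

The URN stays the same wherever the service lives; only the endpoint changes. That is the practical meaning of the citation versus call-number distinction above.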

Full Descriptions of URN-formats for Texts, Images, and Collection Objects:

Some Important URNs

urn:cts:greekLit:tlg0012.tlg001:1.26 Homer, Iliad 1.26 [CTS-URN]
urn:cts:greekLit:tlg0012.tlg001.msA:1.26 Homer, Iliad, Manuscript A edition, 1.26 [CTS-URN]
urn:cts:greekLit:tlg0012.tlg001.msB:1.26 Homer, Iliad, Manuscript B edition, 1.26 [CTS-URN]
urn:cite:hmt:msA.12v Folio 12, verso, of Manuscript A (the Venetus A) [CITE-URN]
urn:cite:hmt:msB.1r Folio 1, recto, of Manuscript B (the Venetus B) [CITE-URN]
urn:cite:hmt:chsimg.VA001VN-0503 An image showing folio 1-verso of the Venetus A [CITE-Image-URN]
urn:cite:hmt:chsimg.VA012RN-0013:0.0513,0.2216,0.12,0.0883 A rectangular region-of-interest on an image of folio 12-recto of the Venetus A [CITE-Image-URN]
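The examples above follow a regular, colon-delimited pattern, which a short parser can make explicit. This is a minimal sketch inferred only from the URNs listed here, not from the full format descriptions: the class names, field names, and functions (`CtsUrn`, `CiteUrn`, `parse_cts`, `parse_cite`) are hypothetical, and the region-of-interest is kept as four uninterpreted numbers.

```python
from typing import NamedTuple, Optional, Tuple


class CtsUrn(NamedTuple):
    namespace: str            # e.g. "greekLit"
    textgroup: str            # e.g. "tlg0012" (Homer)
    work: Optional[str]       # e.g. "tlg001" (Iliad)
    version: Optional[str]    # e.g. "msA"; None for the work in general
    passage: Optional[str]    # e.g. "1.26"


class CiteUrn(NamedTuple):
    namespace: str                        # e.g. "hmt"
    collection: str                       # e.g. "msA" or "chsimg"
    object_id: str                        # e.g. "12v"
    region: Optional[Tuple[float, ...]]   # image region-of-interest, if any


def parse_cts(urn: str) -> CtsUrn:
    parts = urn.split(":")
    if len(parts) not in (4, 5) or parts[:2] != ["urn", "cts"]:
        raise ValueError(f"not a CTS-URN: {urn!r}")
    passage = parts[4] if len(parts) == 5 and parts[4] else None
    # Pad with None so that missing work/version levels unpack cleanly.
    levels = parts[3].split(".") + [None, None]
    textgroup, work, version = levels[:3]
    return CtsUrn(parts[2], textgroup, work, version, passage)


def parse_cite(urn: str) -> CiteUrn:
    parts = urn.split(":")
    if len(parts) not in (4, 5) or parts[:2] != ["urn", "cite"]:
        raise ValueError(f"not a CITE-URN: {urn!r}")
    collection, _, object_id = parts[3].partition(".")
    region = None
    if len(parts) == 5:
        # Comma-separated numbers, as in the region-of-interest example above.
        region = tuple(float(n) for n in parts[4].split(","))
    return CiteUrn(parts[2], collection, object_id, region)


print(parse_cts("urn:cts:greekLit:tlg0012.tlg001.msA:1.26"))
print(parse_cite("urn:cite:hmt:chsimg.VA012RN-0013:0.0513,0.2216,0.12,0.0883"))
```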

Some Notes

A CTS-URN or a CITE-URN is unique and immutable. The object it points to will not change. If we edit a text, the edited text gets a new URN that distinguishes this edition from other editions of the same text.
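The relationship between editions can be seen in the URNs themselves. In the sketch below, the two edition-level URNs from the list above differ only in their version identifier (msA, msB); dropping it yields the same work-level URN. The helper name `work_level` is hypothetical, and the sketch assumes the work component has the form textgroup.work.version, as in the examples above.

```python
def work_level(urn: str) -> str:
    """Drop the version identifier from a CTS-URN, keeping text group, work, and passage."""
    parts = urn.split(":")
    # parts[3] is the work component, e.g. "tlg0012.tlg001.msA".
    parts[3] = ".".join(parts[3].split(".")[:2])
    return ":".join(parts)


edition_a = "urn:cts:greekLit:tlg0012.tlg001.msA:1.26"
edition_b = "urn:cts:greekLit:tlg0012.tlg001.msB:1.26"

print(edition_a == edition_b)                           # distinct editions, distinct URNs
print(work_level(edition_a) == work_level(edition_b))   # same work, same passage
```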

Some CITE-URNs identify physical objects, like a folio in a manuscript. The URN identifies the object, but what data the Collection Service delivers depends on the curator of the collection, just as the readings in an edition of a text depend on the choice of the editor.

The HMT is able to create a web of associations between millions of objects simply by pairing one URN to another: folio to image, folio to text, folio to codex, text to word. This process of linking can be automated in many cases, or can be the work of trained scholars. The resulting web of interrelations can serve automated processes such as those that assemble the images and texts that appear on a folio, and present them for human readers online, or automated processes that might generate statistics about the layout of text and comments on pages.
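The idea of an index as a list of URN pairs can be sketched in a few lines. The pairings below are made up for illustration and are not taken from HMT data; `urn:cite:hmt:msA.12r` is a hypothetical folio URN formed on the pattern of the examples above, and `build_index` is a hypothetical helper.

```python
from collections import defaultdict

# Illustrative pairings only (folio to image, folio to text); not real HMT index data.
PAIRS = [
    ("urn:cite:hmt:msA.12r", "urn:cite:hmt:chsimg.VA012RN-0013"),
    ("urn:cite:hmt:msA.12v", "urn:cts:greekLit:tlg0012.tlg001.msA:1.26"),
]


def build_index(pairs):
    """Map each URN to the set of URNs it is paired with, in both directions."""
    index = defaultdict(set)
    for left, right in pairs:
        index[left].add(right)
        index[right].add(left)
    return index


index = build_index(PAIRS)

# Everything associated with one folio, whatever kind of object it is.
for related in sorted(index["urn:cite:hmt:msA.12v"]):
    print(related)
```

Because every object is named by a URN, the index itself needs no knowledge of what kind of object sits on either side of a pair; an automated process can follow the links and then ask the appropriate service for each result.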

Further Reading

Casey Dué, D. Neel Smith, and Christopher W. Blackwell, October 2012