A concise guide to understanding HMT project digital publications (2018)

What does the HMT project publish?

  • The HMT project does not develop software.
  • The HMT project creates long-lived scholarly data sets.

What content is included in HMT publications?

All material in HMT publications follows explicit models that do not depend on any specific technology. The most important generic models are:

The HMT project is developing a project-specific model of the contents of citable texts. This model describes the contents of a citable passage of text in multiple layers, from the editorial status of a string of characters through successive levels of editorial disambiguation and interpretation of tokens. (Initial documentation: https://homermultitext.github.io/hmt-editing-principles/)

What formats are used for HMT publications?

Contributors to the HMT assemble material in TEI-compliant XML files and in tabular delimited-text files. An automated publication process composites all of the source material into a single text file in CEX format (specification linked here).

How are HMT publications verified?

Before publication, a composite CEX file encoding the entire contents of the archive is analyzed for inconsistencies in content and structure. A detailed listing of every error is recorded in a human-readable list of corrigenda. A machine-generated textual summary and visualizations of different aspects of the publication are written as files in Markdown format.
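As a rough illustration of what a structural consistency check can look like, the sketch below scans rows of delimited text and collects a readable list of problems. It is not the HMT project's validation code: the function name check_text_rows, the two-column "identifier#text" layout, and the three checks shown are assumptions chosen for illustration only.

```python
def check_text_rows(rows, delimiter="#"):
    """Return human-readable problem reports for 'identifier#text' rows.

    Hypothetical checks, not the HMT project's actual validation rules:
    each row must have exactly two columns, identifiers must be unique,
    and the text column must not be empty.
    """
    problems = []
    seen = set()
    for number, row in enumerate(rows, start=1):
        columns = row.split(delimiter)
        if len(columns) != 2:
            problems.append(
                f"row {number}: expected 2 columns, found {len(columns)}"
            )
            continue
        identifier, text = columns
        if identifier in seen:
            problems.append(f"row {number}: duplicate identifier {identifier}")
        seen.add(identifier)
        if not text.strip():
            problems.append(f"row {number}: empty text for {identifier}")
    return problems


# Invented sample rows, not real HMT data.
sample_rows = [
    "urn:x:1#alpha",
    "urn:x:1#beta",       # repeats the identifier of row 1
    "urn:x:2#",           # empty text column
    "no delimiter here",  # only one column
]

for problem in check_text_rows(sample_rows):
    print(problem)
```

Collecting every problem in one list, rather than stopping at the first, mirrors the idea of a complete list of corrigenda.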

The automated verification depends on a number of code libraries. To assess the quality of our automated evaluation of an edition, the main libraries that should be evaluated are:

Each library includes a suite of automated tests, API documentation, and some additional end-user documentation.

How do I find HMT publications?

Published releases, each comprising a single data set in .cex format, a catalog of corrigenda, and a folder with user-readable reports, are committed to the releases-cex directory of the project’s hmt-archive GitHub repository: https://github.com/homermultitext/hmt-archive/tree/master/releases-cex

While the rest of the archive is constantly changing, files committed to this directory should be immutable. Rather than being updated, they are superseded by new, uniquely identified releases committed to the same directory.

What software can I use to work with HMT publications?

Since CEX files are just plain-text files, you can use any tools that work with text. For example:

  • You can inspect or browse an HMT publication with a text editor, search it with command-line tools like grep, or of course read the file with scripts written in any language.
  • Individual labelled CEX blocks are easily imported into databases or statistical software.
  • CITE-App is a browser-based app for working with data loaded from a CEX source:

https://github.com/cite-architecture/CITE-App
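To make the point about ordinary scripting concrete, here is a minimal Python sketch that groups the lines of a CEX document by block label and then searches one block. It assumes three CEX conventions: a block begins with a line starting with "#!" followed by its label, lines starting with "//" are comments, and columns are separated by "#". Check these against the CEX specification before relying on them. The function names and the sample data are invented for illustration.

```python
from collections import defaultdict


def split_cex_blocks(text):
    """Group the non-empty, non-comment lines of a CEX document by label.

    Assumes a block starts at a line beginning with '#!' and that lines
    beginning with '//' are comments. Blocks sharing a label are merged.
    """
    blocks = defaultdict(list)
    label = None
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("//"):
            continue
        if stripped.startswith("#!"):
            label = stripped[2:].strip()
            blocks[label]  # register the label even if the block is empty
            continue
        if label is not None:
            blocks[label].append(stripped)
    return dict(blocks)


def find_passages(blocks, needle, label="ctsdata", delimiter="#"):
    """Return (identifier, text) pairs in one block whose text has needle."""
    matches = []
    for row in blocks.get(label, []):
        identifier, _, text = row.partition(delimiter)
        if needle in text:
            matches.append((identifier, text))
    return matches


# Invented sample, not real HMT data.
sample = """\
// hypothetical sample
#!cexversion
3.0

#!ctsdata
urn:cts:example:group.work.ed:1.1#first sample passage
urn:cts:example:group.work.ed:1.2#second sample passage
"""

blocks = split_cex_blocks(sample)
for identifier, text in find_passages(blocks, "second"):
    print(identifier, text)
```

For a downloaded release, the same functions could be applied to the file's contents read with open(path, encoding="utf-8").read(). The resulting lists of rows are a convenient starting point for loading a block into a database or a data frame.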

What other kinds of access does the HMT project offer?

The principal publication is the CEX source that can be downloaded from GitHub. In addition, we currently offer:


Website © 2019-2020, the Homer Multitext project. For licensing on image collections, see the Image Archive page.
