Running a CHS image service

This guide describes how to set up and run the CHS Image service. The CHS Image Extension to CITE Collections is described here.

The implementation consists of a small servlet written in Groovy (a “groovlet”) that fields requests and talks to two back-end components: a FastCGI program that manipulates binary image data, and a relational database that stores information about canonically identified images.

Prerequisites

For the impatient

Once you have an IIP Image Server running as described below, and have set up a JDBC-accessible database with the required schema, you can install the binary .war file as follows:

If you prefer to build your own war from source, follow these steps:

You can test your configuration by running the service in a local Jetty instance from Gradle with:

gradle jettyRunWar

The component for manipulating binary image data

The FastCGI program that manipulates the binary image data is IIPImage: see http://iipimage.sourceforge.net/documentation/server. Images are stored in pyramidal TIFF format in a file system visible to the FastCGI program. IIPImage expects to be handed an explicit path to the image file, so the groovlet consults the relational database to translate canonical identifiers into explicit locations in the file system.

To set up a CHS Image Service, your first step should be to load your pyramidal TIFFs onto a server where you have installed IIPImage, and verify that you can retrieve a binary image. Test with a URL like this:

http://YOURSERVER/fcgi-bin/iipsrv.fcgi?OBJ=IIP,1.0&FIF=/PATH/TO/IMG/FILE.tif&CVT=JPEG
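If you have many images to check, it can help to generate the test URL from a script rather than typing it by hand. The following is a minimal sketch, not part of the CHS Image distribution: the function name iip_test_url is hypothetical, and the server name and file path are the same placeholders used in the URL above.

```python
from urllib.parse import urlencode


def iip_test_url(server, tiff_path):
    """Build a URL asking IIPImage to convert a whole pyramidal TIFF to JPEG."""
    # The parameter order follows the example URL above, with CVT last.
    query = urlencode(
        [("OBJ", "IIP,1.0"), ("FIF", tiff_path), ("CVT", "JPEG")],
        safe="/,",  # leave slashes and commas unescaped, as in the example URL
    )
    return f"http://{server}/fcgi-bin/iipsrv.fcgi?{query}"


print(iip_test_url("YOURSERVER", "/PATH/TO/IMG/FILE.tif"))
```

Opening the printed URL in a browser should return a JPEG if IIPImage can read the file.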

You may group related images in a single directory, and identify them in the relational database as an “image group,” as explained in the following section.

[Technical note: For replies to the GetBinaryImage request, the FastCGI program does not need to be on the same server as the groovlet. This can be very handy, since it means a very minimal machine can field service requests and defer the storage and processing of massive amounts of image data to another machine. For replies to the GetIIPMooViewer request, however, security restrictions on JavaScript require that all the ongoing interaction in a zoomable interface retrieve binary image data from the same machine that the original GetIIPMooViewer request was sent to.]

The relational database

The relational database manages information about images in three related tables:

1.  A table of one or more CITE Collection namespaces.
2.  A table of one or more image groups: these are sets of images stored in a single directory.
3.  A table of one or more images.

The present implementation uses the PostgreSQL RDBMS, but the groovlet relies only on JDBC for queries, so the same schema could be used in any database system with JDBC drivers. A sample schema for PostgreSQL is included in the downloaded project.

namespace table

The namespace table has three required columns. These give:

  1. the full URI identifying the namespace (note that, as with XML namespaces, this is just a unique identifier, and may or may not be a working URL)
  2. the abbreviation used for this namespace in CITE Collection URNs
  3. a human-readable description of the namespace.

Here is an example:

| abbr | url | label |
| --- | --- | --- |
| chsimg | http://chs.harvard.edu/datans/chsimg | Center for Hellenic Studies, data namespace for cataloged images |

img_group table

There are four required columns for image groups. They are:

  1. an integer identifying the group
  2. an explicit path to the directory where images in this group are stored
  3. a human-readable description of the images in this group
  4. a brief or summary label for the group.

Here is an example:

| id | img_dir | description | label |
| --- | --- | --- | --- |
| 4 | /project/homer/pyramidal/Upsilon-1-1 | Marciana 458 | Upsilon-1-1 |

img table

The image table has five required columns. The image identifier and the namespace are the ID and namespace components of a CITE Collection URN. The group_id is the integer ID of the group this image belongs to, and must be a valid reference to an ID in the img_group table. The caption and rights columns are text fields with the metadata to return for the GetCaption and GetRights requests, respectively. Here is an example:

| img_id | ns | group_id | caption | rights |
| --- | --- | --- | --- | --- |
| urn:cite:hmt:u4.U4001RN-0001 | chsimg | 4 | Marciana 458, folio 1, recto. | This image was derived from an original ©2007, Biblioteca Nazionale Marciana, Venezie, Italia. The derivative image is ©2010, Center for Hellenic Studies. Original and derivative are licensed under the Creative Commons Attribution-Noncommercial-Share Alike 3.0 License. The CHS/Marciana Imaging Project was directed by David Jacobs of the British Library. |
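To make the relationship between the three tables concrete, here is a minimal sketch that recreates them in an in-memory SQLite database and looks up the directory for one image. It is an illustration under stated assumptions, not the sample PostgreSQL schema shipped with the project: the table names namespace, img_group and img follow the headings above, the column types and constraints are guesses, the rights text is abbreviated, and the function image_directory is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces references when asked
conn.executescript("""
CREATE TABLE namespace (
    abbr  TEXT PRIMARY KEY,
    url   TEXT NOT NULL,
    label TEXT NOT NULL
);
CREATE TABLE img_group (
    id          INTEGER PRIMARY KEY,
    img_dir     TEXT NOT NULL,
    description TEXT NOT NULL,
    label       TEXT NOT NULL
);
CREATE TABLE img (
    img_id   TEXT NOT NULL,
    ns       TEXT NOT NULL REFERENCES namespace(abbr),
    group_id INTEGER NOT NULL REFERENCES img_group(id),
    caption  TEXT NOT NULL,
    rights   TEXT NOT NULL
);
""")

# Example rows taken from the tables above (rights text shortened here).
conn.execute(
    "INSERT INTO namespace VALUES (?, ?, ?)",
    ("chsimg", "http://chs.harvard.edu/datans/chsimg",
     "Center for Hellenic Studies, data namespace for cataloged images"),
)
conn.execute(
    "INSERT INTO img_group VALUES (?, ?, ?, ?)",
    (4, "/project/homer/pyramidal/Upsilon-1-1", "Marciana 458", "Upsilon-1-1"),
)
conn.execute(
    "INSERT INTO img VALUES (?, ?, ?, ?, ?)",
    ("urn:cite:hmt:u4.U4001RN-0001", "chsimg", 4,
     "Marciana 458, folio 1, recto.", "(rights statement)"),
)


def image_directory(conn, img_id, ns):
    """Return the directory of the group an image belongs to, or None if unknown."""
    row = conn.execute(
        "SELECT g.img_dir FROM img AS i "
        "JOIN img_group AS g ON g.id = i.group_id "
        "WHERE i.img_id = ? AND i.ns = ?",
        (img_id, ns),
    ).fetchone()
    return row[0] if row else None


print(image_directory(conn, "urn:cite:hmt:u4.U4001RN-0001", "chsimg"))
```

The join on group_id is the step that turns a canonical identifier into a location in the file system; how the file name inside that directory is derived is not covered by this sketch.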

The service config file

Configuration files are kept in the configs directory of the webapp. You may include multiple configuration files in this directory; include an HTTP request parameter named config to identify the specific configuration file to use. In this way, you can in effect configure multiple virtual servers from a single instance of your CHS Image server. If no config parameter is given, the default configuration file is citeconfig.xml.
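The selection rule can be sketched in a few lines. This is an illustration only, not the groovlet's code: it assumes that the config parameter holds a file name inside the configs directory, and the function name config_path is hypothetical.

```python
from pathlib import Path

DEFAULT_CONFIG = "citeconfig.xml"  # default named in the text above


def config_path(configs_dir, params):
    """Pick the configuration file for a request from its parameters."""
    name = params.get("config") or DEFAULT_CONFIG
    # Keep only the final path component of the requested name.
    return Path(configs_dir) / Path(name).name


print(config_path("configs", {}))
print(config_path("configs", {"config": "homer.xml"}))
```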

Configuration files are simple XML files. The root element, named chsimgimgconfig, has an obligatory @url attribute giving the base URL of service requests.

Two elements are required: jdbcConfig has mandatory attributes for the JDBC settings of your database; iipimage has a mandatory @url attribute with the URL of your IIP Image installation. If you want to allow zooming interfaces as well as static binary files, you must set the value of the @zoomable attribute to ‘true’.

Optionally, you may include a contents element, containing one or more include elements, to specify which image groups should be included in this configuration. The @label attribute gives a value from the label field of the image groups table in your database. If you include a contents element, only those image groups will be reported on in the servlet’s ‘contents’ page. By default, all image groups in your image group table are reported on.
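As an illustration of the layout described above, the following sketch embeds a sample configuration and reads it with Python's standard XML parser. Several details are assumptions rather than documented facts: the attribute names on jdbcConfig (driver, url, user, password) are hypothetical, @zoomable is assumed to sit on the iipimage element, and all URLs and credentials are placeholders. Check the configuration file shipped with the project for the real names.

```python
import xml.etree.ElementTree as ET

# Sample configuration; jdbcConfig attribute names are hypothetical.
SAMPLE_CONFIG = """\
<chsimgimgconfig url="http://YOURSERVER/chsimg/Img">
  <jdbcConfig driver="org.postgresql.Driver"
              url="jdbc:postgresql://localhost/chsimg"
              user="USER" password="PASSWORD"/>
  <iipimage url="http://YOURSERVER/fcgi-bin/iipsrv.fcgi" zoomable="true"/>
  <contents>
    <include label="Upsilon-1-1"/>
  </contents>
</chsimgimgconfig>
"""


def read_config(xml_text):
    """Collect the settings described above into a plain dictionary."""
    root = ET.fromstring(xml_text)
    iip = root.find("iipimage")
    return {
        "service_url": root.get("url"),
        "jdbc": dict(root.find("jdbcConfig").attrib),
        "iip_url": iip.get("url"),
        "zoomable": iip.get("zoomable") == "true",
        # An empty list means no contents element: report on every image group.
        "groups": [inc.get("label") for inc in root.findall("contents/include")],
    }


config = read_config(SAMPLE_CONFIG)
print(config["iip_url"], config["zoomable"], config["groups"])
```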

A brief worked example: how chsimg handles a request

An end user wants a binary image showing just the left column of text on folio 1 of manuscript U4. The user’s software converts this desire into this request to submit to your CHS Image Service:

request=GetBinaryImage&urn=urn:cite:hmt:chsimg.U4001RN-0001:0.1,0.1,0.3,0.8

Your groovlet verifies that the urn value is syntactically valid, and looks up in your database the record for an image in the chsimg namespace with URN urn:cite:chsimg:U4001RN-0001. It finds that this image belongs to group 4 in your system’s local organization of files, and determines that image group 4 is in the local directory /project/homer/pyramidal/U4. The groovlet uses this information to submit an appropriate request to IIPImage, and the client program receives binary image data. Try this in a web browser to see an implementation:

http://amphoreus.hpcc.uh.edu/tomcat/chsimg/Img?request=GetBinaryImage&urn=urn:cite:hmt:chsimg.U4001RN-0001:0.1,0.1,0.3,0.8
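The flow of this worked example can be sketched as follows. This is not the groovlet's actual code. It assumes that the four numbers after the last colon are x, y, width and height expressed as fractions of the full image, and that they are passed to IIPImage through its RGN command; the TIFF file name is hypothetical, since this guide does not say how file names are derived, and the function names split_urn and iip_request_url are invented for the illustration.

```python
from urllib.parse import parse_qs, urlencode


def split_urn(urn):
    """Split a CITE Collection URN into the image URN and an optional region."""
    parts = urn.split(":")
    # Expected shape: urn:cite:<namespace>:<collection.object>[:<x,y,w,h>]
    if len(parts) not in (4, 5) or parts[:2] != ["urn", "cite"]:
        raise ValueError(f"not a CITE Collection URN: {urn!r}")
    image_urn = ":".join(parts[:4])
    region = parts[4] if len(parts) == 5 else None
    if region is not None:
        values = [float(v) for v in region.split(",")]
        if len(values) != 4 or not all(0.0 <= v <= 1.0 for v in values):
            raise ValueError(f"region must be four fractions: {region!r}")
    return image_urn, region


def iip_request_url(iip_base, tiff_path, region=None):
    """Build the back-end request, adding RGN when a region was asked for."""
    commands = [("OBJ", "IIP,1.0"), ("FIF", tiff_path)]
    if region:
        commands.append(("RGN", region))
    commands.append(("CVT", "JPEG"))  # CVT goes last
    return iip_base + "?" + urlencode(commands, safe="/,")


query = parse_qs(
    "request=GetBinaryImage&urn=urn:cite:hmt:chsimg.U4001RN-0001:0.1,0.1,0.3,0.8"
)
image_urn, region = split_urn(query["urn"][0])
print(image_urn)
print(iip_request_url(
    "http://YOURSERVER/fcgi-bin/iipsrv.fcgi",
    "/project/homer/pyramidal/U4/U4001RN-0001.tif",  # hypothetical file name
    region,
))
```

In the real service, the database lookup described above sits between the two steps and supplies the directory for the image.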