Old notes on using the `hmt-textmodel` library

Overview: analyze a corpus of texts

Use the ohco2 library to create a corpus of the whole HMT project:

val corpus = CorpusSource.fromFile("data/hmtarchive_2cols.tsv")

Use URN twiddling to select the main scholia as a corpus:

val mainScholia = corpus ~~ CtsUrn("urn:cts:greekLit:tlg5026.msA:")

Analyze the scholia. The result is a Vector of TokenAnalysis objects.

val tokens = TeiReader.fromCorpus(mainScholia)

Turn a vector of analyses into a new text corpus citable by token

Create a vector of text nodes using alternate readings, where they exist, rather than diplomatic readings, and make a text corpus from that:

val citableNodes = tokens.map(_.readWithAlternate)
val normalizedCorpus = Corpus(citableNodes)

Create a text corpus using purely diplomatic readings:

val citableNodes = tokens.map(_.readWithDiplomatic)
val diplomaticCorpus = Corpus(citableNodes)

Filter analyses by alternate readings

From a vector of analyses, select all the analyses that include any category of alternate reading differing from the diplomatic reading:

val altTokens = tokens.filter(_.hasAlternate)

Select all analyses that include an expanded abbrevation:

val abbrTokens = tokens.filter(_.isAbbreviation)

Select all analyses that include a scribal correction:

val corrTokens = tokens.filter(_.hasScribalCorrection)

Select all analyses that include a scribal multiform:

val multiformTokens = tokens.filter(_.hasMultiform)

Filter analyses by lexical disambiguation

Use URN twiddling to select all analyses with matching lexical disambiguation. E.g., select all instances of personal name Achilles:

val achilles = Cite2Urn("urn:cite2:hmt:pers:pers1")
val achillesTokens = tokens.filter(_.lexMatch(achilles))

Filter analyses by discourse analysis

Use these filters to select tokens by category of discourse:

val directVoice = tokens.filter(_.isDirectVoice)
val citations = tokens.filter(_.isCitation)
val quotedText = tokens.filter(_.isQuotedText)
val quotedLiteral = tokens.filter(_.isQuotedLiteral)
val quotedLanguage = tokens.filter(_.isQuotedLanguage)
val notDirect = tokens.filter(_.notDirectVoice)

Combinations: create citable nodes for selected analyses

Find all analyses of tokens as citations, and create a vector of citable text nodes using expanded (alternate) readings rather than diplomatic view:

val citations = tokens.filter(_.isCitation)
val citationPassages = citations.map(_.readWithAlternate)

Since citationPassages is just a Vector of citable nodes, you can do normal Scala magic with it, e.g., create a string listing the text of each token on a single line:

val citedList = citationPassages.map(_.text).mkString("\n")

Website © 2019-2020, the Homer Multitext project. For licensing on image collections, see the Image Archive page.

Powered by Hydejack v8.1.1