edmodel

package edmodel

Provides classes modelling HMT editions of texts.

Overview

The starting point is the factory object TeiReader, which can read data in the OHCO2 model from a two-column file or a Corpus object and produce a Vector of TokenAnalysis objects. Each TokenAnalysis pairs a CtsUrn for the citable text node with a fully analyzed HmtToken. Example:

```
val tokenPairs = TeiReader.fromCorpus(CORPUS_OBJECT)
```

The HmtToken captures everything known about a token from an HMT edition. See its documentation for more details.

Linear Supertypes

AnyRef, Any

Type Members

sealed trait AlternateCategory extends AnyRef
All possible categories for alternate readings are enumerated by case objects extending this trait
Used by org.homermultitext.edmodel.AlternateReading and therefore also by org.homermultitext.edmodel.HmtToken and org.homermultitext.edmodel.TeiReader
case class AlternateReading(alternateCategory: AlternateCategory, reading: Vector[Reading]) extends Product with Serializable
an alternate reading for a token
The name member must be implemented with an English description of the editorial status
alternateCategory
category of alternate reading
reading
all org.homermultitext.edmodel.Readings for this alternate reading
sealed trait DiscourseCategory extends AnyRef
All possible categories for discourse of a token are enumerated by case objects extending this trait
The name member must be implemented with an English description of the discourse status
Used by org.homermultitext.edmodel.HmtToken and therefore also by org.homermultitext.edmodel.TeiReader
sealed trait EditorialStatus extends AnyRef
All possible values for the editorial status of a token are enumerated by case objects extending this trait
The name member must be implemented with an English description of the editorial status
Used by org.homermultitext.edmodel.Reading and therefore also by org.homermultitext.edmodel.HmtToken and org.homermultitext.edmodel.TeiReader
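The enumeration pattern described for these sealed traits can be sketched as follows. This is a minimal, self-contained illustration and not the library's source: the wrapper object StatusSketch, the helper isLegible, and the English strings given for name are all assumptions made for the example.

```scala
object StatusSketch {
  // Sketch only: mirrors the documented pattern, not the library's code.
  sealed trait EditorialStatus {
    // English description of the status (the strings below are assumed).
    def name: String
  }
  case object Clear extends EditorialStatus { val name = "clear" }
  case object Unclear extends EditorialStatus { val name = "unclear" }
  case object Missing extends EditorialStatus { val name = "missing" }
  case object Restored extends EditorialStatus { val name = "restored" }

  // Hypothetical helper. Because the trait is sealed, the compiler can
  // warn when a match like this one omits a case object.
  def isLegible(status: EditorialStatus): Boolean = status match {
    case Clear | Restored  => true
    case Unclear | Missing => false
  }
}
```

Sealing the trait keeps the set of values closed to the defining file, which is what lets pattern matches over a status be checked for exhaustiveness.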
case class HmtOrcaToken(urn: CtsUrn, src: CtsUrn, textDeformation: String, hmtToken: HmtToken) extends Product with Serializable
token in an ORCA analytical exemplar
urn
exemplar-level URN identifying this token in a specific reading of a HMT edition
src
URN of passage read or analyzed
textDeformation
string view of this token
hmtToken
full analysis of this token
case class HmtReading(title: String, description: String, tokens: Vector[HmtOrcaToken]) extends Product with Serializable
a complete reading of a text expressed as an analytical exemplar
title
labelling string or title of edition
tokens
sequence of org.homermultitext.edmodel.HmtOrcaTokens defining an analytical edition
case class HmtToken(analysis: Cite2Urn, sourceUrn: CtsUrn, editionUrn: CtsUrn, lang: String = "grc", readings: Vector[Reading], lexicalCategory: LexicalCategory, lexicalDisambiguation: Cite2Urn = ..., alternateReading: Option[AlternateReading] = None, discourse: DiscourseCategory = DirectVoice, externalSource: Option[CtsUrn] = None, errors: ArrayBuffer[String] = ArrayBuffer.empty[String]) extends Product with Serializable
A fully documented, semantically distinct token. The model of this token supports the ORCA model of aligned text analysis. The analysis member is a CITE2 URN representing this token as an ORCA analysis. The sourceUrn member is a CTS URN with subreference index identifying the specific string of text analyzed. The editionUrn member is a CTS URN for this token in an analytical exemplar. The other members of the HmtToken provide the analytical data for this token.
analysis
CITE URN for this token analysis.
sourceUrn
URN for this token in the analyzed text
editionUrn
URN for this token in an analytical exemplar when promoted to an edition
lang
3-letter ISO language code for the language of this token, or a descriptive string if no ISO code is defined for this language
readings
All org.homermultitext.edmodel.Readings belonging to this token
lexicalCategory
lexical category of this token
lexicalDisambiguation
URN for automated method to disambiguate tokens of a given type, or manually disambiguated URN for named entity values
alternateReading
optional org.homermultitext.edmodel.AlternateReadings belonging to this token
discourse
category of discourse of this token
externalSource
URN of source this token is quoted from
errors
list of error messages (hopefully empty)
sealed trait LexicalCategory extends AnyRef
All possible lexical categories for a token are enumerated by case objects extending this trait
The name member must be implemented with an English description of the lexical category
Used by org.homermultitext.edmodel.HmtToken and therefore also by org.homermultitext.edmodel.TeiReader
case class Reading(reading: String, status: EditorialStatus) extends Product with Serializable
A typed reading of a passage.
reading
string read with given status
status
status of the given string
case class ReadingConfig(title: String, description: String) extends Product with Serializable
case class TeiReader(twoColumns: String, delimiter: String = "#") extends Product with Serializable
Factory for Vectors of HmtToken instances.
Example
The TeiReader reads data in the OHCO2 model from sources such as delimited-text files or the Corpus object from the edu.holycross.ohco2 library. It produces a Vector of TokenAnalysis objects.
Example:
```
val tokenPairs = TeiReader.fromCorpus(CORPUS_OBJECT)
```
How it works
The TeiReader object maintains three mutable buffers, nodeText (a StringBuilder), wrappedWordBuffer and tokenBuffer (both mutable ArrayBuffers).
case class TextDeformation(text: String) extends Product with Serializable
case class TokenAnalysis(textNode: CtsUrn, analysis: HmtToken) extends Product with Serializable
An analysis of a single token.
textNode
CtsUrn of the citable node where this token occurs. Note that this will always be equivalent to the version-level URN of the node containing the "edition URN" of the HmtToken, since the edition URN adds a "tokens" exemplar to the version-level URN and extends the passage hierarchy by one further level. Expressed in code, we can say that for any TokenAnalysis ta, the following relation is true:
```
ta.analysis.editionUrn.collapsePassageBy(1) == ta.textNode.addExemplar("tokens")
```
analysis
The analysis of this token as a full HmtToken object.
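The relation between textNode and the edition URN can be traced on plain strings. The sketch below is an illustration under stated assumptions: it does not use the CtsUrn class, the two helper functions are hypothetical string-based stand-ins for the methods named in the relation, and the URN values are chosen only as examples.

```scala
object UrnRelationSketch {
  // Hypothetical string-based stand-ins; the real CtsUrn class has its own methods.

  // Drop the last n levels of the passage component (the part after the final colon).
  def collapsePassageBy(urn: String, n: Int): String = {
    val idx = urn.lastIndexOf(':')
    val passage = urn.substring(idx + 1).split('.')
    urn.substring(0, idx + 1) + passage.dropRight(n).mkString(".")
  }

  // Append an exemplar identifier to the version-level work component.
  def addExemplar(urn: String, exemplar: String): String = {
    val idx = urn.lastIndexOf(':')
    urn.substring(0, idx) + "." + exemplar + urn.substring(idx)
  }

  // Illustrative values: a citable node and a token within it.
  val textNode = "urn:cts:greekLit:tlg0012.tlg001.msA:1.1"
  val editionUrn = "urn:cts:greekLit:tlg0012.tlg001.msA.tokens:1.1.1"
}
```

With these values, collapsing the edition URN by one passage level and adding the "tokens" exemplar to the text node both yield the same string, which is the equality the documented relation states.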

Value Members

val analyticalCollections: Map[String, Cite2Urn]
def codeptList(s: String, idx: Int = 0, codepoints: List[Int] = Nil): List[Int]
Recursively get list of code points for a String.
s
String to get codepoints for.
idx
Index of codepoint to start from.
codepoints
List of codepoints seen so far.
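A recursive function with this signature could be written as in the sketch below. It is one possible implementation, not the library's code: the wrapper object CodePointSketch is hypothetical, and returning the code points in string order is an assumption.

```scala
import scala.annotation.tailrec

object CodePointSketch {
  // Walk the string by code point rather than by Char, so that characters
  // outside the Basic Multilingual Plane (surrogate pairs) are counted once.
  @tailrec
  def codeptList(s: String, idx: Int = 0, codepoints: List[Int] = Nil): List[Int] = {
    if (idx >= s.length) {
      // The accumulator was built by prepending, so restore string order.
      codepoints.reverse
    } else {
      val cp = s.codePointAt(idx)
      // Advance by 1 or 2 Chars depending on the width of this code point.
      codeptList(s, idx + Character.charCount(cp), cp :: codepoints)
    }
  }
}
```

Advancing by Character.charCount rather than by 1 matters for text containing supplementary characters, where a single code point occupies two Chars.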
def collectText(n: Node): String
def collectText(n: Node, s: String): String
Recursively collect contents of all text-node descendants of a given node.
n
Node to collect from.
returns
A single String with all text from n.
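The recursion behind the two collectText overloads can be sketched without any XML dependency. In the example below, XmlNode, Text and Element are hypothetical stand-ins for scala.xml.Node, and the implementation is an assumption about how such a traversal might work rather than the library's own code.

```scala
object CollectTextSketch {
  // Hypothetical stand-ins for scala.xml.Node, so the sketch has no dependencies.
  sealed trait XmlNode
  final case class Text(value: String) extends XmlNode
  final case class Element(label: String, children: List[XmlNode]) extends XmlNode

  // Accumulator version: s carries the text gathered so far.
  def collectText(n: XmlNode, s: String): String = n match {
    case Text(value) => s + value
    case Element(_, children) =>
      // Visit children in document order, threading the accumulator through.
      children.foldLeft(s)((acc, child) => collectText(child, acc))
  }

  // Convenience overload that starts from an empty accumulator.
  def collectText(n: XmlNode): String = collectText(n, "")
}
```

For a line element containing the text "wrath of " followed by a nested name element containing "Achilles", the one-argument overload returns the concatenated text of both.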
val collectionId: String
val exemplarLabels: Map[String, ReadingConfig]
def hmtNormalize(s: String): String
val punctuation: Vector[String]
val validElements: Vector[String]
val versionId: String
object AlternateReading extends Serializable
string formatting function
object Citation extends DiscourseCategory with Product with Serializable
object Clear extends EditorialStatus with Product with Serializable
Paleographically unambiguous reading.
object Correction extends AlternateCategory with Product with Serializable
scribal correction of text
object Deletion extends AlternateCategory with Product with Serializable
scribal deletion of text
object DiplomaticEditionFactory
Factory to build a diplomatic edition from a Vector of TokenAnalysis objects.
object DirectVoice extends DiscourseCategory with Product with Serializable
token in direct voice of text
object HmtChars
Definitions of allowed characters in HMT editions.
object HmtOrcaToken extends Serializable
object HmtReading extends Serializable
object HmtToken extends Serializable
Factory for labelling information about tokens.
object LexicalToken extends LexicalCategory with Product with Serializable
parseable lexical token
object LiteralToken extends LexicalCategory with Product with Serializable
quoted literal string not parseable as a lexical token
object Missing extends EditorialStatus with Product with Serializable
Lacuna.
object Multiform extends AlternateCategory with Product with Serializable
alternate reading offered by scribe
object NumericToken extends LexicalCategory with Product with Serializable
token in Milesian numeric notation
object Punctuation extends LexicalCategory with Product with Serializable
single punctuation character
object QuotedLanguage extends DiscourseCategory with Product with Serializable
quoted word in the natural language of text
object QuotedLiteral extends DiscourseCategory with Product with Serializable
quoted string of characters not forming a valid lexical entity
object QuotedText extends DiscourseCategory with Product with Serializable
token in quotation of another text
object Reading extends Serializable
Companion object for formatting Vectors of Readings as Strings.
object Restoration extends AlternateCategory with Product with Serializable
restored by modern editor
This should only apply to editorial expansions of abbreviations.
object Restored extends EditorialStatus with Product with Serializable
Reading supplied by modern editor.
Applies only to editorial expansion of abbreviations.
object TeiReader extends Serializable
object TextDeformation extends Serializable
Factory for Vectors of org.homermultitext.edmodel.HmtOrcaToken instances.
object Unclear extends EditorialStatus with Product with Serializable
Paleographically ambiguous reading.
object Unintelligible extends LexicalCategory with Product with Serializable
token not parseable due to error in HMT edition
