package edmodel
Provides classes modelling HMT editions of texts.
Overview
The starting point is the factory object TeiReader, which can read data in the OHCO2 model from a two-column file or a Corpus object and produce a Vector of TokenAnalysis objects. Each TokenAnalysis pairs a CtsUrn for the citable text node with a fully analyzed HmtToken. Example:
val tokenPairs = TeiReader.fromCorpus(CORPUS_OBJECT)
The HmtToken captures everything known about a token from an HMT edition. See its documentation for more details.
Type Members
- sealed trait AlternateCategory extends AnyRef
All possible categories for alternate readings are enumerated by case objects extending this trait.
Used by org.homermultitext.edmodel.AlternateReading and therefore also by org.homermultitext.edmodel.HmtToken and org.homermultitext.edmodel.TeiReader.
- case class AlternateReading(alternateCategory: AlternateCategory, reading: Vector[Reading]) extends Product with Serializable
an alternate reading for a token
The name member must be implemented with an English description of the editorial status.
- alternateCategory
category of alternate reading
- reading
all org.homermultitext.edmodel.Readings for this alternate reading
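The sketch below shows how an AlternateReading pairs a category with its Readings. It is self-contained and uses simplified stand-ins, not the edmodel classes. The four case objects mirror the AlternateCategory objects listed under Value Members, and the comments reuse their descriptions. The status field is reduced to a String. The helper isScribal is hypothetical: it only encodes the fact, taken from those descriptions, that every category except Restoration records something the scribe did.

```scala
// Simplified stand-ins for edmodel types (NOT the library's own definitions).
sealed trait AlternateCategory
case object Multiform   extends AlternateCategory // alternate reading offered by scribe
case object Correction  extends AlternateCategory // scribal correction of text
case object Deletion    extends AlternateCategory // scribal deletion of text
case object Restoration extends AlternateCategory // restored by modern editor

// Editorial status simplified to a String for this sketch.
case class Reading(reading: String, status: String)
case class AlternateReading(alternateCategory: AlternateCategory, reading: Vector[Reading])

// Hypothetical helper: true when the alternate reading records a scribal act.
// Matching on the sealed trait lets the compiler check that no category is missed.
def isScribal(alt: AlternateReading): Boolean = alt.alternateCategory match {
  case Multiform | Correction | Deletion => true
  case Restoration                      => false
}

val correction = AlternateReading(Correction, Vector(Reading("μῆνιν", "clear")))
val expansion  = AlternateReading(Restoration, Vector(Reading("καί", "restored")))
println(isScribal(correction)) // true
println(isScribal(expansion))  // false
```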
- sealed trait DiscourseCategory extends AnyRef
All possible categories for discourse of a token are enumerated by case objects extending this trait.
The name member must be implemented with an English description of the discourse status.
Used by org.homermultitext.edmodel.HmtToken and therefore also by org.homermultitext.edmodel.TeiReader.
- sealed trait EditorialStatus extends AnyRef
All possible values for the editorial status of a token are enumerated by case objects extending this trait.
The name member must be implemented with an English description of the editorial status.
Used by org.homermultitext.edmodel.Reading and therefore also by org.homermultitext.edmodel.HmtToken and org.homermultitext.edmodel.TeiReader.
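The three category traits above all follow the same pattern: a sealed trait, case objects that enumerate the values, and a name member holding an English description. Here is a minimal sketch of that pattern using the EditorialStatus objects listed under Value Members. The name strings are copied from their descriptions in this documentation. The real implementations may word them differently. The helper readFromManuscript is hypothetical.

```scala
// Stand-in for the edmodel pattern: a sealed trait whose case objects
// each supply an English description in `name`.
sealed trait EditorialStatus { def name: String }
case object Clear    extends EditorialStatus { val name = "Paleographically unambiguous reading" }
case object Unclear  extends EditorialStatus { val name = "Paleographically ambiguous reading" }
case object Missing  extends EditorialStatus { val name = "Lacuna" }
case object Restored extends EditorialStatus { val name = "Reading supplied by modern editor" }

// Hypothetical helper: true when the reading comes from the manuscript itself
// rather than from a lacuna or an editorial supplement. Because the trait is
// sealed, the compiler warns if a case is left out of this match.
def readFromManuscript(status: EditorialStatus): Boolean = status match {
  case Clear | Unclear    => true
  case Missing | Restored => false
}

val statuses = Vector(Clear, Unclear, Missing, Restored)
statuses.foreach(s => println(s"${s.name}: ${readFromManuscript(s)}"))
```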
- case class HmtOrcaToken(urn: CtsUrn, src: CtsUrn, textDeformation: String, hmtToken: HmtToken) extends Product with Serializable
token in an ORCA analytical exemplar
- urn
exemplar-level URN identifying this token in a specific reading of an HMT edition
- src
URN of the passage read or analyzed
- textDeformation
string view of this token
- hmtToken
full analysis of this token
- case class HmtReading(title: String, description: String, tokens: Vector[HmtOrcaToken]) extends Product with Serializable
a complete reading of a text expressed as an analytical exemplar
- title
labelling string or title of edition
- tokens
sequence of org.homermultitext.edmodel.HmtOrcaTokens defining an analytical edition
- case class HmtToken(analysis: Cite2Urn, sourceUrn: CtsUrn, editionUrn: CtsUrn, lang: String = "grc", readings: Vector[Reading], lexicalCategory: LexicalCategory, lexicalDisambiguation: Cite2Urn = ..., alternateReading: Option[AlternateReading] = None, discourse: DiscourseCategory = DirectVoice, externalSource: Option[CtsUrn] = None, errors: ArrayBuffer[String] = ArrayBuffer.empty[String]) extends Product with Serializable
A fully documented, semantically distinct token. The model of this token supports the ORCA model of aligned text analysis.
The analysis member is a CITE2 URN representing this token as an ORCA analysis. The sourceUrn member is a CTS URN with a subreference index identifying the specific string of text analyzed. The editionUrn member is a CTS URN for this token in an analytical exemplar. The other members of the HmtToken provide the analytical data for this token.
- analysis
CITE2 URN for this token analysis.
- sourceUrn
URN for this token in the analyzed text
- editionUrn
URN for this token in an analytical exemplar when promoted to an edition
- lang
3-letter language code for the language of this token, or a descriptive string if no ISO code is defined for this language
- readings
All org.homermultitext.edmodel.Readings belonging to this token
- lexicalCategory
lexical category of this token
- lexicalDisambiguation
URN for an automated method to disambiguate tokens of a given type, or a manually disambiguated URN for named entity values
- alternateReading
optional org.homermultitext.edmodel.AlternateReading belonging to this token
- discourse
category of discourse of this token
- externalSource
URN of the source this token is quoted from, if any
- errors
list of error messages (hopefully empty)
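Because HmtToken has many parameters with defaults, named arguments are the natural way to build one. Here is a minimal, self-contained sketch that mirrors the documented defaults on a simplified stand-in class. In the stand-in, URNs are plain Strings and each category trait has only one case object. lexicalDisambiguation is left out because its default is elided above, and alternateReading is reduced to an Option[String]. All URN values are illustrative and are not taken from HMT data.

```scala
import scala.collection.mutable.ArrayBuffer

// Simplified stand-ins (NOT the edmodel definitions).
sealed trait EditorialStatus
case object Clear extends EditorialStatus
case class Reading(reading: String, status: EditorialStatus)

sealed trait LexicalCategory
case object LexicalToken extends LexicalCategory

sealed trait DiscourseCategory
case object DirectVoice extends DiscourseCategory

// Same defaults as the documented signature, minus lexicalDisambiguation.
case class HmtToken(
  analysis: String,
  sourceUrn: String,
  editionUrn: String,
  lang: String = "grc",
  readings: Vector[Reading],
  lexicalCategory: LexicalCategory,
  alternateReading: Option[String] = None,
  discourse: DiscourseCategory = DirectVoice,
  externalSource: Option[String] = None,
  errors: ArrayBuffer[String] = ArrayBuffer.empty[String]
)

// Named arguments let the caller skip every parameter that has a default.
val tok = HmtToken(
  analysis = "urn:cite2:hmt:tokens.v1:tok1",
  sourceUrn = "urn:cts:greekLit:tlg0012.tlg001.msA:1.1@μῆνιν",
  editionUrn = "urn:cts:greekLit:tlg0012.tlg001.msA.tokens:1.1.1",
  readings = Vector(Reading("μῆνιν", Clear)),
  lexicalCategory = LexicalToken
)
println(s"${tok.lang} ${tok.discourse} ${tok.alternateReading}") // grc DirectVoice None
```

One thing to keep in mind with this sketch: the default ArrayBuffer is created fresh for each constructor call. However, copy reuses the existing field value, so a copied token would share the same errors buffer.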
- sealed trait LexicalCategory extends AnyRef
All possible lexical categories for a token are enumerated by case objects extending this trait.
The name member must be implemented with an English description of the lexical category.
Used by org.homermultitext.edmodel.HmtToken and therefore also by org.homermultitext.edmodel.TeiReader.
- case class Reading(reading: String, status: EditorialStatus) extends Product with Serializable
A typed reading of a passage.
- reading
string read with given status
- status
status of the given string
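A single token can be made of several Readings when only part of it is paleographically clear. The sketch below turns a Vector of Readings into Strings, in the spirit of the Reading companion object described under Value Members. Both formatters, plainText and annotated, are hypothetical, and so is the "reading (status)" notation. The companion object's actual output format is not documented here.

```scala
// Simplified stand-ins (NOT the edmodel definitions).
sealed trait EditorialStatus { def name: String }
case object Clear   extends EditorialStatus { val name = "clear" }
case object Unclear extends EditorialStatus { val name = "unclear" }

case class Reading(reading: String, status: EditorialStatus)

// Hypothetical formatter: the token's text with statuses dropped.
def plainText(readings: Vector[Reading]): String =
  readings.map(_.reading).mkString

// Hypothetical formatter: each Reading labelled with its status name.
def annotated(readings: Vector[Reading]): String =
  readings.map(r => s"${r.reading} (${r.status.name})").mkString(" + ")

// One token whose final letters are paleographically ambiguous.
val rdgs = Vector(Reading("μῆ", Clear), Reading("νιν", Unclear))
println(plainText(rdgs)) // μῆνιν
println(annotated(rdgs)) // μῆ (clear) + νιν (unclear)
```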
- case class ReadingConfig(title: String, description: String) extends Product with Serializable
- case class TeiReader(twoColumns: String, delimiter: String = "#") extends Product with Serializable
Factory for Vectors of HmtToken instances.
The TeiReader reads data in the OHCO2 model from sources such as delimited-text files or the Corpus object from the edu.holycross.ohco2 library. It produces a Vector of TokenAnalysis objects.
Example:
val tokenPairs = TeiReader.fromCorpus(CORPUS_OBJECT)
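The constructor signature TeiReader(twoColumns, delimiter = "#") suggests input with two delimited columns per line: a URN, then the node's XML text. The sketch below is a guess at how such input could be split into rows. The NodeRow class and the parseTwoColumns function are hypothetical, and the sketch is not TeiReader's actual parsing code.

```scala
// Hypothetical row type: one citable node per line of input.
case class NodeRow(urn: String, xml: String)

// Split each non-empty line on the FIRST occurrence of the delimiter, so any
// later occurrences inside the XML column are left intact. indexOf is used
// instead of String.split so the delimiter is never treated as a regex.
def parseTwoColumns(twoColumns: String, delimiter: String = "#"): Vector[NodeRow] =
  twoColumns.linesIterator
    .map(_.trim)
    .filter(_.nonEmpty)
    .map { line =>
      val idx = line.indexOf(delimiter)
      require(idx >= 0, s"No delimiter in line: $line")
      NodeRow(line.take(idx), line.drop(idx + delimiter.length))
    }
    .toVector

val data =
  """urn:cts:greekLit:tlg0012.tlg001.msA:1.1#<l n="1">μῆνιν ἄειδε θεὰ</l>
    |urn:cts:greekLit:tlg0012.tlg001.msA:1.2#<l n="2">οὐλομένην</l>""".stripMargin

val rows = parseTwoColumns(data)
rows.foreach(r => println(s"${r.urn} -> ${r.xml}"))
```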
How it works
The TeiReader object maintains three mutable buffers: nodeText (a StringBuilder), wrappedWordBuffer and tokenBuffer (both mutable ArrayBuffers).
- case class TextDeformation(text: String) extends Product with Serializable
- case class TokenAnalysis(textNode: CtsUrn, analysis: HmtToken) extends Product with Serializable
An analysis of a single token.
- textNode
CtsUrn of the citable node where this token occurs. This will always be equivalent to the version-level URN of the node containing the HmtToken's "edition URN". The edition URN adds a "tokens" exemplar to that version-level URN and extends its passage hierarchy by one further level. Expressed in code, for any TokenAnalysis ta, the following relation is true:
ta.analysis.editionUrn.collapsePassageBy(1) == ta.textNode.addExemplar("tokens")
- analysis
The analysis of this token as a full HmtToken object.
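The relation above can be checked on concrete URN strings. The sketch below defines hypothetical string-based helpers that imitate the collapsePassageBy and addExemplar operations on CtsUrn. It assumes the URN's last colon-delimited field is a simple dotted passage reference, with no ranges and no "@" subreferences. These helpers are not the CtsUrn implementation, and the example URNs are illustrative.

```scala
// Hypothetical: insert ".exemplar" just before the passage component.
def addExemplar(urn: String, exemplar: String): String = {
  val i = urn.lastIndexOf(':')
  s"${urn.take(i)}.$exemplar${urn.drop(i)}"
}

// Hypothetical: drop the last n levels of a dotted passage reference.
def collapsePassageBy(urn: String, n: Int): String = {
  val i = urn.lastIndexOf(':')
  val levels = urn.drop(i + 1).split('.').toVector
  s"${urn.take(i + 1)}${levels.dropRight(n).mkString(".")}"
}

val textNode   = "urn:cts:greekLit:tlg0012.tlg001.msA:1.1"
// Third token of that node, in a "tokens" exemplar.
val editionUrn = "urn:cts:greekLit:tlg0012.tlg001.msA.tokens:1.1.3"

println(addExemplar(textNode, "tokens"))
// urn:cts:greekLit:tlg0012.tlg001.msA.tokens:1.1
println(collapsePassageBy(editionUrn, 1) == addExemplar(textNode, "tokens")) // true
```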
Value Members
- val analyticalCollections: Map[String, Cite2Urn]
- def codeptList(s: String, idx: Int = 0, codepoints: List[Int] = Nil): List[Int]
Recursively get list of code points for a String.
- s
String to get codepoints for.
- idx
Index of codepoint to start from.
- codepoints
List of codepoints seen so far.
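Java and Scala Strings are sequences of UTF-16 code units. A character outside the Basic Multilingual Plane is stored as a surrogate pair, so the function must advance by one or two units at each step. The following is a minimal sketch of a function with the documented signature, under the assumption that idx is a UTF-16 index into s. It is not necessarily the library's own body. It appends to the accumulator to keep codepoints in order, which is simple but quadratic on long strings.

```scala
// Sketch with the documented signature; the library's own body may differ.
def codeptList(s: String, idx: Int = 0, codepoints: List[Int] = Nil): List[Int] =
  if (idx >= s.length) codepoints
  else {
    val cp = s.codePointAt(idx)
    // Character.charCount is 2 for a surrogate pair and 1 otherwise, so
    // each code point is read exactly once. The recursive call is in tail position.
    codeptList(s, idx + Character.charCount(cp), codepoints :+ cp)
  }

// Greek mu (U+03BC) and eta with perispomeni (U+1FC6).
println(codeptList("\u03bc\u1fc6").map(cp => f"U+$cp%04X")) // List(U+03BC, U+1FC6)
```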
- def collectText(n: Node): String
- def collectText(n: Node, s: String): String
Recursively collect contents of all text-node descendants of a given node.
- n
Node to collect from.
- returns
A single String with all text from n.
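The recursion behind collectText is a depth-first walk. A text node appends its value to the accumulator s, and every other node recurses into its children in document order. The sketch below makes two substitutions so that it runs with the JDK alone. First, it uses the JDK's org.w3c.dom.Node, not the Node type used by edmodel. Second, it folds the two documented overloads into one method with a default s. CDATA sections are ignored here.

```scala
import java.io.ByteArrayInputStream
import java.nio.charset.StandardCharsets
import javax.xml.parsers.DocumentBuilderFactory
import org.w3c.dom.Node

// Text nodes contribute their value; all other nodes recurse into children.
def collectText(n: Node, s: String = ""): String =
  if (n.getNodeType == Node.TEXT_NODE) s + n.getNodeValue
  else {
    val kids = n.getChildNodes
    (0 until kids.getLength).foldLeft(s)((acc, i) => collectText(kids.item(i), acc))
  }

val xml = "<l n=\"1\">menin <w>aeide</w> thea</l>"
val doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
  .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
println(collectText(doc.getDocumentElement)) // menin aeide thea
```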
- val collectionId: String
- val exemplarLabels: Map[String, ReadingConfig]
- def hmtNormalize(s: String): String
- val punctuation: Vector[String]
- val validElements: Vector[String]
- val versionId: String
- object AlternateReading extends Serializable
string formatting function
- object Citation extends DiscourseCategory with Product with Serializable
- object Clear extends EditorialStatus with Product with Serializable
Paleographically unambiguous reading.
- object Correction extends AlternateCategory with Product with Serializable
scribal correction of text
- object Deletion extends AlternateCategory with Product with Serializable
scribal deletion of text
- object DiplomaticEditionFactory
Factory to build a diplomatic edition from a Vector of TokenAnalysis objects.
- object DirectVoice extends DiscourseCategory with Product with Serializable
token in direct voice of text
- object HmtChars
Definitions of allowed characters in HMT editions.
- object HmtOrcaToken extends Serializable
- object HmtReading extends Serializable
- object HmtToken extends Serializable
Factory for labelling information about tokens.
- object LexicalToken extends LexicalCategory with Product with Serializable
parseable lexical token
- object LiteralToken extends LexicalCategory with Product with Serializable
quoted literal string not parseable as a lexical token
- object Missing extends EditorialStatus with Product with Serializable
Lacuna.
- object Multiform extends AlternateCategory with Product with Serializable
alternate reading offered by scribe
- object NumericToken extends LexicalCategory with Product with Serializable
token in Milesian numeric notation
- object Punctuation extends LexicalCategory with Product with Serializable
single punctuation character
- object QuotedLanguage extends DiscourseCategory with Product with Serializable
quoted word in the natural language of text
- object QuotedLiteral extends DiscourseCategory with Product with Serializable
quoted string of characters not forming a valid lexical entity
- object QuotedText extends DiscourseCategory with Product with Serializable
token in quotation of another text
- object Reading extends Serializable
Companion object for formatting Vectors of Readings as Strings.
- object Restoration extends AlternateCategory with Product with Serializable
restored by modern editor
This should only apply to editorial expansions of abbreviations.
- object Restored extends EditorialStatus with Product with Serializable
Reading supplied by modern editor.
Applies only to editorial expansion of abbreviations.
- object TeiReader extends Serializable
- object TextDeformation extends Serializable
Factory for Vectors of org.homermultitext.edmodel.HmtOrcaToken instances.
- object Unclear extends EditorialStatus with Product with Serializable
Paleographically ambiguous reading.
- object Unintelligible extends LexicalCategory with Product with Serializable
token not parseable due to error in HMT edition