TeiReader

Companion object TeiReader

case class TeiReader(twoColumns: String, delimiter: String = "#") extends Product with Serializable

Factory for Vectors of HmtToken instances.

Example

The TeiReader reads data in the OHCO2 model from sources such as delimited-text files or the Corpus object from the edu.holycross.ohco2 library. It produces a Vector of TokenAnalysis objects.

Example:

val tokenPairs = TeiReader.fromCorpus(CORPUS_OBJECT)

How it works

The TeiReader object maintains three mutable buffers: nodeText (a StringBuilder), and wrappedWordBuffer and tokenBuffer (both mutable ArrayBuffers).
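The buffer-based accumulation described above can be sketched roughly as follows. This is a minimal illustration, not the library's implementation: the `Node`, `Text`, `Elem`, `Reading` and `Token` types are hypothetical simplifications (the real code works on scala.xml nodes and HmtToken objects), and the `w` and `unclear` element names are assumptions chosen for the example.

```scala
import scala.collection.mutable.ArrayBuffer

object BufferSketch {
  // Hypothetical stand-ins for XML nodes (the real code works on scala.xml types).
  sealed trait Node
  final case class Text(s: String) extends Node
  final case class Elem(label: String, children: Vector[Node]) extends Node

  // Hypothetical, simplified stand-ins for the library's Reading and HmtToken.
  final case class Reading(text: String, status: String)
  final case class Token(text: String, readings: Vector[Reading])

  // The three mutable buffers named in the documentation.
  val nodeText = new StringBuilder
  val wrappedWordBuffer = ArrayBuffer.empty[Reading]
  val tokenBuffer = ArrayBuffer.empty[Token]

  def clear(): Unit = {
    nodeText.clear()
    wrappedWordBuffer.clear()
    tokenBuffer.clear()
  }

  // Assumption: an "unclear" element changes the editorial status of its content.
  private def statusFor(label: String, inherited: String): String =
    if (label == "unclear") "unclear" else inherited

  // Accumulate the readings and string value of a single wrapped word.
  def collectWrapped(status: String, n: Node): Unit = n match {
    case Text(s) =>
      nodeText.append(s)
      wrappedWordBuffer += Reading(s, status)
    case Elem(label, children) =>
      children.foreach(collectWrapped(statusFor(label, status), _))
  }

  // Walk the tree, adding one token per word to tokenBuffer.
  def collect(status: String, n: Node): Unit = n match {
    case Text(s) =>
      s.split("\\s+").filter(_.nonEmpty).foreach { w =>
        tokenBuffer += Token(w, Vector(Reading(w, status)))
      }
    case Elem("w", children) =>
      // A wrapped word yields exactly one token, possibly with several readings.
      nodeText.clear()
      wrappedWordBuffer.clear()
      children.foreach(collectWrapped(status, _))
      tokenBuffer += Token(nodeText.toString, wrappedWordBuffer.toVector)
    case Elem(label, children) =>
      children.foreach(collect(statusFor(label, status), _))
  }

  def tokens(root: Node): Vector[Token] = {
    clear()
    collect("clear", root)
    tokenBuffer.toVector
  }
}
```

Because the buffers are shared mutable state, a sketch like this has to reset them before each parse, which is presumably the role of the `clear` member listed below.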

Linear Supertypes

Serializable, Serializable, Product, Equals, AnyRef, Any

Instance Constructors

new TeiReader(twoColumns: String, delimiter: String = "#")

Value Members

final def !=(arg0: Any): Boolean

Definition Classes
AnyRef → Any
final def ##(): Int

Definition Classes
AnyRef → Any
final def ==(arg0: Any): Boolean

Definition Classes
AnyRef → Any
def abbrExpanChoice(hmtToken: HmtToken, el: Elem): Unit
Collect tokens from a TEI abbr-expan pair.
Results are added to the TeiReader's tokenBuffer.
hmtToken
token reflecting reading values for parent context
el
TEI choice element with abbr-expan children
def addTokensFromElement(el: Elem, tokenSettings: HmtToken): Unit
Parse an XML element and add all tokens in it to tokenBuffer.
el
XML element to parse.
tokenSettings
Initial contextual setting for tokens.
def addTokensFromText(s: String, tokenSettings: HmtToken): Unit
Parse a string and add all tokens in it to tokenBuffer.
s
String to parse.
tokenSettings
Initial contextual setting for tokens.
final def asInstanceOf[T0]: T0

Definition Classes
Any
def clear: Unit
def clone(): AnyRef

Attributes
protected[java.lang]
Definition Classes
AnyRef
Annotations
@native() @throws( ... )
def collectCited(currToken: HmtToken, citElem: Elem): Unit
Collect tokens from a cited context.
currToken
token reflecting reading values for parent context
citElem
TEI cit element
def collectRefString(currToken: HmtToken, rsElem: Elem): Unit
Collect the appropriate type of token for each variety of TEI rs usage.
currToken
token reflecting reading values for parent context
rsElem
TEI rs element
def collectTokens(currToken: HmtToken, n: Node): Unit
Collect all tokens descended from a given XML node. Results are collected in tokenBuffer.
currToken
token reflecting reading values for parent context
n
XML node to collect content from
def collectWrappedWordReadings(editorialStatus: EditorialStatus, n: Node): Unit
Recursively collect all Reading objects descended from a given node, and add a Vector of Readings to the TeiReader's wrappedWordBuffer.
editorialStatus
editorial status of surrounding context
n
node to descend from
def ctsSafe(s: String): String
URL-encode any colon characters in s so that s can be used as the extended citation string of a CtsUrn.
s
String to use as extended citation string of a CtsUrn.
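As a rough illustration of the encoding step, the sketch below replaces each colon with its percent-encoded form. It is written from the description above, not from the library source, and the object name `CtsSafeSketch` is hypothetical.

```scala
object CtsSafeSketch {
  // Percent-encode colons only; every other character is left untouched.
  // "%3A" is the standard URL encoding of ":".
  def ctsSafe(s: String): String = s.replace(":", "%3A")
}
```

Colons need this treatment because they separate the top-level components of a CTS URN, so an unencoded colon inside a citation string would be read as a component boundary.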
def deletedText(hmtToken: HmtToken, el: Elem): Unit
val delimiter: String
def disambiguateNamedEntity(currToken: HmtToken, el: Elem): Unit
Collect tokens with appropriate disambiguation for varieties of named entities.
currToken
token reflecting reading values for parent context
el
a TEI element disambiguating a named entity. Should be one of persName, placeName or rs with type = ethnic
final def eq(arg0: AnyRef): Boolean

Definition Classes
AnyRef
def finalize(): Unit

Attributes
protected[java.lang]
Definition Classes
AnyRef
Annotations
@throws( classOf[java.lang.Throwable] )
def getAlternate(hmtToken: HmtToken, choiceElem: Elem): Unit
Get alternates as well as tokens from a TEI choice element.
hmtToken
token reflecting reading values for parent context
choiceElem
TEI choice element
final def getClass(): Class[_]

Definition Classes
AnyRef → Any
Annotations
@native()
def indexSubstring(s: String, sub: String): Int
Find the CTS subref index value of sub in s.
The map in the hideously global tokenIndexCount is updated as a side effect of this.
s
string to index in
sub
substring to find in s
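One way to picture the occurrence counting behind a subref index is the sketch below. It is an assumption-heavy illustration, not the library's algorithm: it ignores the `s` parameter, the helper names `nextIndex`, `subref` and `reset` are hypothetical, and only the idea of a shared `tokenIndexCount` map updated as a side effect comes from the documentation above.

```scala
import scala.collection.mutable

object SubrefIndexSketch {
  // Shared mutable map from substring to the number of times it has been seen.
  val tokenIndexCount = mutable.Map.empty[String, Int]

  def reset(): Unit = tokenIndexCount.clear()

  // Return the 1-based occurrence number of sub, updating the map as a side effect.
  def nextIndex(sub: String): Int = {
    val n = tokenIndexCount.getOrElse(sub, 0) + 1
    tokenIndexCount(sub) = n
    n
  }

  // Format an indexed subreference in the "@substring[n]" style of CTS URNs.
  def subref(sub: String): String = s"@${sub}[${nextIndex(sub)}]"
}
```

Because the map is global, the counts must be reset between citable nodes in a sketch like this; a pure alternative would pass the map in and return an updated copy.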
final def isInstanceOf[T0]: Boolean

Definition Classes
Any
final def ne(arg0: AnyRef): Boolean

Definition Classes
AnyRef
var nodeText: StringBuilder
Builder for recursively accumulated String value of a single token.
final def notify(): Unit

Definition Classes
AnyRef
Annotations
@native()
final def notifyAll(): Unit

Definition Classes
AnyRef
Annotations
@native()
def origRegChoice(hmtToken: HmtToken, el: Elem): Unit
Collect tokens from a TEI orig-reg pair.
Results are added to the TeiReader's tokenBuffer.
hmtToken
token reflecting reading values for parent context
el
TEI choice element with orig-reg children
val punctuationSplitter: String
Terrifying regular expression to split a string on HMT Greek punctuation characters while keeping the punctuation characters as individual tokens.
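The general technique of splitting on punctuation while keeping each punctuation character as its own token can be sketched with zero-width lookaround assertions. The punctuation set used here (comma, full stop, semicolon and the Greek ano teleia) is an assumption for illustration, and the pattern is not the library's actual punctuationSplitter value.

```scala
object PunctuationSketch {
  // Assumed punctuation set; the real HMT set may differ.
  val punctuation = ",.;\u0387"

  // Split at any position immediately before or after a punctuation character.
  // Both alternatives are zero-width, so no characters are consumed.
  val splitter: String = s"(?<=[${punctuation}])|(?=[${punctuation}])"

  // Split on whitespace first, then separate punctuation within each chunk.
  def tokenize(s: String): Vector[String] =
    s.trim.split("\\s+").toVector
      .filter(_.nonEmpty)
      .flatMap(_.split(splitter).toVector)
}
```

For example, `PunctuationSketch.tokenize("sing, goddess; the wrath.")` yields the tokens `sing`, `,`, `goddess`, `;`, `the`, `wrath` and `.` in that order.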
def sicCorrChoice(hmtToken: HmtToken, el: Elem): Unit
Collect tokens from a TEI sic-corr pair.
Results are added to the TeiReader's tokenBuffer.
hmtToken
token reflecting reading values for parent context
el
TEI choice element with sic-corr children
final def synchronized[T0](arg0: ⇒ T0): T0

Definition Classes
AnyRef
def teiToTokens(u: CtsUrn, xmlStr: String, tokenCount: Int = 0): Vector[TokenAnalysis]
Read an XML fragment following HMT conventions to represent a single citable node, and construct a Vector of (CtsUrn,HmtToken) tuples from it.
u
URN for the citable node.
xmlStr
XML text for the citable node.
tokenCount
Index of this token within the containing canonically citable passage of text.
var tokenBuffer: ArrayBuffer[HmtToken]
Buffer of recursively accumulated HmtTokens.
def tokens: Vector[TokenAnalysis]
Parse a String in two-column format into a vector of analyzed tokens.
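The first step of reading two-column input can be sketched as below: each line is split once on the delimiter into a URN column and a text column. This is a hedged illustration only. The function name `parseTwoColumns` and the choice to skip malformed lines are assumptions, and the real `tokens` method goes on to analyze the XML in the second column.

```scala
import java.util.regex.Pattern

object TwoColumnSketch {
  // Split each non-empty line into (urn, text) at the first delimiter.
  // Lines without a delimiter are skipped (an assumption of this sketch).
  def parseTwoColumns(data: String, delimiter: String = "#"): Vector[(String, String)] =
    data.split("\n").toVector
      .map(_.trim)
      .filter(_.nonEmpty)
      .map(_.split(Pattern.quote(delimiter), 2))
      .collect { case Array(urn, text) => (urn, text) }
}
```

Quoting the delimiter keeps characters such as `|` from being interpreted as regular expression syntax, and the limit of 2 leaves any later delimiter characters inside the text column.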
def tokensFromNodeVector(nodes: Vector[CitableNode], tokens: Vector[TokenAnalysis]): Vector[TokenAnalysis]
Parse a vector of CitableNode objects into a Vector of TokenAnalysis objects by recursively splitting each citable node into tokens and analyzing them.
nodes
Vector of CitableNode objects. Their text content must be XML conforming to HMT project conventions.
tokens
Accumulated Vector of analyzed tokens.
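The accumulator-style recursion implied by this signature might look like the following sketch. The `CitableNode` and `TokenAnalysis` case classes here are hypothetical simplifications, and `analyze` stands in for the real XML-aware tokenization with a plain whitespace split.

```scala
import scala.annotation.tailrec

object NodeVectorSketch {
  // Hypothetical, simplified stand-ins for the library types.
  final case class CitableNode(urn: String, text: String)
  final case class TokenAnalysis(urn: String, token: String)

  // Placeholder analysis: split the node's text on whitespace.
  def analyze(node: CitableNode): Vector[TokenAnalysis] =
    node.text.split("\\s+").toVector
      .filter(_.nonEmpty)
      .map(TokenAnalysis(node.urn, _))

  // Process the head node, append its tokens to the accumulator, recurse on the tail.
  @tailrec
  def tokensFromNodeVector(
      nodes: Vector[CitableNode],
      tokens: Vector[TokenAnalysis] = Vector.empty
  ): Vector[TokenAnalysis] =
    if (nodes.isEmpty) tokens
    else tokensFromNodeVector(nodes.tail, tokens ++ analyze(nodes.head))
}
```

Passing the accumulated tokens as a parameter keeps the recursive call in tail position, so the compiler can turn it into a loop and long node vectors do not grow the stack.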
val twoColumns: String
final def wait(): Unit

Definition Classes
AnyRef
Annotations
@throws( ... )
final def wait(arg0: Long, arg1: Int): Unit

Definition Classes
AnyRef
Annotations
@throws( ... )
final def wait(arg0: Long): Unit

Definition Classes
AnyRef
Annotations
@native() @throws( ... )
var wrappedWordBuffer: ArrayBuffer[Reading]
Buffer of recursively accumulated Readings for a single token.
