case class TeiReader(twoColumns: String, delimiter: String = "#") extends Product with Serializable
Factory for Vectors of HmtToken instances.
The TeiReader reads data in the OHCO2 model from sources such as delimited-text files or the Corpus object from the edu.holycross.ohco2 library. It produces a Vector of TokenAnalysis objects.
Example:
val tokenPairs = TeiReader.fromCorpus(CORPUS_OBJECT)
How it works
The TeiReader object maintains three mutable buffers: nodeText, a StringBuilder, plus wrappedWordBuffer and tokenBuffer, both mutable ArrayBuffers.
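A minimal sketch of this buffer pattern, assuming simplified stand-in types. SketchReading and SketchToken are hypothetical placeholders for the library's Reading and HmtToken, and flushToken is a hypothetical helper. The clear method assumes the real clear resets the same three buffers, which this page does not document.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical stand-ins for the library's Reading and HmtToken types.
final case class SketchReading(text: String, status: String)
final case class SketchToken(text: String)

class BufferSketch {
  // Characters of the token currently being assembled.
  var nodeText: StringBuilder = new StringBuilder
  // Readings that make up a single wrapped word.
  var wrappedWordBuffer: ArrayBuffer[SketchReading] = ArrayBuffer.empty[SketchReading]
  // Finished tokens.
  var tokenBuffer: ArrayBuffer[SketchToken] = ArrayBuffer.empty[SketchToken]

  // Hypothetical helper: turn the accumulated text into a token, then reset the builder.
  def flushToken(): Unit = {
    if (nodeText.nonEmpty) tokenBuffer += SketchToken(nodeText.toString)
    nodeText.clear()
  }

  // Assumed behavior of clear: empty all three buffers before the next node.
  def clear(): Unit = {
    nodeText.clear()
    wrappedWordBuffer.clear()
    tokenBuffer.clear()
  }
}
```

Because the buffers are shared mutable state, a reader built this way has to be cleared between citable nodes and is not safe to use from several threads at once.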
Instance Constructors
- new TeiReader(twoColumns: String, delimiter: String = "#")
Value Members
- final def !=(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
- final def ##(): Int
- Definition Classes
- AnyRef → Any
- final def ==(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
- def abbrExpanChoice(hmtToken: HmtToken, el: Elem): Unit
Collect tokens from a TEI abbr-expan pair. Results are added to the TeiReader's tokenBuffer.
- hmtToken
token reflecting reading values for parent context
- el
TEI choice element with abbr-expan children
- def addTokensFromElement(el: Elem, tokenSettings: HmtToken): Unit
Parse an XML element and add all tokens in it to tokenBuffer.
- el
XML element to parse.
- tokenSettings
Initial contextual setting for tokens.
- def addTokensFromText(s: String, tokenSettings: HmtToken): Unit
Parse a string and add all tokens in it to tokenBuffer.
- s
String to parse.
- tokenSettings
Initial contextual setting for tokens.
- final def asInstanceOf[T0]: T0
- Definition Classes
- Any
- def clear: Unit
- def clone(): AnyRef
- Attributes
- protected[java.lang]
- Definition Classes
- AnyRef
- Annotations
- @native() @throws( ... )
- def collectCited(currToken: HmtToken, citElem: Elem): Unit
Collect tokens from a cited context.
- currToken
token reflecting reading values for parent context
- citElem
TEI cit element
- def collectRefString(currToken: HmtToken, rsElem: Elem): Unit
Collect the appropriate type of token for varieties of TEI rs usage.
- currToken
token reflecting reading values for parent context
- rsElem
TEI rs element
- def collectTokens(currToken: HmtToken, n: Node): Unit
Collect all tokens descended from a given XML node. Results are collected in tokenBuffer.
- currToken
token reflecting reading values for parent context
- n
XML node to collect content from
- def collectWrappedWordReadings(editorialStatus: EditorialStatus, n: Node): Unit
Recursively collect all Reading objects descended from a given node, and add a Vector of Readings to the TeiReader's wrappedWordBuffer.
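The recursive collection can be sketched on a tiny hypothetical tree type, since the real method works on scala.xml nodes (a separate module, not the standard library). XNode, XText, XElem, Status and WordReading are illustrative stand-ins, and the rule that a TEI unclear element switches the editorial status for its subtree is an assumption made for the example.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical, simplified model of XML content; the real method works on scala.xml.Node.
sealed trait XNode
final case class XText(text: String) extends XNode
final case class XElem(label: String, children: List[XNode]) extends XNode

// Hypothetical editorial statuses; the library's EditorialStatus may differ.
sealed trait Status
case object Clear extends Status
case object Unclear extends Status

final case class WordReading(text: String, status: Status)

class ReadingCollector {
  val wrappedWordBuffer: ArrayBuffer[WordReading] = ArrayBuffer.empty[WordReading]

  // Depth-first walk: text nodes become readings with the inherited status;
  // an "unclear" element (assumed convention) changes the status for its subtree.
  def collectWrappedWordReadings(status: Status, n: XNode): Unit = n match {
    case XText(t) =>
      if (t.nonEmpty) wrappedWordBuffer += WordReading(t, status)
    case XElem("unclear", kids) =>
      kids.foreach(collectWrappedWordReadings(Unclear, _))
    case XElem(_, kids) =>
      kids.foreach(collectWrappedWordReadings(status, _))
  }
}
```

Passing the status down as a parameter, rather than storing it in another mutable field, keeps the context for each subtree explicit.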
- def ctsSafe(s: String): String
URL-encode any colon characters in s so that s can be used as the extended citation string of a CtsUrn.
- s
String to use as extended citation string of a CtsUrn.
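One plausible reading of ctsSafe, assuming "URL encode" means percent-encoding ':' as %3A and leaving every other character alone. CtsSafeSketch is a hypothetical name.

```scala
object CtsSafeSketch {
  // Percent-encode every colon so it cannot be read as one of the
  // colon-separated components of a CTS URN; all other characters pass through.
  def ctsSafe(s: String): String = s.replace(":", "%3A")
}
```

java.net.URLEncoder.encode would also rewrite spaces and percent-encode every Greek letter, which the description does not call for; a targeted replace keeps the rest of the citation string readable.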
- def deletedText(hmtToken: HmtToken, el: Elem): Unit
- val delimiter: String
- def disambiguateNamedEntity(currToken: HmtToken, el: Elem): Unit
Collect tokens with appropriate disambiguation for varieties of named entities.
- currToken
token reflecting reading values for parent context
- el
a TEI element disambiguating a named entity. Should be one of persName, placeName, or rs with type=ethnic.
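The three encodings accepted for el can be told apart by element name plus, for rs, the type attribute. A hypothetical sketch of that dispatch; EntityKind and entityKind are illustrative and not part of the library.

```scala
object NamedEntitySketch {
  // Hypothetical categories for the three documented named-entity encodings.
  sealed trait EntityKind
  case object Person extends EntityKind
  case object Place extends EntityKind
  case object Ethnic extends EntityKind

  // label: the TEI element name; typeAttr: the value of its type attribute, if any.
  // Anything outside the three documented encodings yields None.
  def entityKind(label: String, typeAttr: Option[String]): Option[EntityKind] =
    (label, typeAttr) match {
      case ("persName", _)        => Some(Person)
      case ("placeName", _)       => Some(Place)
      case ("rs", Some("ethnic")) => Some(Ethnic)
      case _                      => None
    }
}
```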
- final def eq(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
- def finalize(): Unit
- Attributes
- protected[java.lang]
- Definition Classes
- AnyRef
- Annotations
- @throws( classOf[java.lang.Throwable] )
- def getAlternate(hmtToken: HmtToken, choiceElem: Elem): Unit
Get alternates as well as tokens from a TEI choice element.
- hmtToken
token reflecting reading values for parent context
- choiceElem
TEI choice element
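This page documents three TEI choice pairs (abbr-expan, orig-reg and sic-corr), each with its own collecting method. One plausible way to recognize which pair a choice element holds, before handing it to the matching method, is sketched below. ChoiceSketch and classify are hypothetical, and whether getAlternate dispatches this way is an assumption.

```scala
object ChoiceSketch {
  // The three TEI choice pairs documented for TeiReader.
  sealed trait ChoicePair
  case object AbbrExpan extends ChoicePair
  case object SicCorr extends ChoicePair
  case object OrigReg extends ChoicePair

  // Keyed by the set of child element names, so child order does not matter.
  private val pairs: Map[Set[String], ChoicePair] = Map(
    Set("abbr", "expan") -> AbbrExpan,
    Set("sic", "corr")   -> SicCorr,
    Set("orig", "reg")   -> OrigReg
  )

  // Returns None for any combination of children that is not a documented pair.
  def classify(childLabels: Seq[String]): Option[ChoicePair] =
    pairs.get(childLabels.toSet)
}
```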
- final def getClass(): Class[_]
- Definition Classes
- AnyRef → Any
- Annotations
- @native()
- def indexSubstring(s: String, sub: String): Int
Find the CTS subreference index value of sub in s.
The map in the hideously global tokenIndexCount is updated as a side effect of this.
- s
string to index in
- sub
substring to find in s
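In a CTS URN a subreference such as @μῆνιν[1] names the nth occurrence of a string within a passage. A hypothetical sketch of indexSubstring, assuming the index counts occurrences from 1 and that tokenIndexCount maps each substring to the number of occurrences already indexed in the current passage. SubrefIndexer and its occurrence check are illustrations, not the library's code.

```scala
import scala.annotation.tailrec
import scala.collection.mutable

class SubrefIndexer {
  // Hypothetical stand-in for the global tokenIndexCount map.
  val tokenIndexCount: mutable.Map[String, Int] = mutable.Map.empty[String, Int]

  // Return the 1-based index of the next occurrence of sub in s,
  // recording it in tokenIndexCount as a side effect.
  def indexSubstring(s: String, sub: String): Int = {
    val next = tokenIndexCount.getOrElse(sub, 0) + 1
    require(occurrences(s, sub) >= next, s"'$sub' occurs fewer than $next times")
    tokenIndexCount(sub) = next
    next
  }

  // Count non-overlapping occurrences of sub in s.
  private def occurrences(s: String, sub: String): Int = {
    @tailrec
    def loop(from: Int, acc: Int): Int = {
      val i = s.indexOf(sub, from)
      if (i < 0) acc else loop(i + sub.length, acc + 1)
    }
    if (sub.isEmpty) 0 else loop(0, 0)
  }
}
```

The counts must be reset for each new passage; a map shared across the whole reader, as the note above warns, makes that easy to forget.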
- final def isInstanceOf[T0]: Boolean
- Definition Classes
- Any
- final def ne(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
- var nodeText: StringBuilder
Builder for the recursively accumulated String value of a single token.
- final def notify(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native()
- final def notifyAll(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native()
- def origRegChoice(hmtToken: HmtToken, el: Elem): Unit
Collect tokens from a TEI orig-reg pair. Results are added to the TeiReader's tokenBuffer.
- hmtToken
token reflecting reading values for parent context
- el
TEI choice element with orig-reg children
- val punctuationSplitter: String
Terrifying regular expression to split a string on HMT Greek punctuation characters while keeping the punctuation characters as individual tokens.
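The library's actual pattern is not reproduced on this page. A minimal sketch of the same technique, assuming a hypothetical punctuation set (comma, full stop, ano teleia U+0387, middle dot U+00B7, Greek question mark U+037E): zero-width lookbehind and lookahead split points let the regex split around each mark without consuming it, so every mark comes back as a token of its own. PunctuationSplitSketch and tokenize are illustrative names.

```scala
object PunctuationSplitSketch {
  // Hypothetical punctuation set: comma, full stop, Greek ano teleia (U+0387),
  // middle dot (U+00B7) and Greek question mark (U+037E).
  private val punct = "[,.\u0387\u00B7\u037E]"

  // Zero-width split points immediately before and after each punctuation
  // character, so the character survives as a token of its own.
  val punctuationSplitter: String = s"(?<=$punct)|(?=$punct)"

  // Split on whitespace first, then on punctuation, dropping empty pieces.
  def tokenize(s: String): Vector[String] =
    s.trim.split("\\s+").toVector
      .flatMap(_.split(punctuationSplitter).toVector)
      .filter(_.nonEmpty)
}
```

For example, tokenize("arma virumque, cano.") yields Vector("arma", "virumque", ",", "cano", ".").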
- def sicCorrChoice(hmtToken: HmtToken, el: Elem): Unit
Collect tokens from a TEI sic-corr pair. Results are added to the TeiReader's tokenBuffer.
- hmtToken
token reflecting reading values for parent context
- el
TEI choice element with sic-corr children
- final def synchronized[T0](arg0: ⇒ T0): T0
- Definition Classes
- AnyRef
- def teiToTokens(u: CtsUrn, xmlStr: String, tokenCount: Int = 0): Vector[TokenAnalysis]
Read an XML fragment following HMT conventions to represent a single citable node, and construct a Vector of TokenAnalysis objects from it.
- u
URN for the citable node.
- xmlStr
XML text for the citable node.
- tokenCount
Index of this token within the containing canonically citable passage of text.
- var tokenBuffer: ArrayBuffer[HmtToken]
Buffer of recursively accumulated HmtTokens.
- def tokens: Vector[TokenAnalysis]
Parse the twoColumns String, in two-column format, into a Vector of analyzed tokens.
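Assuming, as the constructor's default delimiter suggests, that each line of twoColumns holds a URN and the node's text separated by delimiter, the first step of tokens might look like this sketch. TwoColumnSketch and Row are hypothetical, and turning each row into TokenAnalysis objects is not modeled here.

```scala
import java.util.regex.Pattern

object TwoColumnSketch {
  // Hypothetical record for one row: a URN string and the text of its citable node.
  final case class Row(urn: String, text: String)

  // Split each non-blank line on the first occurrence of the delimiter.
  // Pattern.quote keeps regex metacharacters such as "|" from breaking split.
  def rows(twoColumns: String, delimiter: String = "#"): Vector[Row] =
    twoColumns.linesIterator
      .map(_.trim)
      .filter(_.nonEmpty)
      .map { line =>
        line.split(Pattern.quote(delimiter), 2) match {
          case Array(urn, text) => Row(urn.trim, text.trim)
          case _ => throw new IllegalArgumentException(s"No delimiter in: $line")
        }
      }
      .toVector
}
```

Limiting the split to two pieces means a delimiter character appearing inside the text column does not break the row.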
- def tokensFromNodeVector(nodes: Vector[CitableNode], tokens: Vector[TokenAnalysis]): Vector[TokenAnalysis]
Parse a Vector of CitableNode objects into a Vector of TokenAnalysis objects by recursively splitting each citable node into tokens and analyzing them.
- nodes
Vector of CitableNode objects. Their text content must be XML conforming to HMT project conventions.
- tokens
Accumulated Vector of analyzed tokens.
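A sketch of the accumulator recursion described for tokensFromNodeVector, using hypothetical simplified types: Node stands in for CitableNode, Analysis for TokenAnalysis, and analyzeNode is a placeholder word splitter rather than HMT's XML-aware tokenization.

```scala
import scala.annotation.tailrec

object NodeVectorSketch {
  // Hypothetical simplified stand-ins for CitableNode and TokenAnalysis.
  final case class Node(urn: String, text: String)
  final case class Analysis(urn: String, index: Int, token: String)

  // Placeholder tokenizer: one Analysis per whitespace-separated word,
  // recording the word's 0-based position within its node.
  def analyzeNode(n: Node): Vector[Analysis] =
    n.text.trim.split("\\s+").toVector.filter(_.nonEmpty).zipWithIndex.map {
      case (w, i) => Analysis(n.urn, i, w)
    }

  // Analyze the head node, append its tokens to the accumulator, and recur on
  // the remaining nodes; the accumulator is returned once no nodes are left.
  @tailrec
  def tokensFromNodeVector(nodes: Vector[Node], tokens: Vector[Analysis]): Vector[Analysis] =
    if (nodes.isEmpty) tokens
    else tokensFromNodeVector(nodes.tail, tokens ++ analyzeNode(nodes.head))
}
```

Callers start the recursion with an empty accumulator, for example tokensFromNodeVector(nodes, Vector.empty); the @tailrec annotation makes the compiler confirm the recursion will not grow the stack.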
- val twoColumns: String
- final def wait(): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... )
- final def wait(arg0: Long, arg1: Int): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... )
- final def wait(arg0: Long): Unit
- Definition Classes
- AnyRef
- Annotations
- @native() @throws( ... )
- var wrappedWordBuffer: ArrayBuffer[Reading]
Buffer of recursively accumulated Readings for a single token.