Salt
3.4.2
A powerful, tagset-independent and theory-neutral meta model and API for storing, manipulating, and representing nearly all types of linguistic data.
This class is a very simple implementation of a tokenizer, which just splits a primary text by a given list of characters.
Public Member Functions
void | setDocumentGraph (SDocumentGraph documentGraph)
SDocumentGraph | getDocumentGraph ()
| SimpleTokenizer ()
| Initializes a new SimpleTokenizer object.
List< SToken > | tokenize (STextualDS textualDSs, Character... separator)
| Sets the STextualDS to be tokenized.
List< SToken > | tokenize (STextualDS textualDS, Integer startPos, Integer endPos, Character... separator)
| Sets the STextualDS to be tokenized and the language of the text.
This class is a very simple implementation of a tokenizer, which just splits a primary text at any of a given list of separator characters, e.g. a blank.
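The splitting idea described above can be sketched in plain Java without any Salt types. This is only an illustration, not the library's implementation: the class `SeparatorSplitSketch` and its `split` method are hypothetical names, and the choice to skip empty tokens between adjacent separators is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, not part of Salt: splits a text at separator characters. */
final class SeparatorSplitSketch {

    /** Returns the non-empty substrings of text delimited by any of the separators. */
    static List<String> split(String text, char... separators) {
        List<String> tokens = new ArrayList<>();
        int tokenStart = -1; // -1 means "currently not inside a token"
        for (int i = 0; i < text.length(); i++) {
            if (isSeparator(text.charAt(i), separators)) {
                if (tokenStart >= 0) {
                    // A separator closes the token that is currently open.
                    tokens.add(text.substring(tokenStart, i));
                    tokenStart = -1;
                }
            } else if (tokenStart < 0) {
                // First non-separator character opens a new token.
                tokenStart = i;
            }
        }
        if (tokenStart >= 0) {
            // The text ended while a token was still open.
            tokens.add(text.substring(tokenStart));
        }
        return tokens;
    }

    private static boolean isSeparator(char c, char[] separators) {
        for (char s : separators) {
            if (s == c) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // prints [This, is, a, test]
        System.out.println(split("This is a test", ' '));
    }
}
```

Passing several separators, for example `split("a,,b c", ',', ' ')`, treats each of them as a boundary in the same way the `Character... separator` varargs parameter suggests.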
List<SToken> org.corpus_tools.salt.common.tokenizer.SimpleTokenizer.tokenize (STextualDS textualDS, Integer startPos, Integer endPos, Character... separator)
Sets the STextualDS to be tokenized and the language of the text.
If language is null, it will be detected automatically if possible.
textualDS | STextualDS object containing the text to be tokenized
startPos | start position, if only a subset of the text is to be tokenized (0 is assumed if null)
endPos | end position, if only a subset of the text is to be tokenized (the length of the text is assumed if null)
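The null defaults for `startPos` and `endPos` can be illustrated with a small, Salt-independent sketch. The class `RangeSplitSketch` and the method `tokenSpans` are hypothetical; treating `endPos` as exclusive and reporting offsets relative to the whole text are assumptions made for this example, not documented behaviour of `SimpleTokenizer`.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, not part of Salt: finds token offsets inside a sub-range of a text. */
final class RangeSplitSketch {

    /** Returns {start, end} offsets (end exclusive, assumed) of each token in the range. */
    static List<int[]> tokenSpans(String text, Integer startPos, Integer endPos, char... separators) {
        // Defaults as described in the parameter list: 0 and the length of the text.
        int start = (startPos == null) ? 0 : startPos;
        int end = (endPos == null) ? text.length() : endPos;
        if (start < 0 || end > text.length() || start > end) {
            throw new IllegalArgumentException("Invalid range: " + start + ".." + end);
        }
        List<int[]> spans = new ArrayList<>();
        int tokenStart = -1; // -1 means "currently not inside a token"
        for (int i = start; i < end; i++) {
            boolean isSeparator = false;
            for (char s : separators) {
                if (s == text.charAt(i)) {
                    isSeparator = true;
                    break;
                }
            }
            if (isSeparator) {
                if (tokenStart >= 0) {
                    spans.add(new int[] {tokenStart, i});
                    tokenStart = -1;
                }
            } else if (tokenStart < 0) {
                tokenStart = i;
            }
        }
        if (tokenStart >= 0) {
            // The range ended while a token was still open.
            spans.add(new int[] {tokenStart, end});
        }
        return spans;
    }

    public static void main(String[] args) {
        // Only the part of the text from offset 4 onwards is tokenized.
        for (int[] span : tokenSpans("one two three", 4, null, ' ')) {
            System.out.println(span[0] + ".." + span[1]);
        }
    }
}
```

In this sketch, `tokenSpans("one two three", null, null, ' ')` covers the whole text, while passing `4` and `7` restricts the result to the single span of the word "two".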
List<SToken> org.corpus_tools.salt.common.tokenizer.SimpleTokenizer.tokenize (STextualDS textualDSs, Character... separator)
Sets the STextualDS to be tokenized.
Its language will be detected automatically if possible.
textualDSs | STextualDS object containing the text to be tokenized