Salt  3.4.2
A powerful, tagset-independent and theory-neutral meta model and API for storing, manipulating, and representing nearly all types of linguistic data.
org.corpus_tools.salt.common.tokenizer.SimpleTokenizer Class Reference

This class is a very simple implementation of a tokenizer, which just splits a primary text by a given list of characters.

Public Member Functions

void setDocumentGraph (SDocumentGraph documentGraph)
 
SDocumentGraph getDocumentGraph ()
 
 SimpleTokenizer ()
 Initializes a new SimpleTokenizer object.
 
List< SToken > tokenize (STextualDS textualDSs, Character... separator)
 Tokenizes the text of the given STextualDS by splitting it at the given separator characters.
 
List< SToken > tokenize (STextualDS textualDS, Integer startPos, Integer endPos, Character... separator)
 Tokenizes the text of the given STextualDS between startPos and endPos by splitting it at the given separator characters.
 

Detailed Description

This class is a very simple implementation of a tokenizer, which just splits a primary text by a given list of characters, e.g. a blank.
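The splitting idea described above can be illustrated without any Salt classes. The following is a minimal stand-alone sketch, not the SimpleTokenizer implementation: the class name SeparatorSplitSketch and its split method are hypothetical, and dropping empty pieces between adjacent separators is an assumption made for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-alone sketch; it is not part of Salt and does not
// reproduce the actual SimpleTokenizer code.
class SeparatorSplitSketch {

    // Splits text at every character contained in separators and
    // returns the non-empty pieces in their original order.
    static List<String> split(String text, Character... separators) {
        Set<Character> seps = new HashSet<>(Arrays.asList(separators));
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (seps.contains(c)) {
                // A separator closes the current piece, if there is one.
                if (current.length() > 0) {
                    tokens.add(current.toString());
                    current.setLength(0);
                }
            } else {
                current.append(c);
            }
        }
        // Flush the last piece when the text does not end with a separator.
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Splits at blanks only, so the final period stays attached to "text".
        System.out.println(split("This is a text.", ' '));
    }
}
```

Because only the listed characters act as boundaries, punctuation stays attached to the neighbouring word unless it is passed as a separator as well.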

Author
Florian Zipser

Member Function Documentation

◆ tokenize() [1/2]

List<SToken> org.corpus_tools.salt.common.tokenizer.SimpleTokenizer.tokenize ( STextualDS  textualDS,
Integer  startPos,
Integer  endPos,
Character...  separator 
)

Tokenizes the text of the given STextualDS, or a subset of it, by splitting it at the given separator characters.

Parameters
textualDS  STextualDS object containing the text to be tokenized
startPos  start position, if the text to be tokenized is a subset (0 is assumed if set to null)
endPos  end position, if the text to be tokenized is a subset (the length of the text is assumed if set to null)
separator  characters at which the text is split
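The null handling of startPos and endPos described in the parameter list can be sketched as follows. This is a hedged, stand-alone illustration and not the Salt implementation: the class RangeSplitSketch and its spans method are hypothetical, returning [start, end) character offsets instead of SToken objects is an assumption made to keep the example free of Salt dependencies, and endPos is treated as exclusive here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-alone sketch; it is not part of Salt.
class RangeSplitSketch {

    // Returns the [start, end) offsets of all pieces found between
    // startPos and endPos. A null startPos is read as 0 and a null
    // endPos as the length of the text, as the parameter list states.
    static List<int[]> spans(String text, Integer startPos, Integer endPos,
                             Character... separators) {
        int start = (startPos == null) ? 0 : startPos;
        int end = (endPos == null) ? text.length() : endPos;
        Set<Character> seps = new HashSet<>(Arrays.asList(separators));
        List<int[]> result = new ArrayList<>();
        int tokenStart = -1; // -1 means no piece is currently open
        for (int i = start; i < end; i++) {
            boolean isSeparator = seps.contains(text.charAt(i));
            if (!isSeparator && tokenStart < 0) {
                tokenStart = i; // first character of a new piece
            } else if (isSeparator && tokenStart >= 0) {
                result.add(new int[] {tokenStart, i});
                tokenStart = -1;
            }
        }
        // Close a piece that runs up to the end of the range.
        if (tokenStart >= 0) {
            result.add(new int[] {tokenStart, end});
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "This is a text.";
        // Whole text: both positions left as null.
        for (int[] span : spans(text, null, null, ' ')) {
            System.out.println(span[0] + ".." + span[1] + " " + text.substring(span[0], span[1]));
        }
    }
}
```

Calling spans(text, 5, 9, ' ') on the same text restricts the result to the pieces "is" and "a", since only the characters from offset 5 up to, but not including, offset 9 are inspected.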

◆ tokenize() [2/2]

List<SToken> org.corpus_tools.salt.common.tokenizer.SimpleTokenizer.tokenize ( STextualDS  textualDSs,
Character...  separator 
)

Tokenizes the text of the given STextualDS by splitting it at the given separator characters.

Parameters
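textualDSs  STextualDS object containing the text to be tokenized
separator  characters at which the text is split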