Salt  3.4.2
A powerful, tagset-independent and theory-neutral meta model and API for storing, manipulating, and representing nearly all types of linguistic data.
org.corpus_tools.salt.common.tokenizer.Clitics Class Reference

Models clitics for a given language, with support for proclitics (proclitics) and enclitics (enclitics) in this version.

Public Member Functions

 Clitics (String proclitics, String enclitics)
 
String getProclitics ()
 
String getEnclitics ()
 

Detailed Description

Models clitics for a given language, with support for proclitics (proclitics) and enclitics (enclitics) in this version.

Meso- and endoclitics are not yet supported.

The String representation of the respective clitics must be a regular expression, because it is passed to Pattern#compile(String) to build the pattern that splits the STextualDS's text, as shown below.

Pattern.compile("^" + XClitic + "(.)$")

Two examples of such a regex string are (note the main group!):

  • Enclitics for English: "('(s|re|ve|d|m|em|ll)|n't)"
  • Proclitics for French: "([dcjlmnstDCJLNMST]'|[Qq]u'|[Jj]usqu'|[Ll]orsqu')"
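
The following is a minimal sketch of how regex strings like the two above might be used to split a token, relying only on java.util.regex. The class CliticSplitSketch and the helpers splitProclitic and splitEnclitic are hypothetical names introduced for illustration and are not part of Salt. The exact shape of the compiled patterns is also an assumption: the sketch captures the rest of the token with a (.*) group placed after a proclitic and before an enclitic, which may differ from what the Tokenizer actually does.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CliticSplitSketch {

    // Example regex strings quoted from the class documentation.
    static final String FRENCH_PROCLITICS =
            "([dcjlmnstDCJLNMST]'|[Qq]u'|[Jj]usqu'|[Ll]orsqu')";
    static final String ENGLISH_ENCLITICS =
            "('(s|re|ve|d|m|em|ll)|n't)";

    // Assumption: a proclitic is anchored at the token start and the
    // remainder of the token is captured by a trailing (.*) group.
    static String[] splitProclitic(String token, String procliticRegex) {
        Matcher m = Pattern.compile("^" + procliticRegex + "(.*)$").matcher(token);
        if (m.matches()) {
            // Group 1 is the main group of the clitic regex; the last
            // group is always the trailing (.*) remainder.
            return new String[] { m.group(1), m.group(m.groupCount()) };
        }
        return new String[] { token };
    }

    // Assumption: an enclitic is anchored at the token end and the
    // preceding part of the token is captured by a leading (.*) group.
    static String[] splitEnclitic(String token, String encliticRegex) {
        Matcher m = Pattern.compile("^(.*)" + encliticRegex + "$").matcher(token);
        if (m.matches()) {
            // Group 1 is the leading (.*); group 2 is the main group
            // of the clitic regex.
            return new String[] { m.group(1), m.group(2) };
        }
        return new String[] { token };
    }

    public static void main(String[] args) {
        System.out.println(String.join(" | ", splitProclitic("l'homme", FRENCH_PROCLITICS)));
        System.out.println(String.join(" | ", splitEnclitic("they're", ENGLISH_ENCLITICS)));
    }
}
```

The main group around each regex string matters here: because the whole clitic alternation sits in one outer group, the matched clitic can be read back from a single, predictable group index regardless of which alternative matched.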

From Tokenizer.

Author
Stephan Druskat