Salt  3.4.2
A powerful, tagset-independent and theory-neutral meta model and API for storing, manipulating, and representing nearly all types of linguistic data.
org.corpus_tools.salt.samples.SampleGenerator Class Reference

Creates samples of SDocumentGraph and SCorpusGraph instances.

Static Public Member Functions

static SCorpusGraph createCorpusStructure (SaltProject saltProject)
 Creates a corpus structure (a root corpus, two sub-corpora and four documents) and adds it to the given Salt project.

static SaltProject createSaltProject ()
 Creates a complete SaltProject object containing the complex corpus structure.

static SCorpusGraph createCorpusStructure ()
 Creates a corpus structure with a root corpus, two sub-corpora and four documents.

static SCorpusGraph createCorpusStructure (SCorpusGraph corpGraph1)
 Creates a corpus structure with a root corpus, two sub-corpora and four documents.

static SCorpusGraph createCorpusStructure_simple ()
 Creates a simple corpus structure with a root corpus and a single document.
 
static void createDialogue (SDocument document)
 Creates a SDocumentGraph containing two texts of two different speakers, which are aligned via the STimeline related to the SToken objects.

static STextualDS createPrimaryData (SDocument document)
 Creates an STextualDS object containing the primary text SampleGenerator#PRIMARY_TEXT_EN and adds the object to the SDocumentGraph being contained by the given SDocument object.

static STextualDS createPrimaryData (SDocument document, String language)
 Creates a STextualDS object containing either the English primary text SampleGenerator#PRIMARY_TEXT_EN or its German translation SampleGenerator#PRIMARY_TEXT_DE, and adds the object to the SDocumentGraph being contained by the given SDocument object.

static void createTokens (SDocument document)
 Creates a set of SToken objects tokenizing the primary text SampleGenerator#PRIMARY_TEXT_EN into word and punctuation tokens.

static List< SToken > createTokens (SDocument document, STextualDS textualDS)
 Creates a set of SToken objects tokenizing the primary text SampleGenerator#PRIMARY_TEXT_EN or SampleGenerator#PRIMARY_TEXT_DE, depending on the given STextualDS object, into word and punctuation tokens.

static SToken createToken (int start, int end, STextualDS textualDS, SDocument document, SLayer layer)
 Creates a SToken covering the passed position and returns it.
 
static void createParallelData (SDocument document)
 
static void createParallelData (SDocument document, boolean setTypeForPointRel)
 Creates a small parallel corpus containing an English and a German text.

static void createUntypedParallelData (SDocument document)
 Creates a small parallel corpus containing an English and a German text.

static void createMorphologyAnnotations (SDocument document)
 Creates morphological annotations (pos and lemma) for the tokenized sample and adds them to each SToken object as SPOSAnnotation or SLemmaAnnotation object.

static void createInformationStructureSpan (SDocument document)
 Creates SSpan objects above the tokenization.

static void createInformationStructureAnnotations (SDocument document)
 Annotates the SSpan objects above the tokenization with information structural annotations.

static void createSyntaxStructure (SDocument document)
 Creates a syntax structure for the given SDocument object.

static void createSyntaxAnnotations (SDocument document)
 This method creates the categorical annotations for the nodes of the sample syntax tree created in SampleGenerator#createSyntaxStructure(SDocument).

static void createDependencies (SDocument document)
 This method creates the sample's dependency annotation.

static void createAnaphoricAnnotations (SDocument document)

static void createDocumentStructure (SDocument document)
 Creates a document structure containing primary text, tokenization, and morphological, information structure, syntactic and anaphoric annotations.
 

Static Public Attributes

static final String PRIMARY_TEXT_EN = "Is this example more complicated than it appears to be?"
 The primary text, which is used for the samples.
 
static final String PRIMARY_TEXT_EN_SPK1 = PRIMARY_TEXT_EN
 Primary text of speaker1.
 
static final String PRIMARY_TEXT_EN_SPK2 = "Uhm oh yes!"
 Primary text of speaker2.
 
static final String PRIMARY_TEXT_DE = "Ist dieses Beispiel komplizierter als es zu sein scheint?"
 The German translation of the primary text, which is used for the samples.
 
static final String MORPHOLOGY_LAYER = "morphology"
 The name of the morphological layer containing the tokens.
 
static final String LANG_EN = "en"
 ISO 639-1 language code for English
 
static final String LANG_DE = "de"
 ISO 639-1 language code for German
 
static final String SYNTAX_LAYER = "syntax"
 

Detailed Description

Creates samples of SDocumentGraph and SCorpusGraph instances.

Author
Florian Zipser

Member Function Documentation

◆ createAnaphoricAnnotations()

static void org.corpus_tools.salt.samples.SampleGenerator.createAnaphoricAnnotations ( SDocument  document)
static
Parameters
document

◆ createCorpusStructure() [1/3]

static SCorpusGraph org.corpus_tools.salt.samples.SampleGenerator.createCorpusStructure ( )
static

Creates the following structure:

           rootCorpus
      /                    \
 subCorpus1              subCorpus2
 /       \              /        \
doc1     doc2         doc3      doc4
Exceptions
IOException
SAXException
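
The hierarchy drawn above can be pictured with a small plain-Java stand-in. This is only a sketch for orientation: it does not use the Salt API, the class name CorpusTreeSketch and the slash-separated path format are hypothetical, and only the corpus and document names are taken from the diagram.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java stand-in for the sample corpus hierarchy; it does not use the Salt API.
final class CorpusTreeSketch {

    // Maps each sub-corpus name from the diagram to the document names below it.
    static Map<String, List<String>> sampleHierarchy() {
        Map<String, List<String>> subCorpora = new LinkedHashMap<>();
        subCorpora.put("subCorpus1", List.of("doc1", "doc2"));
        subCorpora.put("subCorpus2", List.of("doc3", "doc4"));
        return subCorpora;
    }

    // Flattens the hierarchy into slash-separated paths starting at the root corpus.
    // The path format is an illustration only, not Salt's identifier scheme.
    static List<String> documentPaths() {
        List<String> paths = new ArrayList<>();
        for (Map.Entry<String, List<String>> entry : sampleHierarchy().entrySet()) {
            for (String doc : entry.getValue()) {
                paths.add("rootCorpus/" + entry.getKey() + "/" + doc);
            }
        }
        return paths;
    }

    public static void main(String[] args) {
        documentPaths().forEach(System.out::println);
    }
}
```

Running main lists one path per document, which makes it easy to see that each document hangs below exactly one sub-corpus.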

◆ createCorpusStructure() [2/3]

static SCorpusGraph org.corpus_tools.salt.samples.SampleGenerator.createCorpusStructure ( SaltProject  saltProject)
static

Creates the following corpus structure and adds it to the given salt project.

           rootCorpus
      /                    \
 subCorpus1              subCorpus2
 /       \              /        \
doc1     doc2         doc3      doc4
Exceptions
IOException
SAXException

◆ createCorpusStructure() [3/3]

static SCorpusGraph org.corpus_tools.salt.samples.SampleGenerator.createCorpusStructure ( SCorpusGraph  corpGraph1)
static

Creates the following structure:

            rootCorpus
      /                     \
 subCorpus1             subCorpus2
 /       \             /         \
doc1    doc2         doc3       doc4
Exceptions
IOException
SAXException

◆ createCorpusStructure_simple()

static SCorpusGraph org.corpus_tools.salt.samples.SampleGenerator.createCorpusStructure_simple ( )
static

Creates the following structure:

rootCorpus
    |
   doc1
Exceptions
IOException
SAXException

◆ createDependencies()

static void org.corpus_tools.salt.samples.SampleGenerator.createDependencies ( SDocument  document)
static

This method creates the sample's dependency annotation.

Parameters
document

◆ createDialogue()

static void org.corpus_tools.salt.samples.SampleGenerator.createDialogue ( SDocument  document)
static

Creates a SDocumentGraph containing two texts of two different speakers, which are aligned via the STimeline related to the SToken objects.

The texts are PRIMARY_TEXT_EN_SPK1 ("Is this example more complicated than it appears to be?") and PRIMARY_TEXT_EN_SPK2 ("Uhm oh yes!"), which are tokenized by words. The words 'to' and 'oh' are spoken simultaneously and therefore overlap on the timeline.

Parameters
document  the document to be filled

◆ createDocumentStructure()

static void org.corpus_tools.salt.samples.SampleGenerator.createDocumentStructure ( SDocument  document)
static

Creates a document structure containing:

  • primary text
  • tokenization
  • morphological annotations
  • information structure annotation
  • syntactical annotation
  • anaphoric annotation
Parameters
document

◆ createInformationStructureAnnotations()

static void org.corpus_tools.salt.samples.SampleGenerator.createInformationStructureAnnotations ( SDocument  document)
static

Annotates the SSpan objects above the tokenization with information structural annotations.

Parameters
document

◆ createInformationStructureSpan()

static void org.corpus_tools.salt.samples.SampleGenerator.createInformationStructureSpan ( SDocument  document)
static

Creates SSpan objects above the tokenization.

contrast-focus topic
Is this example more complicated than it appears to be
Parameters
document

◆ createMorphologyAnnotations()

static void org.corpus_tools.salt.samples.SampleGenerator.createMorphologyAnnotations ( SDocument  document)
static

Creates morphological annotations (pos and lemma) for the tokenized sample and adds them to each SToken object as SPOSAnnotation or SLemmaAnnotation object.

token        pos  lemma
Is           VBZ  be
this         DT   this
example      NN   example
more         ABR  more
complicated  JJ   complicated
than         IN   than
it           PRP  it
appears      VBZ  appear
to           TO   to
be           VB   be
Parameters
document  the document containing the SToken and STextualDS objects
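
The table above pairs each token with one pos value and one lemma. As a rough illustration of that pairing, the following plain-Java sketch stores the three columns as parallel arrays. It does not use the Salt API; the class name MorphologySketch and the describe helper are hypothetical, and the values are copied from the table as documented.

```java
// Plain-Java stand-in for the morphology table; it does not use the Salt API.
final class MorphologySketch {

    // Columns copied from the table above; index i refers to the same token in all three arrays.
    static final String[] TOKENS = {"Is", "this", "example", "more", "complicated", "than", "it", "appears", "to", "be"};
    static final String[] POS = {"VBZ", "DT", "NN", "ABR", "JJ", "IN", "PRP", "VBZ", "TO", "VB"};
    static final String[] LEMMAS = {"be", "this", "example", "more", "complicated", "than", "it", "appear", "to", "be"};

    // Renders one table row as tab-separated text.
    static String describe(int index) {
        return TOKENS[index] + "\tpos=" + POS[index] + "\tlemma=" + LEMMAS[index];
    }

    public static void main(String[] args) {
        for (int i = 0; i < TOKENS.length; i++) {
            System.out.println(describe(i));
        }
    }
}
```

Note that the table has no row for the final "?" token, so the sketch leaves it out as well.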

◆ createParallelData()

static void org.corpus_tools.salt.samples.SampleGenerator.createParallelData ( SDocument  document,
boolean  setTypeForPointRel 
)
static

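Creates a small parallel corpus containing an English and a German text.

The English text is PRIMARY_TEXT_EN ("Is this example more complicated than it appears to be?"), the German text is PRIMARY_TEXT_DE ("Ist dieses Beispiel komplizierter als es zu sein scheint?"). Both are tokenized at word boundaries.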

Parameters
document  the document containing the STextualDS objects

◆ createPrimaryData() [1/2]

static STextualDS org.corpus_tools.salt.samples.SampleGenerator.createPrimaryData ( SDocument  document)
static

Creates an STextualDS object containing the primary text SampleGenerator#PRIMARY_TEXT_EN and adds the object to the SDocumentGraph being contained by the given SDocument object.

TODO: clarify what is supposed to be shown here, the original text or the link to the String object.

Parameters
document  the document to which the created STextualDS object will be added (TODO: missing in SampleGenerator)
Returns
returns the created primary text

◆ createPrimaryData() [2/2]

static STextualDS org.corpus_tools.salt.samples.SampleGenerator.createPrimaryData ( SDocument  document,
String  language 
)
static

Creates a STextualDS object containing either the English primary text SampleGenerator#PRIMARY_TEXT_EN or its German translation SampleGenerator#PRIMARY_TEXT_DE, and adds the object to the SDocumentGraph being contained by the given SDocument object.

Parameters
document  the document to which the created STextualDS object will be added
language  the language of the resource to be created, LANG_EN for English, LANG_DE for German
Returns
returns the created STextualDS object
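
How the language parameter relates to the two primary texts can be sketched in plain Java. This is a minimal stand-in, not the Salt implementation: the class name PrimaryTextSketch and the method primaryTextFor are hypothetical, and throwing an exception for any other language code is an assumption, since the documentation above does not say what happens in that case.

```java
// Plain-Java stand-in for the language selection; it does not use the Salt API.
final class PrimaryTextSketch {

    // Constant values copied from the attribute list of this class reference.
    static final String PRIMARY_TEXT_EN = "Is this example more complicated than it appears to be?";
    static final String PRIMARY_TEXT_DE = "Ist dieses Beispiel komplizierter als es zu sein scheint?";
    static final String LANG_EN = "en";
    static final String LANG_DE = "de";

    // Picks the primary text for a language code.
    // Assumption: any code other than LANG_EN or LANG_DE is rejected.
    static String primaryTextFor(String language) {
        if (LANG_EN.equals(language)) {
            return PRIMARY_TEXT_EN;
        }
        if (LANG_DE.equals(language)) {
            return PRIMARY_TEXT_DE;
        }
        throw new IllegalArgumentException("Unsupported language: " + language);
    }

    public static void main(String[] args) {
        System.out.println(primaryTextFor(LANG_EN));
        System.out.println(primaryTextFor(LANG_DE));
    }
}
```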

◆ createSaltProject()

static SaltProject org.corpus_tools.salt.samples.SampleGenerator.createSaltProject ( )
static

Creates a complete SaltProject object having the complex structure.

           rootCorpus
      /                    \
 subCorpus1              subCorpus2
 /       \              /        \
doc1     doc2         doc3      doc4
Returns

◆ createSyntaxAnnotations()

static void org.corpus_tools.salt.samples.SampleGenerator.createSyntaxAnnotations ( SDocument  document)
static

This method creates the categorical annotations for the nodes of the sample syntax tree created in SampleGenerator#createSyntaxStructure(SDocument).

Parameters
document

◆ createSyntaxStructure()

static void org.corpus_tools.salt.samples.SampleGenerator.createSyntaxStructure ( SDocument  document)
static

Creates a syntax structure for the given SDocument object.

If it does not already contain a primary text and a tokenization, this method calls createPrimaryData(SDocument) and createTokens(SDocument).

Parameters
document

◆ createToken()

static SToken org.corpus_tools.salt.samples.SampleGenerator.createToken ( int  start,
int  end,
STextualDS  textualDS,
SDocument  document,
SLayer  layer 
)
static

Creates a SToken covering the passed position and returns it.

Parameters
start
end
textualDS
document
layer
Returns
created SToken object

◆ createTokens() [1/2]

static void org.corpus_tools.salt.samples.SampleGenerator.createTokens ( SDocument  document)
static

Creates a set of SToken objects tokenizing the primary text SampleGenerator#PRIMARY_TEXT_EN into the following tokens:

  • Is
  • this
  • example
  • more
  • complicated
  • than
  • it
  • appears
  • to
  • be
  • ?

The created SToken objects and corresponding STextualRelation objects are added to the given SDocument object.

Parameters
document  the document to which the created SToken objects will be added
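
To see which character ranges of the primary text the listed tokens cover, the tokenization can be approximated in plain Java. This is a sketch under assumptions, not the Salt tokenizer: the class name TokenOffsetsSketch and the splitting rule (runs of letters or digits form a token, every other non-whitespace character is a token of its own) are hypothetical, and offsets are taken as start-inclusive and end-exclusive.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java stand-in for the sample tokenization; it does not use the Salt API.
final class TokenOffsetsSketch {

    // Value copied from the attribute list of this class reference.
    static final String PRIMARY_TEXT_EN = "Is this example more complicated than it appears to be?";

    // Returns {start, end} pairs, with start inclusive and end exclusive (an assumption).
    static List<int[]> tokenOffsets(String text) {
        List<int[]> offsets = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            char c = text.charAt(i);
            if (Character.isWhitespace(c)) {
                i++; // whitespace only separates tokens
            } else if (Character.isLetterOrDigit(c)) {
                int start = i;
                while (i < text.length() && Character.isLetterOrDigit(text.charAt(i))) {
                    i++;
                }
                offsets.add(new int[] {start, i}); // one word token
            } else {
                offsets.add(new int[] {i, i + 1}); // punctuation is a token of its own
                i++;
            }
        }
        return offsets;
    }

    public static void main(String[] args) {
        for (int[] span : tokenOffsets(PRIMARY_TEXT_EN)) {
            System.out.println("[" + span[0] + ", " + span[1] + ") " + PRIMARY_TEXT_EN.substring(span[0], span[1]));
        }
    }
}
```

For the English primary text this yields the eleven tokens listed above. A pair of offsets like these corresponds to the start and end arguments that createToken expects, although whether Salt treats end as exclusive is not stated in this reference.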

◆ createTokens() [2/2]

static List<SToken> org.corpus_tools.salt.samples.SampleGenerator.createTokens ( SDocument  document,
STextualDS  textualDS 
)
static

Creates a set of SToken objects tokenizing the primary text SampleGenerator#PRIMARY_TEXT_EN or SampleGenerator#PRIMARY_TEXT_DE, depending on the given STextualDS object, into the following tokens:

  • Is
  • this
  • example
  • more
  • complicated
  • than
  • it
  • appears
  • to
  • be
  • ?

or

  • Ist
  • dieses
  • Beispiel
  • komplizierter
  • als
  • es
  • zu
  • sein
  • scheint
  • ?

The created SToken objects and corresponding STextualRelation objects are added to the given SDocument object.

Parameters
document  the document to which the created SToken objects will be added
Returns
list of created SToken objects

◆ createUntypedParallelData()

static void org.corpus_tools.salt.samples.SampleGenerator.createUntypedParallelData ( SDocument  document)
static

Creates a small parallel corpus containing an English and a German text.

The English text is PRIMARY_TEXT_EN ("Is this example more complicated than it appears to be?"), the German text is PRIMARY_TEXT_DE ("Ist dieses Beispiel komplizierter als es zu sein scheint?"). Both are tokenized at word boundaries. TODO: insufficient documentation.

Parameters
document  the document containing the STextualDS objects