Creates samples of SDocumentGraph and SCorpusGraph instances.

Static Public Member Functions
static SCorpusGraph	createCorpusStructure (SaltProject saltProject)
	Creates a corpus structure with a root corpus, two sub-corpora and four documents, and adds it to the given salt project.

static SaltProject	createSaltProject ()
	Creates a complete SaltProject object containing the sample corpus structure.

static SCorpusGraph	createCorpusStructure ()
	Creates a corpus structure with a root corpus, two sub-corpora and four documents.

static SCorpusGraph	createCorpusStructure (SCorpusGraph corpGraph1)
	Creates a corpus structure with a root corpus, two sub-corpora and four documents.

static SCorpusGraph	createCorpusStructure_simple ()
	Creates a corpus structure consisting of a root corpus with a single document.

static void	createDialogue (SDocument document)
	Creates an SDocumentGraph containing the texts of two different speakers, which are aligned via the STimeline related to the SToken objects.

static STextualDS	createPrimaryData (SDocument document)
	Creates an STextualDS object containing the primary text SampleGenerator#PRIMARY_TEXT_EN and adds the object to the SDocumentGraph contained by the given SDocument object.

static STextualDS	createPrimaryData (SDocument document, String language)
	Creates an STextualDS object containing either the English primary text SampleGenerator#PRIMARY_TEXT_EN or its German translation SampleGenerator#PRIMARY_TEXT_DE and adds the object to the SDocumentGraph contained by the given SDocument object.

static void	createTokens (SDocument document)
	Creates a set of SToken objects tokenizing the primary text SampleGenerator#PRIMARY_TEXT_EN into word and punctuation tokens.

static List< SToken >	createTokens (SDocument document, STextualDS textualDS)
	Creates a set of SToken objects tokenizing the primary text SampleGenerator#PRIMARY_TEXT_EN or SampleGenerator#PRIMARY_TEXT_DE, depending on the given STextualDS object, into word and punctuation tokens.

static SToken	createToken (int start, int end, STextualDS textualDS, SDocument document, SLayer layer)
	Creates an SToken covering the given start and end positions and returns it.

static void	createParallelData (SDocument document)

static void	createParallelData (SDocument document, boolean setTypeForPointRel)
	Creates a small parallel corpus containing an English and a German text.

static void	createUntypedParallelData (SDocument document)
	Creates a small parallel corpus containing an English and a German text.

static void	createMorphologyAnnotations (SDocument document)
	Creates morphological annotations (POS and lemma) for the tokenized sample and adds them to each SToken object as SPOSAnnotation and SLemmaAnnotation objects.

static void	createInformationStructureSpan (SDocument document)
	Creates SSpan objects above the tokenization.

static void	createInformationStructureAnnotations (SDocument document)
	Annotates the SSpan objects above the tokenization with information structural annotations.

static void	createSyntaxStructure (SDocument document)
	Creates a syntax structure for the given SDocument object.

static void	createSyntaxAnnotations (SDocument document)
	Creates the categorical annotations for the nodes of the sample syntax tree created in SampleGenerator#createSyntaxStructure(SDocument).

static void	createDependencies (SDocument document)
	Creates the sample's dependency annotation.

static void	createAnaphoricAnnotations (SDocument document)

static void	createDocumentStructure (SDocument document)
	Creates a document structure containing primary text, tokenization, and morphological, information structure, syntactical and anaphoric annotations.

Static Public Attributes
static final String	PRIMARY_TEXT_EN = "Is this example more complicated than it appears to be?"
	The English primary text used for the samples.

static final String	PRIMARY_TEXT_EN_SPK1 = PRIMARY_TEXT_EN
	Primary text of speaker 1.

static final String	PRIMARY_TEXT_EN_SPK2 = "Uhm oh yes!"
	Primary text of speaker 2.

static final String	PRIMARY_TEXT_DE = "Ist dieses Beispiel komplizierter als es zu sein scheint?"
	The German primary text (translation of PRIMARY_TEXT_EN) used for the samples.

static final String	MORPHOLOGY_LAYER = "morphology"
	The name of the morphological layer containing the tokens.

static final String	LANG_EN = "en"
	ISO 639-1 language code for English.

static final String	LANG_DE = "de"
	ISO 639-1 language code for German.

static final String	SYNTAX_LAYER = "syntax"
	The name of the syntax layer.

Detailed Description

Creates samples of SDocumentGraph and SCorpusGraph instances.

Author: Florian Zipser

Member Function Documentation

◆ createAnaphoricAnnotations()

static void org.corpus_tools.salt.samples.SampleGenerator.createAnaphoricAnnotations ( SDocument document )

static

Parameters

document

◆ createCorpusStructure() [1/3]

static SCorpusGraph org.corpus_tools.salt.samples.SampleGenerator.createCorpusStructure ( )

static

Creates the following structure:

           rootCorpus
      /                    \
 subCorpus1              subCorpus2
 /       \              /        \
doc1     doc2         doc3      doc4

Exceptions

IOException
SAXException
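The hierarchy above is a two-level tree with corpora as inner nodes and documents as leaves. As an illustration only, the following plain-Java sketch models that tree with a hypothetical Node record (not a Salt class) and lists the path from rootCorpus to each document. It does not use the Salt API, and the slash-separated path format is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

public class CorpusTreeSketch {

    /** Hypothetical tree node (not part of Salt): corpora have children, documents have none. */
    record Node(String name, List<Node> children) {
        static Node document(String name) {
            return new Node(name, List.of());
        }

        static Node corpus(String name, Node... children) {
            return new Node(name, List.of(children));
        }
    }

    /** Builds rootCorpus -> {subCorpus1 -> {doc1, doc2}, subCorpus2 -> {doc3, doc4}}. */
    static Node build() {
        return Node.corpus("rootCorpus",
                Node.corpus("subCorpus1", Node.document("doc1"), Node.document("doc2")),
                Node.corpus("subCorpus2", Node.document("doc3"), Node.document("doc4")));
    }

    /** Depth-first walk that appends the root-to-leaf path of every document to out. */
    static List<String> documentPaths(Node node, String prefix, List<String> out) {
        String path = prefix.isEmpty() ? node.name() : prefix + "/" + node.name();
        if (node.children().isEmpty()) {
            out.add(path);
        } else {
            for (Node child : node.children()) {
                documentPaths(child, path, out);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        for (String path : documentPaths(build(), "", new ArrayList<>())) {
            System.out.println(path);
        }
    }
}
```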

◆ createCorpusStructure() [2/3]

static SCorpusGraph org.corpus_tools.salt.samples.SampleGenerator.createCorpusStructure ( SaltProject saltProject )

static

Creates the following corpus structure and adds it to the given salt project.

           rootCorpus
      /                    \
 subCorpus1              subCorpus2
 /       \              /        \
doc1     doc2         doc3      doc4

Exceptions

IOException
SAXException

◆ createCorpusStructure() [3/3]

static SCorpusGraph org.corpus_tools.salt.samples.SampleGenerator.createCorpusStructure ( SCorpusGraph corpGraph1 )

static

Creates the following structure:

            rootCorpus
      /                     \
 subCorpus1             subCorpus2
 /       \             /         \
doc1    doc2         doc3       doc4

Exceptions

IOException
SAXException

◆ createCorpusStructure_simple()

static SCorpusGraph org.corpus_tools.salt.samples.SampleGenerator.createCorpusStructure_simple ( )

static

Creates the following structure:

rootCorpus
    |
   doc1

Exceptions

IOException
SAXException

◆ createDependencies()

static void org.corpus_tools.salt.samples.SampleGenerator.createDependencies ( SDocument document )

static

This method creates the sample's dependency annotation.

Parameters

document

◆ createDialogue()

static void org.corpus_tools.salt.samples.SampleGenerator.createDialogue ( SDocument document )

static

Creates an SDocumentGraph containing the texts of two different speakers, which are aligned via the STimeline related to the SToken objects.

The texts are "Is this example more complicated than it appears to be?" (PRIMARY_TEXT_EN_SPK1) and "Uhm oh yes!" (PRIMARY_TEXT_EN_SPK2), which are tokenized into words. The words 'to' and 'oh' were spoken simultaneously and therefore overlap on the timeline.

Parameters

document document to be filled

◆ createDocumentStructure()

static void org.corpus_tools.salt.samples.SampleGenerator.createDocumentStructure ( SDocument document )

static

Creates a document structure containing:

primary text
tokenization
morphological annotations
information structure annotation
syntactical annotation
anaphoric annotation

Parameters

document

◆ createInformationStructureAnnotations()

static void org.corpus_tools.salt.samples.SampleGenerator.createInformationStructureAnnotations ( SDocument document )

static

Annotates the SSpan objects above the tokenization with information structural annotations.

Parameters

document

◆ createInformationStructureSpan()

static void org.corpus_tools.salt.samples.SampleGenerator.createInformationStructureSpan ( SDocument document )

static

Creates SSpan objects above the tokenization:

contrast-focus	topic
Is	this	example	more	complicated	than	it	appears	to	be

Parameters

document

◆ createMorphologyAnnotations()

static void org.corpus_tools.salt.samples.SampleGenerator.createMorphologyAnnotations ( SDocument document )

static

Creates morphological annotations (pos and lemma) for the tokenized sample and adds them to each SToken object as SPOSAnnotation or SLemmaAnnotation object.

token	pos	lemma
Is	VBZ	be
this	DT	this
example	NN	example
more	RBR	more
complicated	JJ	complicated
than	IN	than
it	PRP	it
appears	VBZ	appear
to	TO	to
be	VB	be

Parameters

document the document containing the SToken and STextualDS objects

◆ createParallelData()

static void org.corpus_tools.salt.samples.SampleGenerator.createParallelData	(	SDocument	document,
		boolean	setTypeForPointRel
	)

static

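Creates a small parallel corpus containing an English and a German text.

The English text is "Is this example more complicated than it appears to be?" (PRIMARY_TEXT_EN), the German text is "Ist dieses Beispiel komplizierter als es zu sein scheint?" (PRIMARY_TEXT_DE). Both are tokenized at word boundaries.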

Parameters

document the document containing the STextualDS objects
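Tokenizing at word boundaries gives the two texts different token counts. The sketch below assumes that a token is either a run of letters or a single punctuation character (an assumption; SampleGenerator's own tokenization code is not shown here) and tokenizes both sample texts with plain java.util.regex.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ParallelTokensSketch {
    static final String PRIMARY_TEXT_EN = "Is this example more complicated than it appears to be?";
    static final String PRIMARY_TEXT_DE = "Ist dieses Beispiel komplizierter als es zu sein scheint?";

    // Assumption: a token is a run of Unicode letters or one punctuation character.
    private static final Pattern TOKEN = Pattern.compile("\\p{L}+|\\p{Punct}");

    /** Returns the tokens of text in order of appearance. */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        List<String> en = tokenize(PRIMARY_TEXT_EN);
        List<String> de = tokenize(PRIMARY_TEXT_DE);
        System.out.println(en.size() + " English tokens: " + en);
        System.out.println(de.size() + " German tokens: " + de);
    }
}
```

Under this assumption the English text yields 11 tokens and the German text 10, because "more complicated" corresponds to the single word "komplizierter". An alignment between the two texts therefore cannot be a plain one-to-one mapping by token index.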

◆ createPrimaryData() [1/2]

static STextualDS org.corpus_tools.salt.samples.SampleGenerator.createPrimaryData ( SDocument document )

static

Creates an STextualDS object containing the primary text SampleGenerator#PRIMARY_TEXT_EN and adds the object to the SDocumentGraph being contained by the given SDocument object.


Parameters

document the document to which the created STextualDS object will be added (note: missing in SampleGenerator)

Returns: the created STextualDS object containing the primary text

◆ createPrimaryData() [2/2]

static STextualDS org.corpus_tools.salt.samples.SampleGenerator.createPrimaryData	(	SDocument	document,
		String	language
	)

static

Creates an STextualDS object containing either the English primary text SampleGenerator#PRIMARY_TEXT_EN or its German translation SampleGenerator#PRIMARY_TEXT_DE and adds the object to the SDocumentGraph contained by the given SDocument object.

Parameters

document	the document to which the created STextualDS object will be added
language	the language of the resource to be created: LANG_EN for English, LANG_DE for German

Returns: the created STextualDS object
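The language parameter selects between the two sample texts. A minimal sketch of that selection, using a hypothetical helper primaryTextFor (not part of SampleGenerator) and assuming that unsupported codes are rejected, which the documentation above does not specify:

```java
public class PrimaryTextSketch {
    static final String LANG_EN = "en";
    static final String LANG_DE = "de";
    static final String PRIMARY_TEXT_EN = "Is this example more complicated than it appears to be?";
    static final String PRIMARY_TEXT_DE = "Ist dieses Beispiel komplizierter als es zu sein scheint?";

    /**
     * Hypothetical helper: LANG_EN selects the English text, LANG_DE its German translation.
     * Throwing for any other code is an assumption of this sketch.
     */
    static String primaryTextFor(String language) {
        return switch (language) {
            case LANG_EN -> PRIMARY_TEXT_EN;
            case LANG_DE -> PRIMARY_TEXT_DE;
            default -> throw new IllegalArgumentException("Unsupported language: " + language);
        };
    }

    public static void main(String[] args) {
        System.out.println(primaryTextFor(LANG_DE));
    }
}
```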

◆ createSaltProject()

static SaltProject org.corpus_tools.salt.samples.SampleGenerator.createSaltProject ( )

static

Creates a complete SaltProject object containing the following corpus structure:

           rootCorpus
      /                    \
 subCorpus1              subCorpus2
 /       \              /        \
doc1     doc2         doc3      doc4

Returns: the created SaltProject object

◆ createSyntaxAnnotations()

static void org.corpus_tools.salt.samples.SampleGenerator.createSyntaxAnnotations ( SDocument document )

static

This method creates the categorical annotations for the nodes of the sample syntax tree created in SampleGenerator#createSyntaxStructure(SDocument).

Parameters

document

◆ createSyntaxStructure()

static void org.corpus_tools.salt.samples.SampleGenerator.createSyntaxStructure ( SDocument document )

static

Creates a syntax structure for the given SDocument object.

If it does not already contain a primary text and a tokenization, this method calls createPrimaryData(SDocument) and createTokens(SDocument).

Parameters

document
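The precondition described above follows a "create only what is missing" pattern. The plain-Java sketch below illustrates it with a hypothetical Doc class standing in for SDocument; checking text and tokens separately and the letter/punctuation tokenizer are assumptions of this sketch. Calling the method twice leaves the token list unchanged, which is what the guard is for.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SyntaxPreconditionSketch {
    static final String PRIMARY_TEXT_EN = "Is this example more complicated than it appears to be?";
    private static final Pattern TOKEN = Pattern.compile("\\p{L}+|\\p{Punct}");

    /** Hypothetical stand-in for an SDocument; not a Salt class. */
    static final class Doc {
        String primaryText;                        // null until primary data exists
        final List<String> tokens = new ArrayList<>();
    }

    static void createPrimaryData(Doc doc) {
        doc.primaryText = PRIMARY_TEXT_EN;
    }

    static void createTokens(Doc doc) {
        Matcher m = TOKEN.matcher(doc.primaryText);
        while (m.find()) {
            doc.tokens.add(m.group());
        }
    }

    /** Creates missing prerequisites first; building the syntax nodes themselves is omitted. */
    static void createSyntaxStructure(Doc doc) {
        if (doc.primaryText == null) {
            createPrimaryData(doc);
        }
        if (doc.tokens.isEmpty()) {
            createTokens(doc);
        }
        // ... syntax nodes and relations over doc.tokens would be created here
    }

    public static void main(String[] args) {
        Doc doc = new Doc();
        createSyntaxStructure(doc);
        createSyntaxStructure(doc); // text and tokens already exist, so nothing is added
        System.out.println(doc.tokens.size() + " tokens: " + doc.tokens);
    }
}
```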

◆ createToken()

static SToken org.corpus_tools.salt.samples.SampleGenerator.createToken	(	int	start,
		int	end,
		STextualDS	textualDS,
		SDocument	document,
		SLayer	layer
	)

static

Creates an SToken covering the text of textualDS between the given start and end positions, and returns it.

Parameters

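start	the start position of the token in the primary text
end	the end position of the token in the primary text
textualDS	the STextualDS object the token refers to
document	the document to which the token is added
layer	the SLayer to which the token is added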

Returns: the created SToken object
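createToken takes explicit character positions. Assuming, as with Java's String.substring, that start is inclusive and end exclusive (the documentation above does not state this), the positions for a word can be derived and checked as in this sketch. Span and spanOf are hypothetical names, not Salt API.

```java
public class TokenSpanSketch {
    static final String PRIMARY_TEXT_EN = "Is this example more complicated than it appears to be?";

    /** Hypothetical character span; start inclusive, end exclusive (assumption). */
    record Span(int start, int end) {
        Span {
            if (start < 0 || end < start) {
                throw new IllegalArgumentException("invalid span [" + start + ", " + end + ")");
            }
        }

        /** Returns the covered text; substring throws if end exceeds the text length. */
        String coveredText(String text) {
            return text.substring(start, end);
        }
    }

    /** Returns the span of the first occurrence of word in text (indexOf finds the first match only). */
    static Span spanOf(String text, String word) {
        int start = text.indexOf(word);
        if (start < 0) {
            throw new IllegalArgumentException("word not found: " + word);
        }
        return new Span(start, start + word.length());
    }

    public static void main(String[] args) {
        Span span = spanOf(PRIMARY_TEXT_EN, "example");
        System.out.println(span + " -> " + span.coveredText(PRIMARY_TEXT_EN));
    }
}
```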

◆ createTokens() [1/2]

static void org.corpus_tools.salt.samples.SampleGenerator.createTokens ( SDocument document )

static

Creates a set of SToken objects tokenizing the primary text SampleGenerator#PRIMARY_TEXT_EN into the following tokens:

Is
this
example
more
complicated
than
it
appears
to
be
?

The created SToken objects and corresponding STextualRelation objects are added to the given SDocument object.

Parameters

document the document to which the created SToken objects will be added
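Each SToken is tied to the primary text by an STextualRelation, which in Salt records where the token starts and ends in the text. The sketch below computes such positions for all tokens of PRIMARY_TEXT_EN, assuming tokens are letter runs or single punctuation marks and that end positions are exclusive; both are assumptions, so the offsets SampleGenerator actually stores may differ.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TokenOffsetsSketch {
    static final String PRIMARY_TEXT_EN = "Is this example more complicated than it appears to be?";

    // Assumption: letter runs and single punctuation characters are tokens.
    private static final Pattern TOKEN = Pattern.compile("\\p{L}+|\\p{Punct}");

    /** Returns {start, end} pairs (end exclusive) for each token, in text order. */
    static List<int[]> offsets(String text) {
        List<int[]> result = new ArrayList<>();
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            result.add(new int[] {m.start(), m.end()});
        }
        return result;
    }

    public static void main(String[] args) {
        for (int[] o : offsets(PRIMARY_TEXT_EN)) {
            System.out.printf("[%2d, %2d) %s%n", o[0], o[1], PRIMARY_TEXT_EN.substring(o[0], o[1]));
        }
    }
}
```

Under these assumptions "Is" spans [0, 2), "be" spans [52, 54) and the final "?" spans [54, 55), directly adjacent to "be" since no whitespace separates them.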

◆ createTokens() [2/2]

static List<SToken> org.corpus_tools.salt.samples.SampleGenerator.createTokens	(	SDocument	document,
		STextualDS	textualDS
	)

static

Creates a set of SToken objects tokenizing the primary text SampleGenerator#PRIMARY_TEXT_EN or SampleGenerator#PRIMARY_TEXT_DE, depending on the given STextualDS object, into the following tokens:

Is
this
example
more
complicated
than
it
appears
to
be
?

or

Ist
dieses
Beispiel
komplizierter
als
es
zu
sein
scheint
?

The created SToken objects and corresponding STextualRelation objects are added to the given SDocument object.

Parameters

document the document to which the created SToken objects will be added
textualDS the STextualDS object whose primary text is tokenized

Returns: list of created SToken objects

◆ createUntypedParallelData()

static void org.corpus_tools.salt.samples.SampleGenerator.createUntypedParallelData ( SDocument document )

static

Creates a small parallel corpus containing an English and a German text.

The English text is "Is this example more complicated than it appears to be?" (PRIMARY_TEXT_EN), the German text is "Ist dieses Beispiel komplizierter als es zu sein scheint?" (PRIMARY_TEXT_DE). Both are tokenized at word boundaries.

Parameters

document the document containing the STextualDS objects
