Salt
3.4.2
A powerful, tagset-independent and theory-neutral meta model and API for storing, manipulating, and representing nearly all types of linguistic data.
|
Creates samples of SDocumentGraph and SCorpusGraph instances.
Static Public Member Functions
static SCorpusGraph | createCorpusStructure (SaltProject saltProject) |
Creates a corpus structure in which rootCorpus contains subCorpus1 (doc1, doc2) and subCorpus2 (doc3, doc4), and adds it to the given Salt project.
static SaltProject | createSaltProject () |
Creates a complete SaltProject object containing the sample corpus structure (rootCorpus, two subcorpora, four documents).
static SCorpusGraph | createCorpusStructure () |
Creates a corpus structure in which rootCorpus contains subCorpus1 (doc1, doc2) and subCorpus2 (doc3, doc4).
static SCorpusGraph | createCorpusStructure (SCorpusGraph corpGraph1) |
Creates a corpus structure in which rootCorpus contains subCorpus1 (doc1, doc2) and subCorpus2 (doc3, doc4).
static SCorpusGraph | createCorpusStructure_simple () |
Creates a minimal corpus structure: rootCorpus containing the single document doc1.
static void | createDialogue (SDocument document) |
Creates an SDocumentGraph containing the texts of two different speakers, aligned via the STimeline to which the SToken objects are related.
static STextualDS | createPrimaryData (SDocument document) |
Creates an STextualDS object containing the primary text SampleGenerator#PRIMARY_TEXT_EN and adds it to the SDocumentGraph contained in the given SDocument object.
static STextualDS | createPrimaryData (SDocument document, String language) |
Creates an STextualDS object containing either the English primary text SampleGenerator#PRIMARY_TEXT_EN or its German translation SampleGenerator#PRIMARY_TEXT_DE, depending on language, and adds it to the SDocumentGraph contained in the given SDocument object.
static void | createTokens (SDocument document) |
Creates a set of SToken objects tokenizing the primary text SampleGenerator#PRIMARY_TEXT_EN.
static List< SToken > | createTokens (SDocument document, STextualDS textualDS) |
Creates a set of SToken objects tokenizing the primary text SampleGenerator#PRIMARY_TEXT_EN or SampleGenerator#PRIMARY_TEXT_DE, depending on the given STextualDS object.
static SToken | createToken (int start, int end, STextualDS textualDS, SDocument document, SLayer layer) |
Creates an SToken covering the given position and returns it.
static void | createParallelData (SDocument document) |
static void | createParallelData (SDocument document, boolean setTypeForPointRel) |
Creates a small parallel corpus containing an English and a German text.
static void | createUntypedParallelData (SDocument document) |
Creates a small parallel corpus containing an English and a German text.
static void | createMorphologyAnnotations (SDocument document) |
Creates morphological annotations (pos and lemma) for the tokenized sample and adds them to each SToken object as SPOSAnnotation and SLemmaAnnotation objects.
static void | createInformationStructureSpan (SDocument document) |
Creates SSpan objects over the tokenization.
static void | createInformationStructureAnnotations (SDocument document) |
Annotates the SSpan objects over the tokenization with information-structure annotations.
static void | createSyntaxStructure (SDocument document) |
Creates a syntax structure for the given SDocument object.
static void | createSyntaxAnnotations (SDocument document) |
Creates the categorical annotations for the nodes of the sample syntax tree created in SampleGenerator#createSyntaxStructure(SDocument).
static void | createDependencies (SDocument document) |
Creates the sample's dependency annotation.
static void | createAnaphoricAnnotations (SDocument document) |
static void | createDocumentStructure (SDocument document) |
Creates a document structure.
Static Public Attributes
static final String | PRIMARY_TEXT_EN = "Is this example more complicated than it appears to be?" |
The English primary text used for the samples.
static final String | PRIMARY_TEXT_EN_SPK1 = PRIMARY_TEXT_EN |
Primary text of speaker 1.
static final String | PRIMARY_TEXT_EN_SPK2 = "Uhm oh yes!" |
Primary text of speaker 2.
static final String | PRIMARY_TEXT_DE = "Ist dieses Beispiel komplizierter als es zu sein scheint?" |
The German translation of the primary text, used for the samples.
static final String | MORPHOLOGY_LAYER = "morphology" |
The name of the morphological layer containing the tokens.
static final String | LANG_EN = "en" |
ISO 639-1 language code for English.
static final String | LANG_DE = "de" |
ISO 639-1 language code for German.
static final String | SYNTAX_LAYER = "syntax" |
|
static |
document |
|
static |
Creates the following structure:
        rootCorpus
       /           \
  subCorpus1   subCorpus2
   /     \      /     \
 doc1   doc2  doc3   doc4
IOException | |
SAXException |
|
static |
Creates the following corpus structure and adds it to the given salt project.
        rootCorpus
       /           \
  subCorpus1   subCorpus2
   /     \      /     \
 doc1   doc2  doc3   doc4
IOException | |
SAXException |
|
static |
Creates the following structure:
        rootCorpus
       /           \
  subCorpus1   subCorpus2
   /     \      /     \
 doc1   doc2  doc3   doc4
IOException | |
SAXException |
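The containment shown in these diagrams (corpora containing subcorpora, subcorpora containing documents) can be pictured outside of Salt as a parent-to-children map. The following is a minimal plain-Java sketch, not the Salt API: the class CorpusTreeSketch and its method documentsBelow are hypothetical, and only the node names come from the diagram. It lists the documents below a given corpus node.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch (not the Salt API): the sample corpus tree as a parent -> children map.
class CorpusTreeSketch {
    static final Map<String, List<String>> CHILDREN = new LinkedHashMap<>();
    static {
        CHILDREN.put("rootCorpus", List.of("subCorpus1", "subCorpus2"));
        CHILDREN.put("subCorpus1", List.of("doc1", "doc2"));
        CHILDREN.put("subCorpus2", List.of("doc3", "doc4"));
    }

    // Collects all leaves (documents) below the given node, depth first, left to right.
    static List<String> documentsBelow(String node) {
        List<String> result = new ArrayList<>();
        List<String> children = CHILDREN.get(node);
        if (children == null) {          // no children: the node itself is a document
            result.add(node);
            return result;
        }
        for (String child : children) {
            result.addAll(documentsBelow(child));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(documentsBelow("rootCorpus")); // [doc1, doc2, doc3, doc4]
        System.out.println(documentsBelow("subCorpus2")); // [doc3, doc4]
    }
}
```

A map is used here only because it is the shortest faithful encoding of the diagram; Salt itself represents the same structure as an SCorpusGraph of corpus and document nodes.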
|
static |
Creates the following structure:
rootCorpus
    |
   doc1
IOException | |
SAXException |
|
static |
This method creates the sample's dependency annotation.
document |
|
static |
Creates an SDocumentGraph containing the texts of two different speakers, aligned via the STimeline to which the SToken objects are related.
The texts are "Is this example more complicated than it appears to be?" (PRIMARY_TEXT_EN_SPK1) and "Uhm oh yes!" (PRIMARY_TEXT_EN_SPK2); both are tokenized by words. The words 'to' and 'oh' were spoken simultaneously and therefore overlap on the timeline.
document | document to be filled |
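The idea behind the timeline alignment can be illustrated by giving each token an interval of timeline points: two tokens were spoken simultaneously when their intervals overlap. This is a plain-Java sketch (Java 16+ for records), not Salt's STimeline API, and the point values are invented for illustration; the actual points SampleGenerator assigns are not documented here.

```java
// Hypothetical sketch: tokens of two speakers placed on one shared timeline.
// The timeline points are made up; they are NOT the values SampleGenerator uses.
class TimelineOverlapSketch {
    record TimedToken(String speaker, String text, int start, int end) {}

    // Half-open intervals [start, end) overlap iff each one starts before the other ends.
    static boolean overlaps(TimedToken a, TimedToken b) {
        return a.start() < b.end() && b.start() < a.end();
    }

    public static void main(String[] args) {
        TimedToken to = new TimedToken("speaker1", "to", 8, 9);
        TimedToken be = new TimedToken("speaker1", "be", 9, 10);
        TimedToken uhm = new TimedToken("speaker2", "Uhm", 6, 7);
        TimedToken oh = new TimedToken("speaker2", "oh", 8, 9);
        System.out.println(overlaps(to, oh));  // true
        System.out.println(overlaps(be, oh));  // false
        System.out.println(overlaps(uhm, to)); // false
    }
}
```

Half-open intervals are chosen so that adjacent tokens (such as 'to' ending where 'be' starts) do not count as overlapping.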
|
static |
Creates a document structure containing:
document |
|
static |
Annotates the SSpan objects over the tokenization with information-structure annotations.
document |
|
static |
Creates SSpan objects over the tokenization:
span | covered tokens |
contrast-focus | Is |
topic | this example more complicated than it appears to be |
document |
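Conceptually, each span covers a set of token positions, and its text is the sequence of words at those positions. The plain-Java sketch below (hypothetical class SpanSketch, not the Salt API) models the two spans as index lists over the ten word tokens from the table, assuming contrast-focus covers token 0 and topic covers tokens 1 to 9.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Hypothetical sketch (not Salt): spans as lists of token indices over the word tokens.
class SpanSketch {
    static final List<String> TOKENS = List.of(
        "Is", "this", "example", "more", "complicated", "than", "it", "appears", "to", "be");

    // Returns the words at the given token indices, separated by single spaces.
    static String coveredText(List<Integer> tokenIndices) {
        return tokenIndices.stream().map(TOKENS::get).collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        List<Integer> contrastFocus = List.of(0);
        List<Integer> topic = IntStream.rangeClosed(1, 9).boxed().collect(Collectors.toList());
        System.out.println("contrast-focus: " + coveredText(contrastFocus)); // contrast-focus: Is
        System.out.println("topic: " + coveredText(topic));
        // topic: this example more complicated than it appears to be
    }
}
```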
|
static |
Creates morphological annotations (pos and lemma) for the tokenized sample and adds them to each SToken object as SPOSAnnotation or SLemmaAnnotation object.
token | pos | lemma |
Is | VBZ | be |
this | DT | this |
example | NN | example |
more | ABR | more |
complicated | JJ | complicated |
than | IN | than |
it | PRP | it |
appears | VBZ | appear |
to | TO | to |
be | VB | be |
document | the document containing the SToken and STextualDS objects |
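The table pairs every token with a part-of-speech tag and a lemma. In Salt these values are attached to each SToken as SPOSAnnotation and SLemmaAnnotation objects; the sketch below only mirrors the table with plain Java records (hypothetical MorphologySketch, not Salt classes, Java 16+) and shows a simple query over it: all lemmas carrying a given tag.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical sketch (not Salt): one record per token, values copied from the table above.
class MorphologySketch {
    record Morph(String token, String pos, String lemma) {}

    static final List<Morph> ANNOTATIONS = List.of(
        new Morph("Is", "VBZ", "be"),
        new Morph("this", "DT", "this"),
        new Morph("example", "NN", "example"),
        new Morph("more", "ABR", "more"),
        new Morph("complicated", "JJ", "complicated"),
        new Morph("than", "IN", "than"),
        new Morph("it", "PRP", "it"),
        new Morph("appears", "VBZ", "appear"),
        new Morph("to", "TO", "to"),
        new Morph("be", "VB", "be"));

    // Returns the lemmas of all tokens whose pos tag equals the given tag, in text order.
    static List<String> lemmasWithPos(String pos) {
        return ANNOTATIONS.stream()
            .filter(m -> m.pos().equals(pos))
            .map(Morph::lemma)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(ANNOTATIONS.get(7));     // Morph[token=appears, pos=VBZ, lemma=appear]
        System.out.println(lemmasWithPos("VBZ"));   // [be, appear]
    }
}
```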
|
static |
Creates a small parallel corpus containing an English and a German text.
The English text is "Is this example more complicated than it appears to be?" (PRIMARY_TEXT_EN), the German text is "Ist dieses Beispiel komplizierter als es zu sein scheint?" (PRIMARY_TEXT_DE). Both are tokenized at word borders.
document | the document containing the STextualDS objects |
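Judging by the setTypeForPointRel parameter, the two texts are presumably linked by pointing relations between tokens that may or may not carry a type. The plain-Java sketch below only illustrates that idea: both texts are split at word borders and a few word pairs are linked by directed relations with an optional type. The class ParallelAlignmentSketch, the type name "align", and the chosen pairs are all hypothetical; they are not the alignment SampleGenerator actually creates.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch (not Salt): directed alignment links from English to German tokens,
// optionally carrying a relation type.
class ParallelAlignmentSketch {
    record Link(int enIndex, int deIndex, String type) {} // type == null means untyped

    // Splits at word borders after dropping punctuation (an assumption of this sketch).
    static List<String> words(String text) {
        return Arrays.asList(text.replaceAll("\\p{Punct}", "").split("\\s+"));
    }

    static List<Link> align(boolean typed) {
        String type = typed ? "align" : null;   // "align" is an invented type name
        List<Link> links = new ArrayList<>();
        links.add(new Link(0, 0, type));         // Is      -> Ist
        links.add(new Link(1, 1, type));         // this    -> dieses
        links.add(new Link(2, 2, type));         // example -> Beispiel
        return links;
    }

    public static void main(String[] args) {
        List<String> en = words("Is this example more complicated than it appears to be?");
        List<String> de = words("Ist dieses Beispiel komplizierter als es zu sein scheint?");
        for (Link l : align(true)) {
            System.out.println(en.get(l.enIndex()) + " -> " + de.get(l.deIndex()) + " (" + l.type() + ")");
        }
    }
}
```

Passing false to align yields the same links without a type, which is the distinction the typed and untyped variants appear to draw.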
|
static |
Creates an STextualDS object containing the primary text SampleGenerator#PRIMARY_TEXT_EN and adds the object to the SDocumentGraph being contained by the given SDocument object.
TODO: What was supposed to be shown here, the original text or the link to the String object?
document | the document to which the created STextualDS object will be added (missing in SampleGenerator) |
|
static |
Creates an STextualDS object containing either the English primary text SampleGenerator#PRIMARY_TEXT_EN or its German translation SampleGenerator#PRIMARY_TEXT_DE, and adds the object to the SDocumentGraph contained in the given SDocument object.
document | the document to which the created STextualDS object will be added |
language | the language of the resource to be created: LANG_EN for English, LANG_DE for German |
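The language parameter decides which of the two sample texts ends up in the STextualDS. That choice can be sketched as a plain lookup. The constant values are copied from this class; the helper primaryTextFor and the exception for unknown codes are assumptions of the sketch, not documented behavior of createPrimaryData.

```java
// Hypothetical sketch of the language switch; constant values copied from SampleGenerator.
class PrimaryTextSketch {
    static final String PRIMARY_TEXT_EN = "Is this example more complicated than it appears to be?";
    static final String PRIMARY_TEXT_DE = "Ist dieses Beispiel komplizierter als es zu sein scheint?";
    static final String LANG_EN = "en";
    static final String LANG_DE = "de";

    // Maps an ISO 639-1 code to the sample text; rejecting other codes is an assumption.
    static String primaryTextFor(String language) {
        switch (language) {
            case LANG_EN: return PRIMARY_TEXT_EN;
            case LANG_DE: return PRIMARY_TEXT_DE;
            default: throw new IllegalArgumentException("unsupported language: " + language);
        }
    }

    public static void main(String[] args) {
        System.out.println(primaryTextFor(LANG_DE));
    }
}
```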
|
static |
Creates a complete SaltProject object with the following corpus structure:
        rootCorpus
       /           \
  subCorpus1   subCorpus2
   /     \      /     \
 doc1   doc2  doc3   doc4
|
static |
This method creates the categorical annotations for the nodes of the sample syntax tree created in SampleGenerator#createSyntaxStructure(SDocument).
document |
|
static |
Creates a syntax structure for the given SDocument object.
If it does not already contain a primary text and a tokenization, this method calls createPrimaryData(SDocument) and createTokens(SDocument).
document |
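The check described above (create the primary text and the tokenization only if they are missing) is a small "ensure prerequisites" pattern. The sketch models a document as a hypothetical class with a nullable text and a token list; it is not Salt's SDocument, the method names only echo SampleGenerator's, and the tokenization rule is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch (not Salt): create missing prerequisites before building syntax.
class SyntaxPrerequisiteSketch {
    static class Doc {
        String primaryText;                      // null until created
        List<String> tokens = new ArrayList<>(); // empty until tokenized
        int primaryDataCalls = 0;                // counts how often the text was created
    }

    static void createPrimaryData(Doc doc) {
        doc.primaryText = "Is this example more complicated than it appears to be?";
        doc.primaryDataCalls++;
    }

    static void createTokens(Doc doc) {
        for (String w : doc.primaryText.split("\\W+")) { // word characters only (assumption)
            if (!w.isEmpty()) doc.tokens.add(w);
        }
    }

    static void createSyntaxStructure(Doc doc) {
        if (doc.primaryText == null) createPrimaryData(doc); // only if missing
        if (doc.tokens.isEmpty()) createTokens(doc);          // only if missing
        // building the syntax nodes over doc.tokens would follow here
    }

    public static void main(String[] args) {
        Doc doc = new Doc();
        createSyntaxStructure(doc);
        createSyntaxStructure(doc); // second call reuses the existing text and tokens
        System.out.println(doc.tokens.size() + " tokens, text created " + doc.primaryDataCalls + "x");
    }
}
```

Running main prints "10 tokens, text created 1x": the second call finds both prerequisites in place and creates nothing new.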
|
static |
|
static |
Creates a set of SToken objects tokenizing the primary text SampleGenerator#PRIMARY_TEXT_EN into the following tokens:
The created SToken objects and corresponding STextualRelation objects are added to the given SDocument object.
document | the document, to which the created SToken objects will be added |
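Since createToken(int start, int end, ...) takes character positions, tokenizing the primary text amounts to computing a start and an end offset for each token. The sketch below computes such offsets with java.util.regex. It assumes that only runs of letters become tokens (consistent with the morphology and information-structure tables, which list ten words and no "?"); SampleGenerator's actual offsets are not shown in this section. End offsets are exclusive, as returned by Matcher.end().

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: character offsets [start, end) for each word token.
class TokenOffsetSketch {
    record Span(int start, int end) {}

    static List<Span> wordOffsets(String text) {
        List<Span> spans = new ArrayList<>();
        Matcher m = Pattern.compile("\\p{L}+").matcher(text); // letters only (assumption)
        while (m.find()) {
            spans.add(new Span(m.start(), m.end()));
        }
        return spans;
    }

    public static void main(String[] args) {
        String text = "Is this example more complicated than it appears to be?";
        for (Span s : wordOffsets(text)) {
            System.out.println(s.start() + "-" + s.end() + " " + text.substring(s.start(), s.end()));
        }
    }
}
```

For the English sample, the first lines printed are "0-2 Is", "3-7 this" and "8-15 example", and the last is "52-54 be".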
|
static |
Creates a set of SToken objects tokenizing the primary text SampleGenerator#PRIMARY_TEXT_EN or SampleGenerator#PRIMARY_TEXT_DE, depending on the given STextualDS object, into the following tokens:
or
The created SToken objects and corresponding STextualRelation objects are added to the given SDocument object.
document | the document, to which the created SToken objects will be added |
|
static |
Creates a small parallel corpus containing an English and a German text.
The English text is "Is this example more complicated than it appears to be?" (PRIMARY_TEXT_EN), the German text is "Ist dieses Beispiel komplizierter als es zu sein scheint?" (PRIMARY_TEXT_DE). Both are tokenized at word borders. TODO: insufficient documentation.
document | the document containing the STextualDS objects |