Pepper  3.6.0
A highly extensible platform for conversion and manipulation of linguistic data.
org.corpus_tools.pepper.modules.PepperExporter Interface Reference

Inherits org.corpus_tools.pepper.modules.PepperModule.

Inherited by org.corpus_tools.pepper.impl.PepperExporterImpl, org.corpus_tools.pepper.modules.coreModules.DoNothingExporter, org.corpus_tools.pepper.modules.coreModules.SaltXMLExporter, and org.corpus_tools.pepper.modules.coreModules.TextExporter.

Classes

enum  EXPORT_MODE
 Determines how the corpus-structure should be exported. More...
 

Public Member Functions

List< FormatDesc > getSupportedFormats ()
 TODO docu. More...
 
CorpusDesc getCorpusDesc ()
 TODO docu. More...
 
String getDocumentEnding ()
 Returns the format ending for files to be exported and related to SDocument objects. More...
 
void setDocumentEnding (String sDocumentEnding)
 Sets the format ending for files to be exported and related to SDocument objects. More...
 
void setCorpusDesc (CorpusDesc corpusDesc)
 TODO docu. More...
 
Map< Identifier, URI > getIdentifier2ResourceTable ()
 Returns the table of correspondences between Identifier objects and their resources. More...
 
URI createFolderStructure (Identifier sElementId)
 Creates a folder structure based on the corpus path given in CorpusDesc#getCorpusPath(). More...
 
EXPORT_MODE getExportMode ()
 Returns how corpus-structure is exported. More...
 
void setExportMode (EXPORT_MODE exportMode)
 Determines how the corpus-structure should be exported. More...
 
void exportCorpusStructure ()
 This method is called by start() to export the corpus-structure into a folder-structure. More...
 
FormatDesc addSupportedFormat (String formatName, String formatVersion, URI formatReference)
 Documentation inherited from PepperModuleDesc::addSupportedFormat(String, String, URI).
 
- Public Member Functions inherited from org.corpus_tools.pepper.modules.PepperModule
PepperModuleDesc getFingerprint ()
 Returns a PepperModuleDesc object, which is a kind of a fingerprint of this PepperModule. More...
 
MODULE_TYPE getModuleType ()
 Returns the type of this module. More...
 
ComponentContext getComponentContext ()
 Returns the ComponentContext of the OSGi environment the bundle was started in. More...
 
String getName ()
 Returns the name of this module. More...
 
String getVersion ()
 Returns the version of this module. More...
 
void setVersion (String value)
 Sets the version of this module. More...
 
String getDesc ()
 Returns a short description of this module. More...
 
void setDesc (String desc)
 Sets a short description of this module. More...
 
URI getSupplierContact ()
 Returns a URI where more information about this module and contact information for its supplier can be found. More...
 
void setSupplierContact (URI eMail)
 Sets a URI where more information about this module and contact information for its supplier can be found. More...
 
URI getSupplierHomepage ()
 Returns the URI to the homepage describing the functionality of the module. More...
 
void setSupplierHomepage (URI hp)
 Sets the URI to the homepage describing the functionality of the module. More...
 
PepperModuleProperties getProperties ()
 Returns a PepperModuleProperties object containing properties to customize the behavior of this PepperModule. More...
 
void setProperties (PepperModuleProperties properties)
 Sets the PepperModuleProperties object containing properties to customize the behavior of this PepperModule. More...
 
ModuleController getModuleController ()
 Returns the container and controller object for the current module. More...
 
void setPepperModuleController (ModuleController value)
 Sets the container and controller object for the current module. More...
 
void setPepperModuleController_basic (ModuleController value)
 Sets the container and controller object for the current module. More...
 
SaltProject getSaltProject ()
 Returns the SaltProject object, which is filled, manipulated or exported by the current module. More...
 
void setSaltProject (SaltProject value)
 Sets the SaltProject object, which is filled, manipulated or exported by the current module. More...
 
SCorpusGraph getCorpusGraph ()
 Returns the SCorpusGraph object which is filled, manipulated or exported by the current module. More...
 
void setCorpusGraph (SCorpusGraph value)
 Sets the SCorpusGraph object which is filled, manipulated or exported by the current module. More...
 
URI getResources ()
 Returns the path of the folder which might contain resources for a Pepper module. More...
 
void setResources (URI value)
 Sets the resource folder used by getResources(). More...
 
URI getTemproraries ()
 TODO make docu.
 
void setTemproraries (URI value)
 TODO make docu.
 
String getSymbolicName ()
 Returns the symbolic name of this OSGi bundle. More...
 
void setSymbolicName (String value)
 Sets the symbolic name of this OSGi bundle. More...
 
Collection< String > getStartProblems ()
 If isReadyToStart() has returned false, this method returns a list of reasons why this module is not ready to start. More...
 
boolean isReadyToStart () throws PepperModuleNotReadyException
 This method is called by the Pepper framework after initializing this object and directly before processing starts. More...
 
void setIsMultithreaded (boolean isMultithreaded)
 Sets whether this PepperModule is able to run multithreaded. More...
 
boolean isMultithreaded ()
 Returns whether this PepperModule is able to run multithreaded. More...
 
void start () throws PepperModuleException
 Starts the conversion process. More...
 
void start (Identifier sElementId) throws PepperModuleException
 This method is called by the method start(). More...
 
PepperMapper createPepperMapper (Identifier sElementId)
 OVERRIDE THIS METHOD FOR CUSTOMIZED MAPPING. More...
 
List< Identifier > proposeImportOrder (SCorpusGraph sCorpusGraph)
 This method could be overridden, to make a proposal for the import order of SDocument objects. More...
 
Double getProgress (String globalId)
 This method is invoked by the Pepper framework, to get the current progress concerning the SDocument object corresponding to the given Identifier in percent. More...
 
Double getProgress ()
 This method is invoked by the Pepper framework, to get the current total progress of all SDocument objects being processed by this module. More...
 
void end () throws PepperModuleException
 This method is called by the Pepper framework at the end of a conversion process. More...
 
void done (PepperMapperController controller)
 This method is called by a PepperMapperController object to notify the PepperModule object, that the mapping is done. More...
 
void done (Identifier identifier, DOCUMENT_STATUS result)
 This method is called by a PepperMapperController object to notify the PepperModule object, that the mapping for this object is done. More...
 
SelfTestDesc getSelfTestDesc ()
 This method is called by the Pepper framework to run an integration test for this module. More...
 

Additional Inherited Members

- Static Public Attributes inherited from org.corpus_tools.pepper.modules.PepperModule
static final String ENDING_FOLDER = "FOLDER"
 A string specifying a value for a folder as ending. More...
 
static final String ENDING_LEAF_FOLDER = "LEAF_FOLDER"
 A string specifying a value for a leaf folder as ending. More...
 
static final String ENDING_XML = "xml"
 Ending for an xml file. More...
 
static final String ENDING_TXT = "txt"
 Ending for a txt file. More...
 
static final String ENDING_TAB = "tab"
 Ending for an tab file. More...
 
static final String ENDING_ALL_FILES = "ALL_FILES"
 All kinds of file endings.
 

Detailed Description

A mapping task in the Pepper workflow is not a monolithic block. It consists of several smaller steps.

  • Declare the fingerprint of the module. This is part of the constructor.
  • Check readiness of the module.
  • Export the corpus structure.
  • Export the document structure and create a mapper for each corpus and document.
  • Clean up.

The following describes each step in short. For a more detailed explanation, take a look at the documentation found at http://u.hu-berlin.de/saltnpepper.

Declare the fingerprint

Initialize the module and set the module's name, its description and the format description of the data which are exportable. This is part of the constructor:

public MyModule() {
        super("Name of the module");
        setSupplierContact(URI.createURI("Contact address of the module's supplier"));
        setSupplierHomepage(URI.createURI("homepage of the module"));
        setDesc("A short description of the intention of this module, for instance which formats are exportable.");
        this.addSupportedFormat("The name of a format which is exportable e.g. txt",
                        "The version corresponding to the format name", null);
}

Check readiness of the module

This method is invoked by the Pepper framework before the mapping process is started. It must return true; otherwise, this Pepper module cannot be used in a Pepper workflow. At this point you can report to the user all problems that prevent the module from being used, for instance that a database connection could not be established.

public boolean isReadyToStart() {
        return (true);
}
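
The readiness check works together with getStartProblems(), which returns the reasons when isReadyToStart() has returned false. The following self-contained sketch illustrates that interplay without using the Pepper API. The class ReadinessSketch and its resource-folder check are hypothetical stand-ins chosen for illustration; a real module would perform its own checks and might keep its problem list differently.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.List;

/** Stand-in sketch (not Pepper code): a readiness check that records why it failed. */
class ReadinessSketch {
    // reasons collected by the last call of isReadyToStart()
    private final List<String> startProblems = new ArrayList<>();
    // hypothetical resource folder this module depends on
    private final Path resources;

    ReadinessSketch(Path resources) {
        this.resources = resources;
    }

    /** Returns true only if no problem was found; every problem is recorded for the user. */
    boolean isReadyToStart() {
        startProblems.clear();
        if (resources == null || !Files.isDirectory(resources)) {
            startProblems.add("Resource folder is missing: " + resources);
        }
        return startProblems.isEmpty();
    }

    /** Returns the reasons why the last readiness check failed (empty if it succeeded). */
    Collection<String> getStartProblems() {
        return Collections.unmodifiableList(startProblems);
    }

    public static void main(String[] args) {
        ReadinessSketch module = new ReadinessSketch(Paths.get("no-such-folder"));
        if (!module.isReadyToStart()) {
            module.getStartProblems().forEach(System.out::println);
        }
    }
}
```

Collecting the reasons in a list, instead of only returning false, lets the caller tell the user what has to be fixed before the workflow can start.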

Export corpus structure

The corpus-structure export is handled in the method exportCorpusStructure(). It is invoked at the beginning of the method start() of the PepperExporter. To change the default behavior completely, just override this method. The aim of the method exportCorpusStructure() is to fill the map that relates the corpus-structure to the file structure. This file structure is created automatically, but only virtually: the map just contains URIs pointing to the file or folder. The creation of the actual file or folder has to be done by the Pepper module itself in method PepperMapper#mapSCorpus() or PepperMapper#mapSDocument(). To adapt the creation of this 'virtual' file structure, you first have to choose the mode of export. You can do this for instance in method isReadyToStart(), as shown in the following snippet, or in the constructor.

public boolean isReadyToStart(){
        ...
        //option 1
        setExportMode(EXPORT_MODE.NO_EXPORT);
        //option 2
        setExportMode(EXPORT_MODE.CORPORA_ONLY);
        //option 3
        setExportMode(EXPORT_MODE.DOCUMENTS_IN_FILES);
        //sets the ending, which should be added to the document's name
        setDocumentEnding(ENDING_TAB);
        ...
}

In this snippet, option 1 means that nothing will be mapped. Option 2 means that only SCorpus objects are mapped to a folder and SDocument objects will be ignored. And option 3 means that SCorpus objects are mapped to a folder and SDocument objects are mapped to a file. The ending of that file can be determined by passing the ending with method setDocumentEnding(String). In the given snippet a URI having the ending 'tab' is created for each SDocument.
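
To make the effect of the three modes more concrete, the following self-contained sketch mimics how an element could be mapped to a URI under each mode. It does not use the Pepper API: the enum ExportMode, the method resolve and the example paths are stand-ins invented for illustration, and the actual logic in PepperExporterImpl may differ in detail.

```java
import java.net.URI;
import java.util.Optional;

/** Stand-in sketch (not Pepper code): how each export mode could map an element to a URI. */
class ExportModeSketch {
    /** Hypothetical mirror of PepperExporter.EXPORT_MODE. */
    enum ExportMode { NO_EXPORT, CORPORA_ONLY, DOCUMENTS_IN_FILES }

    /**
     * Returns the URI for an element, or an empty Optional if the mode does not map it.
     *
     * @param root       target folder, assumed to end with a slash
     * @param path       element path below the root, e.g. "myCorpus/doc1"
     * @param isDocument true for a document, false for a corpus
     * @param ending     document ending without dot, e.g. "tab"
     */
    static Optional<URI> resolve(ExportMode mode, URI root, String path,
                                 boolean isDocument, String ending) {
        switch (mode) {
            case NO_EXPORT:
                // nothing is mapped at all
                return Optional.empty();
            case CORPORA_ONLY:
                // corpora become folders, documents are ignored
                return isDocument ? Optional.empty() : Optional.of(root.resolve(path + "/"));
            case DOCUMENTS_IN_FILES:
                // corpora become folders, documents become files with the given ending
                return Optional.of(root.resolve(isDocument ? path + "." + ending : path + "/"));
            default:
                throw new IllegalArgumentException("unknown mode: " + mode);
        }
    }

    public static void main(String[] args) {
        URI root = URI.create("file:/tmp/export/");
        for (ExportMode mode : ExportMode.values()) {
            System.out.println(mode + " corpus   -> " + resolve(mode, root, "myCorpus", false, "tab"));
            System.out.println(mode + " document -> " + resolve(mode, root, "myCorpus/doc1", true, "tab"));
        }
    }
}
```

The empty Optional stands for "no entry in the table" in this sketch, which makes it visible that a mapper gets no file or folder for elements the chosen mode skips.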

Export the document structure

In the method createPepperMapper(Identifier) a PepperMapper object needs to be initialized and returned. The PepperMapper is the major part doing the mapping. It provides the methods PepperMapper#mapSCorpus() to handle the mapping of a single SCorpus object and PepperMapper#mapSDocument() to handle a single SDocument object. Both methods are invoked by the Pepper framework. To set the value returned by PepperMapper#getResourceURI(), which offers the mapper the file or folder of the current SCorpus or SDocument object, this field needs to be set in the createPepperMapper(Identifier) method. The following snippet shows a dummy of that method:

public PepperMapper createPepperMapper(Identifier sElementId) {
        PepperMapper mapper = new PepperMapperImpl() {
                @Override
                public DOCUMENT_STATUS mapSCorpus() {
                        // handling the mapping of a single corpus
                        // accessing the current file or folder
                        getResourceURI();
                        // returning, that the corpus was mapped successfully
                        return (DOCUMENT_STATUS.COMPLETED);
                }
                @Override
                public DOCUMENT_STATUS mapSDocument() {
                        // handling the mapping of a single document
                        // accessing the current file or folder
                        getResourceURI();
                        // returning, that the document was mapped successfully
                        return (DOCUMENT_STATUS.COMPLETED);
                }
        };
        // pass current file or folder to mapper. When using
        // PepperImporter.importCorpusStructure or
        // PepperExporter.exportCorpusStructure, the mapping between file or
        // folder and SCorpus or SDocument was stored here
        mapper.setResourceURI(getIdentifier2ResourceTable().get(sElementId));
        return (mapper);
}

Clean up

Sometimes it might be necessary to clean up after the module has done its job. For instance, when writing an importer or an exporter it might be necessary to close file streams, a database connection etc. Therefore, after the processing is done, the Pepper framework calls the method described in the following snippet:

public void end() {
        super.end();
        // do some clean up like closing of streams etc.
}
Author
Florian Zipser

Class Documentation

◆ org::corpus_tools::pepper::modules::PepperExporter::EXPORT_MODE

enum org::corpus_tools::pepper::modules::PepperExporter::EXPORT_MODE

Determines how the corpus-structure should be exported.

Author
Florian Zipser
Enumerator
CORPORA_ONLY        SCorpus objects are exported into a folder structure, but SDocument objects are not exported.
DOCUMENTS_IN_FILES  SCorpus objects are exported into a folder structure and SDocument objects are stored in files having the ending determined by PepperExporter::getDocumentEnding().
NO_EXPORT           The corpus-structure is not exported.

Member Function Documentation

◆ createFolderStructure()

URI org.corpus_tools.pepper.modules.PepperExporter.createFolderStructure ( Identifier  sElementId)

Creates a folder structure based on the corpus path given in CorpusDesc#getCorpusPath().

For each segment of the Identifier a folder is created.

Returns
the entire path of Identifier as file path, which was created on disk

Implemented in org.corpus_tools.pepper.impl.PepperExporterImpl.
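
The idea of creating one folder per segment can be illustrated with a self-contained sketch based on java.nio.file. It does not use the Pepper API: the class FolderStructureSketch is hypothetical, and the assumption that the element is addressed by a plain slash-separated path (instead of an Identifier) is made only for illustration. The implementation in PepperExporterImpl may work differently.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Stand-in sketch (not Pepper code): one folder per path segment below a corpus path. */
class FolderStructureSketch {
    /**
     * Creates a folder for every segment of elementPath below corpusPath.
     *
     * @param corpusPath  root folder of the export (assumed to come from the corpus description)
     * @param elementPath slash-separated path of the element, e.g. "myCorpus/subcorpus"
     * @return the deepest folder that was created on disk
     */
    static Path createFolderStructure(Path corpusPath, String elementPath) {
        Path current = corpusPath;
        for (String segment : elementPath.split("/")) {
            if (segment.isEmpty()) {
                continue; // skip empty segments caused by leading or doubled slashes
            }
            current = current.resolve(segment);
        }
        try {
            // creates all missing parent folders as well
            return Files.createDirectories(current);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("pepper-sketch");
        System.out.println(createFolderStructure(root, "myCorpus/subcorpus"));
    }
}
```

Calling the method twice with the same arguments is harmless in this sketch, because Files.createDirectories does not fail when the folders already exist.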

◆ exportCorpusStructure()

void org.corpus_tools.pepper.modules.PepperExporter.exportCorpusStructure ( )

This method is called by start() to export the corpus-structure into a folder-structure.

That means, each Identifier belonging to a SDocument or SCorpus object is stored in getIdentifier2ResourceTable() together with the corresponding file-structure object (file or folder) located by a URI. The URI object corresponding to files will get the file ending determined by getDocumentEnding(), which could be set by setDocumentEnding(String).
To adapt the creation of URIs set the export mode via setExportMode(EXPORT_MODE).

Implemented in org.corpus_tools.pepper.impl.PepperExporterImpl.

◆ getCorpusDesc()

CorpusDesc org.corpus_tools.pepper.modules.PepperExporter.getCorpusDesc ( )

TODO docu.

Returns

Implemented in org.corpus_tools.pepper.impl.PepperExporterImpl.

◆ getDocumentEnding()

String org.corpus_tools.pepper.modules.PepperExporter.getDocumentEnding ( )

Returns the format ending for files to be exported and related to SDocument objects.

Returns
file ending for SDocument objects to be exported.

Implemented in org.corpus_tools.pepper.impl.PepperExporterImpl.

◆ getExportMode()

EXPORT_MODE org.corpus_tools.pepper.modules.PepperExporter.getExportMode ( )

Returns how corpus-structure is exported.

Returns
the EXPORT_MODE determining how the corpus-structure is exported

Implemented in org.corpus_tools.pepper.impl.PepperExporterImpl.

◆ getIdentifier2ResourceTable()

Map<Identifier, URI> org.corpus_tools.pepper.modules.PepperExporter.getIdentifier2ResourceTable ( )

Returns the table of correspondences between Identifier objects and their resources.

Stores Identifier objects corresponding to either a SDocument or a SCorpus object, which have been created during the run of exportCorpusStructure(). Corresponding to the Identifier object this table stores the resource to which the element shall be exported.
For instance:

corpus_1 /home/me/corpora/myCorpus
corpus_2 /home/me/corpora/myCorpus/subcorpus
doc_1 /home/me/corpora/myCorpus/subcorpus/document1.xml
doc_2 /home/me/corpora/myCorpus/subcorpus/document2.xml
Returns
the table of correspondences between Identifier objects and their resources.

Implemented in org.corpus_tools.pepper.impl.PepperExporterImpl.

◆ getSupportedFormats()

List<FormatDesc> org.corpus_tools.pepper.modules.PepperExporter.getSupportedFormats ( )

TODO docu.

Returns

Implemented in org.corpus_tools.pepper.impl.PepperExporterImpl.

◆ setCorpusDesc()

void org.corpus_tools.pepper.modules.PepperExporter.setCorpusDesc ( CorpusDesc  corpusDesc)

TODO docu.

Implemented in org.corpus_tools.pepper.impl.PepperExporterImpl.

◆ setDocumentEnding()

void org.corpus_tools.pepper.modules.PepperExporter.setDocumentEnding ( String  sDocumentEnding)

Sets the format ending for files to be exported and related to SDocument objects.

Parameters
sDocumentEnding  file ending for SDocument objects to be exported.

Implemented in org.corpus_tools.pepper.impl.PepperExporterImpl.

◆ setExportMode()

void org.corpus_tools.pepper.modules.PepperExporter.setExportMode ( EXPORT_MODE  exportMode)

Determines how the corpus-structure should be exported.

Parameters
exportMode  the EXPORT_MODE determining how the corpus-structure should be exported

Implemented in org.corpus_tools.pepper.impl.PepperExporterImpl.