Pepper 3.6.0
A highly extensible platform for conversion and manipulation of linguistic data.
org.corpus_tools.pepper.modules.coreModules.SaltXMLImporter Class Reference

This is a PepperImporter which imports the SaltXML format into a salt model.

Inherits org.corpus_tools.pepper.impl.PepperImporterImpl and org.corpus_tools.pepper.modules.PepperImporter.

Public Member Functions

Double isImportable (URI corpusPath)
 Recursively reads the first file found and returns 1.0 if the file contains the markers <?xml, xmi:version="2.0" and salt.
 
SelfTestDesc getSelfTestDesc ()
 This method is called by the Pepper framework to run an integration test for the module.
 
void importCorpusStructure (SCorpusGraph corpusGraph) throws PepperModuleException
 Imports the corpus structure by calling SaltProject#loadSCorpusStructure(URI).
 
PepperMapper createPepperMapper (Identifier id)
 Creates a mapper of type EXMARaLDA2SaltMapper.
 
- Public Member Functions inherited from org.corpus_tools.pepper.impl.PepperImporterImpl
List< FormatDesc > getSupportedFormats ()
 Documentation inherited from PepperImporter::getSupportedFormats().
 
FormatDesc addSupportedFormat (String formatName, String formatVersion, URI formatReference)
 Documentation inherited from PepperImporter::addSupportedFormat(String, String, URI).
 
CorpusDesc getCorpusDesc ()
 Documentation inherited from PepperImporter::getCorpusDefinition().
 
void setCorpusDesc (CorpusDesc newCorpusDefinition)
 Documentation inherited from PepperImporter::setCorpusDefinition(CorpusDefinition).
 
synchronized Map< Identifier, URI > getIdentifier2ResourceTable ()
 Documentation inherited from PepperImporter::getIdentifier2ResourceTable().
 
void start () throws PepperModuleException
 Overrides PepperModuleImpl#start() to perform additional steps before PepperModuleImpl#start() is called.
 
synchronized Collection< String > getDocumentEndings ()
 Documentation inherited from PepperImporter::getDocumentEndings().
 
synchronized Collection< String > getCorpusEndings ()
 Documentation inherited from PepperImporter::getCorpusEndings().
 
SALT_TYPE setTypeOfResource (URI resource)
 Documentation inherited from PepperImporter::setTypeOfResource(URI).
 
synchronized Collection< String > getIgnoreEndings ()
 Returns a collection of filenames that are not to be imported.
 
void setCorpusPathResolver (CorpusPathResolver corpusPathResolver)
 Sets a CorpusPathResolver which is used by isImportable(URI).
 
- Public Member Functions inherited from org.corpus_tools.pepper.impl.PepperModuleImpl
PepperModuleDesc getFingerprint ()
 Returns a PepperModuleDesc object, which is a kind of fingerprint of this PepperModule. This fingerprint contains, for instance, information such as the name and version of this module or information about the supplier.
Returns
fingerprint of this module

 
String getName ()
 Returns the name of this module. In most cases, the name describes the task of the module.
Returns
the value of the 'Name' attribute.

 
String getVersion ()
 Returns the version of this module.
Returns
the value of the 'Version' attribute.

 
void setVersion (String newVersion)
 Sets the version of this module. The version is normally set internally; this method exists only for dependency injection by the module's project itself. It is never called by the Pepper framework.
Parameters
value: the new value of the 'Version' attribute.

 
MODULE_TYPE getModuleType ()
 Returns the type of this module.
Returns
type of module
 
String getDesc ()
 Returns a short description of this module. Please provide some information for the user about the task this module performs.
Returns
a short description of the task of this module

 
void setDesc (String desc)
 Sets a short description of this module. Please provide some information for the user about the task this module performs.
Parameters
desc: a short description of the task of this module

 
URI getSupplierContact ()
 Returns a URI pointing to more information about this module and to contact information for the supplier.
Returns
contact address, such as an e-mail address or a homepage address

 
void setSupplierContact (URI supplierContact)
 Sets a URI pointing to more information about this module and to contact information for the supplier.
Parameters
uri: contact address, such as an e-mail address or a homepage address

 
URI getSupplierHomepage ()
 Returns the URI of the homepage describing the functionality of the module.
 
void setSupplierHomepage (URI hp)
 Sets the URI of the homepage describing the functionality of the module.
 
PepperModuleProperties getProperties ()
 Returns a PepperModuleProperties object containing properties to customize the behavior of this PepperModule.
Returns
the PepperModuleProperties object of this module
 
void setProperties (PepperModuleProperties properties)
 Sets the PepperModuleProperties object containing properties to customize the behavior of this PepperModule. Please make sure that this method is called in the constructor of your module. Otherwise, the Pepper framework creates and initializes a general PepperModuleProperties object, and calling this method later overrides all properties for customizing the module.
Parameters
properties

 
SaltProject getSaltProject ()
 Returns the SaltProject object, which is filled, manipulated or exported by the current module.
Returns
the value of the 'Salt Project' attribute.

 
synchronized void setSaltProject (SaltProject newSaltProject)
 Sets the SaltProject object, which is filled, manipulated or exported by the current module. Note: This method should only be called by the Pepper framework.
Parameters
value: the new value of the 'Salt Project' attribute.

 
URI getResources ()
 Returns the path of the folder which might contain resources for a Pepper module. This is the folder delivered as part of the module's zip file. Usually a Pepper module is a zip file containing a jar file and a folder with the same name as the jar file. In the default configuration, all files of the folder "./src/main/resources" are copied to the resource folder.
Returns
path to resources

 
void setResources (URI newResources)
 Sets the resource folder used by getResources(). This method should only be invoked by the Pepper framework. See the documentation of getResources() for more details.
Parameters
value: path to resource folder

 
URI getTemproraries ()
 TODO make docu.
 
void setTemproraries (URI newTemproraries)
 TODO make docu.
 
String getSymbolicName ()
 Returns the symbolic name of this OSGi bundle.
Returns
the value of the 'Symbolic Name' attribute.

 
void setSymbolicName (String newSymbolicName)
 Sets the symbolic name of this OSGi bundle. This value is set automatically inside the activate method, which is implemented in the PepperModuleImpl class. If you override that method, make sure to set the symbolic name, and make sure that it is set to the bundle's symbolic name.
Parameters
value: the new value of the 'Symbolic Name' attribute.

 
ComponentContext getComponentContext ()
 Returns the ComponentContext of the OSGi environment the bundle was started in.
 
Collection< String > getStartProblems ()
 If isReadyToStart() has returned false, this method returns a list of reasons why this module is not ready to start.
Returns
a list describing the reasons, or an empty list if there were no problems

 
boolean isReadyToStart () throws PepperModuleNotReadyException
 This method is called by the Pepper framework after initializing this object and directly before processing starts. Initializing means setting the PepperModuleProperties, temporary files, resources, etc. The method returns false or throws an exception if the PepperModule instance is not ready for any reason.
This method is also called when Pepper is in self-test mode, to check whether the module is correctly instantiated.
The default implementation checks:

  • if a path to resource folder is given
  • if the MODULE_TYPE is not null
  • if the name is not null

When overriding this method, please call super.isReadyToStart() first and, if a problem occurred, add it to the list returned by getStartProblems().

Returns
false if the PepperModule instance is not ready for any reason, true otherwise.
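The override pattern described above can be sketched as follows. To keep the example compilable without Pepper on the classpath, it uses hypothetical stand-in classes (BaseModuleSketch, MyImporterSketch and the field requiredSetting) instead of PepperModuleImpl; the MODULE_TYPE check and PepperModuleNotReadyException are omitted, so this only illustrates the call order, not Pepper's actual implementation.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

// Hypothetical stand-in for PepperModuleImpl: only the start-problem
// bookkeeping is modeled, not the real Pepper class.
class BaseModuleSketch {
    private final List<String> startProblems = new ArrayList<>();
    protected String name;
    protected String resources;

    Collection<String> getStartProblems() {
        return startProblems;
    }

    // Mirrors two of the documented default checks (resource folder, name).
    boolean isReadyToStart() {
        boolean ready = true;
        if (resources == null) {
            startProblems.add("No path to resource folder is given.");
            ready = false;
        }
        if (name == null) {
            startProblems.add("The module name is null.");
            ready = false;
        }
        return ready;
    }
}

// Hypothetical module that adds one module-specific check.
class MyImporterSketch extends BaseModuleSketch {
    String requiredSetting; // hypothetical module-specific setting

    @Override
    boolean isReadyToStart() {
        // Run the inherited checks first, as recommended above.
        boolean ready = super.isReadyToStart();
        if (requiredSetting == null) {
            // Record the reason so getStartProblems() can report it.
            getStartProblems().add("requiredSetting is not set.");
            ready = false;
        }
        return ready;
    }
}
```

Because the subclass calls the inherited checks before adding its own, getStartProblems() lists every reason at once instead of only the first one found.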

 
ModuleController getModuleController ()
 Returns the container and controller object for the current module. The ModuleController object is a kind of communicator between a PepperModule and the Pepper framework.
Returns
the value of the 'Pepper Module Controller' container reference.

 
void setPepperModuleController (ModuleController newModuleController)
 Sets the container and controller object for the current module. The ModuleController object is a kind of communicator between a PepperModule and the Pepper framework. Also calls the inverse method ModuleController#setPepperModule_basic(PepperModule). Note: this method should only be called by the Pepper framework.
Parameters
value: the new value of the 'Pepper Module Controller' container reference.

 
void setPepperModuleController_basic (ModuleController newModuleController)
 Sets the container and controller object for the current module. The ModuleController object is a kind of communicator between a PepperModule and the Pepper framework. Note: this method should only be called by the Pepper framework.
Parameters
value: the new value of the 'Pepper Module Controller' container reference.

 
SCorpusGraph getCorpusGraph ()
 Returns the SCorpusGraph object which is filled, manipulated or exported by the current module. The SCorpusGraph object is contained in the Salt project returned by getSaltProject().
Returns
the value of the 'SCorpus Graph' attribute.

 
void setCorpusGraph (SCorpusGraph newSCorpusGraph)
 Sets the SCorpusGraph object which is filled, manipulated or exported by the current module. The SCorpusGraph object is contained in the Salt project returned by getSaltProject(). Note: This method should only be called by the Pepper framework.
Parameters
value: the new value of the 'SCorpus Graph' attribute.

 
void setIsMultithreaded (boolean isMultithreaded)
 Sets whether this PepperModule is able to run multithreaded. This method should only be called by the module itself.
Parameters
isThreaded: true if the module can run in multithreaded mode.

 
boolean isMultithreaded ()
 Returns whether this PepperModule is able to run multithreaded. This behavior should only be set by the module itself via setIsMultithreaded(boolean).
Returns
true if the module can run in multithreaded mode.

 
void done (Identifier id, DOCUMENT_STATUS result)
 This method is called by a PepperMapperController object to notify the PepperModule object that the mapping for this object is done.
Parameters
identifier
result

 
void done (PepperMapperController controller)
 This method is called by a PepperMapperController object to notify the PepperModule object that the mapping is done.
Parameters
controller: the object which is done with its job

 
void start (Identifier sElementId) throws PepperModuleException
 This method is called by start() if that method was not overridden by the current class.
 
void end () throws PepperModuleException
 Calls start(Identifier) for every root SCorpus of the SaltProject object.
 
void uncaughtException (Thread t, Throwable e)
 Catches uncaught exceptions thrown by a PepperMapperImpl while it runs as a thread.
 
Double getProgress (String globalId)
 This method is invoked by the Pepper framework to get the current progress of the SDocument object corresponding to the given Identifier. A valid return value must be between 0 and 1.
Note: If you have overridden the method start(Identifier) or start(), please also override this method, because it accesses an internal list of all mappers, which is initialized in start(Identifier).
Parameters
globalID: identifier of the requested SDocument object; note that this is not the Identifier.

 
Double getProgress ()
 Documentation inherited from PepperModule::getProgress().
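A minimal sketch of per-document progress bookkeeping for the two getProgress variants above. It assumes that the overall progress is the unweighted mean of the per-document values and that it is 0.0 while no document is known; ProgressSketch and report are hypothetical names, and Pepper's own getProgress() may aggregate differently.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical progress bookkeeping; not Pepper's implementation.
class ProgressSketch {
    // globalId of a document -> progress of its mapper in [0, 1]
    private final Map<String, Double> progressByDocument = new ConcurrentHashMap<>();

    // Called by (hypothetical) mappers; clamps so readers always see 0..1.
    void report(String globalId, double progress) {
        progressByDocument.put(globalId, Math.max(0.0, Math.min(1.0, progress)));
    }

    // Progress of one document, or null if no mapper is known for it.
    Double getProgress(String globalId) {
        return progressByDocument.get(globalId);
    }

    // Overall progress as the unweighted mean over all known documents.
    Double getProgress() {
        if (progressByDocument.isEmpty()) {
            return 0.0;
        }
        double sum = 0.0;
        for (double value : progressByDocument.values()) {
            sum += value;
        }
        return sum / progressByDocument.size();
    }
}
```

A concurrent map is used because mappers may run in their own threads while the framework polls the progress.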
 
List< Identifier > proposeImportOrder (SCorpusGraph sCorpusGraph)
 This method can be overridden to propose an import order for SDocument objects. Overriding this method is useful if the order matters for the specific mapping of this PepperModule. In this case, influencing the import order can decrease the processing time. If you do not want to influence the order, just return an empty list or do not override this method.
If you override this method, you can return a value for each passed SCorpusGraph.
OVERRIDE THIS METHOD FOR CUSTOMIZED MAPPING.
Parameters
sCorpusGraph: the SCorpusGraph object for which the order could be proposed
Returns
a list determining the import order of SDocument objects
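One possible ordering strategy for proposeImportOrder, sketched with plain Java types instead of SCorpusGraph and Identifier: documents are identified by hypothetical string IDs with known sizes, and larger documents are proposed first so that long-running mappers start early. The size heuristic, the tie-break and the class name ImportOrderSketch are assumptions for illustration only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical ordering strategy; not part of Pepper.
class ImportOrderSketch {
    // Proposes to import larger documents first; ties are broken by ID so
    // the proposal is deterministic. An empty map yields an empty list,
    // which means "no preference".
    static List<String> proposeImportOrder(Map<String, Long> documentSizes) {
        List<String> order = new ArrayList<>(documentSizes.keySet());
        order.sort((a, b) -> {
            int bySize = Long.compare(documentSizes.get(b), documentSizes.get(a)); // descending size
            return bySize != 0 ? bySize : a.compareTo(b); // ascending ID on ties
        });
        return order;
    }
}
```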

 
String toString ()
 Returns a string representation of this object.
 

Static Public Attributes

static final String MODULE_NAME = "SaltXMLImporter"
 
static final String FORMAT_NAME = "SaltXML"
 
static final String FORMAT_VERSION = "1.0"
 
- Static Public Attributes inherited from org.corpus_tools.pepper.modules.PepperModule
static final String ENDING_FOLDER = "FOLDER"
 A string specifying a value for a folder as ending.
 
static final String ENDING_LEAF_FOLDER = "LEAF_FOLDER"
 A string specifying a value for a leaf folder as ending.
 
static final String ENDING_XML = "xml"
 Ending for an xml file.
 
static final String ENDING_TXT = "txt"
 Ending for a txt file.
 
static final String ENDING_TAB = "tab"
 Ending for a tab file.
 
static final String ENDING_ALL_FILES = "ALL_FILES"
 All kinds of file endings.
 
- Static Public Attributes inherited from org.corpus_tools.pepper.modules.PepperImporter
static final String NEGATIVE_FILE_EXTENSION_MARKER = "-"
 A character or character sequence that marks a file extension as not to be imported.
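How the NEGATIVE_FILE_EXTENSION_MARKER might be interpreted when deciding whether a file is imported, as a minimal sketch: an ending prefixed with "-" excludes files with that ending, and otherwise a file is imported only if it matches one of the positive endings. The class EndingFilterSketch and this exact matching rule (case-sensitive, last extension only) are assumptions, not Pepper's implementation.

```java
import java.util.Collection;

// Hypothetical helper; Pepper's own ending handling may differ.
class EndingFilterSketch {
    static final String NEGATIVE_FILE_EXTENSION_MARKER = "-";

    // True if fileName has one of the positive endings and none of the
    // endings marked as excluded with the negative marker.
    static boolean isImported(String fileName, Collection<String> endings) {
        int dot = fileName.lastIndexOf('.');
        String ending = dot < 0 ? "" : fileName.substring(dot + 1);
        boolean positiveMatch = false;
        for (String configured : endings) {
            if (configured.startsWith(NEGATIVE_FILE_EXTENSION_MARKER)) {
                String excluded = configured.substring(NEGATIVE_FILE_EXTENSION_MARKER.length());
                if (excluded.equals(ending)) {
                    return false; // an explicit exclusion always wins
                }
            } else if (configured.equals(ending)) {
                positiveMatch = true;
            }
        }
        return positiveMatch;
    }
}
```

For example, with the configured endings "xml" and "-txt", doc.xml is imported while notes.txt and table.tab are not.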
 

Additional Inherited Members

- Protected Member Functions inherited from org.corpus_tools.pepper.impl.PepperImporterImpl
 PepperImporterImpl ()
 Creates a PepperModule of type MODULE_TYPE#IMPORTER.
 
 PepperImporterImpl (String name)
 Creates a PepperModule of type MODULE_TYPE#IMPORTER and sets its name to the passed one.
 
Boolean importCorpusStructureRec (URI currURI, SCorpus parent)
 Traverses the given file structure top-down.
 
void readXMLResource (DefaultHandler2 contentHandler, URI documentLocation)
 Helper method to read an xml file with a DefaultHandler2 implementation given as contentHandler.
 
Collection< String > sampleFileContent (final URI corpusPath, final String... fileEndings)
 Returns IsImportableUtil::NUMBER_OF_SAMPLED_LINES lines of a sampled set of IsImportableUtil::NUMBER_OF_SAMPLED_FILES files having the ending specified by fileEndings, searched recursively from the specified corpus path.
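The readXMLResource helper listed above reads an XML file with a DefaultHandler2. A minimal sketch of that idea with the JDK's SAX API follows; unlike readXMLResource it takes an org.xml.sax.InputSource instead of a URI, and the names XmlReadSketch and ElementCounter are hypothetical.

```java
import java.io.IOException;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.ext.DefaultHandler2;

class XmlReadSketch {
    // Parses the source with a namespace-aware SAX parser and reports all
    // events to the given handler.
    static void readXml(DefaultHandler2 handler, InputSource source)
            throws ParserConfigurationException, SAXException, IOException {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        SAXParser parser = factory.newSAXParser();
        // DefaultHandler2 also implements LexicalHandler; registering it
        // makes the parser report comments and CDATA boundaries as well.
        parser.setProperty("http://xml.org/sax/properties/lexical-handler", handler);
        parser.parse(source, handler);
    }
}

// Example handler that counts start tags.
class ElementCounter extends DefaultHandler2 {
    int elements;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        elements++;
    }
}
```

To read a file on disk, a source such as new InputSource(path.toUri().toString()) can be passed.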
 
- Protected Member Functions inherited from org.corpus_tools.pepper.impl.PepperModuleImpl
 PepperModuleImpl ()
 Creates a PepperModule object, which is either a MODULE_TYPE#IMPORTER, a MODULE_TYPE#MANIPULATOR or a MODULE_TYPE#EXPORTER.
 
 PepperModuleImpl (String name)
 Creates a PepperModule object, which is either a MODULE_TYPE#IMPORTER, a MODULE_TYPE#MANIPULATOR or a MODULE_TYPE#EXPORTER.
 
void setName (String name)
 Sets the name of this PepperModule.
 
void activate (ComponentContext componentContext)
 This method is called by the OSGi framework and sets the component context this class is running in.
 
Map< String, PepperMapperController > getMapperControllers ()
 Returns a thread-safe map of all PepperMapperController objects which are connected with a started PepperMapper corresponding to their.
 
ThreadGroup getMapperThreadGroup ()
 Returns a ThreadGroup where PepperMapper objects and the corresponding threads are supposed to run in.
 
void setMapperThreadGroup (ThreadGroup mapperThreadGroup)
 Sets a ThreadGroup where PepperMapper objects and the corresponding threads are supposed to run in.
 
Map< String, DocumentController > getDocumentId2DC ()
 Returns the map relating the Identifier objects belonging to SDocument objects to their DocumentController containers.
 
- Protected Attributes inherited from org.corpus_tools.pepper.impl.PepperImporterImpl
CorpusDesc corpusDesc
 TODO make docu.
 
- Protected Attributes inherited from org.corpus_tools.pepper.impl.PepperModuleImpl
Logger logger = LoggerFactory.getLogger("Pepper")
 
SaltProject saltProject = null
 Salt project which is processed by module.
 
URI resources = null
 TODO make docu.
 
URI temproraries = null
 TODO make docu.
 
String symbolicName = null
 TODO make docu.
 
ModuleController moduleController = null
 The controller object, which acts as a bridge between the Pepper framework and the Pepper module.
 
SCorpusGraph sCorpusGraph = null
 The SCorpusGraph object which should be processed by this module.
 
boolean isMultithreaded = true
 

Detailed Description

This is a PepperImporter which imports the SaltXML format into a salt model.

This module assumes that each document is stored in a separate file, which must contain the document structure. The corpus structure is stored in a single file called saltProject + SaltFactory::FILE_ENDING_SALT. The value of SaltFactory::FILE_ENDING_SALT can be obtained via the method getSaltFileEnding().
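Under the layout described above, the corpus-structure file and the document files can be told apart by name. The sketch below assumes that the ending (for example "salt") is joined to saltProject with a dot and is passed in as a parameter, because the actual value of SaltFactory::FILE_ENDING_SALT is not given here; SaltLayoutSketch is a hypothetical name and not part of Pepper or Salt.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper illustrating the assumed SaltXML folder layout.
class SaltLayoutSketch {
    // Location of the single corpus-structure file, e.g. saltProject.salt.
    static Path corpusStructureFile(Path corpusRoot, String ending) {
        return corpusRoot.resolve("saltProject." + ending);
    }

    // All document files below corpusRoot: regular files with the given
    // ending, excluding the corpus-structure file itself, in sorted order.
    static List<Path> documentFiles(Path corpusRoot, String ending) throws IOException {
        String projectFileName = "saltProject." + ending;
        try (Stream<Path> paths = Files.walk(corpusRoot)) {
            return paths.filter(Files::isRegularFile)
                    .filter(p -> p.getFileName().toString().endsWith("." + ending))
                    .filter(p -> !p.getFileName().toString().equals(projectFileName))
                    .sorted()
                    .collect(Collectors.toList());
        }
    }
}
```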

Author
Florian Zipser
Version
1.0

Member Function Documentation

◆ createPepperMapper()

PepperMapper org.corpus_tools.pepper.modules.coreModules.SaltXMLImporter.createPepperMapper ( Identifier  id)

◆ getSelfTestDesc()

SelfTestDesc org.corpus_tools.pepper.modules.coreModules.SaltXMLImporter.getSelfTestDesc ( )

This method is called by the Pepper framework to run an integration test for module.

When the method returns null, it means that no integration test is supported. Otherwise, the SelfTestDesc object needs to provide an input corpus path and an output corpus path.

When this module is:

The simplest way to create a test description is:

return new IntegrationTestDesc(inputPath, outputPath);

When this module is an importer or a manipulator, the method SelfTestDesc#compare(SaltProject, SaltProject) is called to compare the output Salt project with the expected Salt project. When the module is an exporter, the method SelfTestDesc#compare(URI, URI) is called to compare the created output folder with an expected one. By default, this method checks whether the file structure and each file are equal.

Returns
test description

Reimplemented from org.corpus_tools.pepper.impl.PepperModuleImpl.
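For exporters, the default comparison described above checks whether the file structure and each file are equal. A minimal sketch of such a check with java.nio (Java 12 or later, for Files.mismatch) follows; it compares only regular files byte for byte, ignores empty directories, and is not the actual SelfTestDesc#compare(URI, URI) implementation. FolderCompareSketch is a hypothetical name.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class FolderCompareSketch {
    // True if both folders contain the same relative file paths and every
    // pair of corresponding files has identical bytes.
    static boolean sameTree(Path expected, Path actual) {
        try {
            List<Path> expectedFiles = relativeFiles(expected);
            if (!expectedFiles.equals(relativeFiles(actual))) {
                return false; // file structure differs
            }
            for (Path rel : expectedFiles) {
                // Files.mismatch returns -1 when both files are identical.
                if (Files.mismatch(expected.resolve(rel), actual.resolve(rel)) != -1) {
                    return false; // file contents differ
                }
            }
            return true;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Sorted list of regular files below root, relative to root.
    private static List<Path> relativeFiles(Path root) throws IOException {
        try (Stream<Path> paths = Files.walk(root)) {
            return paths.filter(Files::isRegularFile)
                    .map(root::relativize)
                    .sorted()
                    .collect(Collectors.toList());
        }
    }
}
```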

◆ isImportable()

Double org.corpus_tools.pepper.modules.coreModules.SaltXMLImporter.isImportable ( URI  corpusPath)

Recursively reads the first file found and returns 1.0 if the file contains:

  • <?xml
  • xmi:version="2.0"
  • salt

Reimplemented from org.corpus_tools.pepper.impl.PepperImporterImpl.
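The importability check described above can be sketched as follows. Several details are assumptions: the 0.0 result for non-matching content, the null result for missing or unreadable files, and the sorted walk order used to pick the "first found file". SaltXmlDetectionSketch is a hypothetical name, and the real isImportable(URI) may read only part of the file.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Optional;
import java.util.stream.Stream;

class SaltXmlDetectionSketch {
    // 1.0 if the content shows all three documented markers; 0.0 otherwise
    // (the 0.0 fallback is an assumption).
    static double score(String content) {
        boolean looksLikeSaltXml = content.contains("<?xml")
                && content.contains("xmi:version=\"2.0\"")
                && content.contains("salt");
        return looksLikeSaltXml ? 1.0 : 0.0;
    }

    // Scores the first regular file found below corpusPath (in sorted walk
    // order); returns null if there is no file or it cannot be read.
    static Double isImportable(Path corpusPath) {
        try (Stream<Path> paths = Files.walk(corpusPath)) {
            Optional<Path> first = paths.filter(Files::isRegularFile).sorted().findFirst();
            if (first.isEmpty()) {
                return null;
            }
            String content = new String(Files.readAllBytes(first.get()), StandardCharsets.UTF_8);
            return score(content);
        } catch (IOException e) {
            return null;
        }
    }
}
```

Note that the plain substring test for "salt" is a weak signal on its own; it is only meaningful in combination with the other two markers.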