Pepper  3.6.0
A highly extensible platform for conversion and manipulation of linguistic data.
org.corpus_tools.pepper.impl.PepperImporterImpl Class Reference abstract

Inherits org.corpus_tools.pepper.impl.PepperModuleImpl, and org.corpus_tools.pepper.modules.PepperImporter.

Inherited by org.corpus_tools.pepper.modules.coreModules.DoNothingImporter, org.corpus_tools.pepper.modules.coreModules.SaltXMLImporter, and org.corpus_tools.pepper.modules.coreModules.TextImporter.

Public Member Functions

List< FormatDesc > getSupportedFormats ()
 {@inheritDoc PepperImporter::getSupportedFormats()}
 
FormatDesc addSupportedFormat (String formatName, String formatVersion, URI formatReference)
 {@inheritDoc PepperImporter::addSupportedFormat(String, String, URI)}
 
CorpusDesc getCorpusDesc ()
 {@inheritDoc PepperImporter::getCorpusDefinition()}
 
void setCorpusDesc (CorpusDesc newCorpusDefinition)
 {@inheritDoc PepperImporter::setCorpusDefinition(CorpusDefinition)}
 
synchronized Map< Identifier, URI > getIdentifier2ResourceTable ()
 {@inheritDoc PepperImporter::getIdentifier2ResourceTable()}
 
void importCorpusStructure (SCorpusGraph corpusGraph) throws PepperModuleException
 {@inheritDoc PepperImporter::importCorpusStructure(SCorpusGraph)}
 
void start () throws PepperModuleException
 Overrides the method PepperModuleImpl#start() to add the following, before PepperModuleImpl#start() is called. More...
 
synchronized Collection< String > getDocumentEndings ()
 {@inheritDoc PepperImporter::getDocumentEndings()}
 
synchronized Collection< String > getCorpusEndings ()
 {@inheritDoc PepperImporter::getCorpusEndings()}
 
SALT_TYPE setTypeOfResource (URI resource)
 {@inheritDoc PepperImporter::setTypeOfResource(URI)}
 
synchronized Collection< String > getIgnoreEndings ()
 Returns a collection of filenames, not to be imported. More...
 
Double isImportable (URI corpusPath)
 {@inheritDoc PepperImporter::isImportable(URI)}
 
void setCorpusPathResolver (CorpusPathResolver corpusPathResolver)
 Sets a CorpusPathResolver which is used by isImportable(URI). More...
 
- Public Member Functions inherited from org.corpus_tools.pepper.impl.PepperModuleImpl
PepperModuleDesc getFingerprint ()
 Returns a PepperModuleDesc object, which is a kind of fingerprint of this PepperModule. This fingerprint contains, for instance, the name and version of this module and information about the supplier.
Returns
fingerprint to this module

 
String getName ()
 Returns the name of this module. In most cases, the name somehow describes the task of the module.
Returns
the value of the 'Name' attribute.

 
String getVersion ()
 Returns the version of this module.
Returns
the value of the 'Version' attribute.

 
void setVersion (String newVersion)
 Sets the version of this module. The version is normally set internally; this method only exists for dependency injection by the modules project itself. It is never called by the Pepper framework.
Parameters
value - the new value of the 'Version' attribute.

 
MODULE_TYPE getModuleType ()
 Returns the type of this module.
Returns
type of module
More...
 
String getDesc ()
 Returns a short description of this module. Please provide some information for the user about the task this module performs.
Returns
a short description of the task of this module

 
void setDesc (String desc)
 Sets a short description of this module. Please provide some information for the user about the task this module performs.
Parameters
desc - a short description of the task of this module

 
URI getSupplierContact ()
 Returns a URI where more information about this module and contact information for the supplier can be found.
Returns
contact address, such as an e-mail address or homepage address

 
void setSupplierContact (URI supplierContact)
 Sets a URI where more information about this module and contact information for the supplier can be found.
Parameters
uri - contact address, such as an e-mail address or homepage address

 
URI getSupplierHomepage ()
 Returns the URI to the homepage describing the functionality of the module. More...
 
void setSupplierHomepage (URI hp)
 Sets the URI to the homepage describing the functionality of the module. More...
 
PepperModuleProperties getProperties ()
 Returns a PepperModuleProperties object containing properties to customize the behavior of this PepperModule.
Returns

 
void setProperties (PepperModuleProperties properties)
 Sets the PepperModuleProperties object containing properties to customize the behavior of this PepperModule. Please make sure that this method is called in the constructor of your module. If not, a general PepperModuleProperties object is created and initialized by the Pepper framework. This means that calling this method later overrides all properties for customizing the module.
Parameters
properties

 
SaltProject getSaltProject ()
 Returns the SaltProject object, which is filled, manipulated or exported by the current module.
Returns
the value of the 'Salt Project' attribute.

 
synchronized void setSaltProject (SaltProject newSaltProject)
 Sets the SaltProject object, which is filled, manipulated or exported by the current module. Note: This method should only be called by the Pepper framework.
Parameters
value - the new value of the 'Salt Project' attribute.

 
URI getResources ()
 Returns the path of the folder which might contain resources for a Pepper module. This is the folder which is delivered as part of the module's zip. Usually a Pepper module is a zip file containing a jar file and a folder having the same name as the jar file. In the default configuration, all files of the folder "./src/main/resources" are copied to the resource folder.
Returns
path to resources

 
void setResources (URI newResources)
 Sets the resource folder used by getResources(). This method should only be invoked by the Pepper framework. See the documentation of getResources() for more details.
Parameters
value - path to resource folder

 
URI getTemproraries ()
 TODO make docu.
 
void setTemproraries (URI newTemproraries)
 TODO make docu.
 
String getSymbolicName ()
 Returns the symbolic name of this OSGi bundle.
Returns
the value of the 'Symbolic Name' attribute.

 
void setSymbolicName (String newSymbolicName)
 Sets the symbolic name of this OSGi bundle. This value is set automatically inside the activate method, which is implemented in the PepperModuleImpl class. If you want to manipulate that method, make sure to set the symbolic name and make sure that it is set to the bundle's symbolic name.
Parameters
value - the new value of the 'Symbolic Name' attribute.

 
ComponentContext getComponentContext ()
 Returns the ComponentContext of the OSGi environment the bundle was started in. More...
 
Collection< String > getStartProblems ()
 If isReadyToStart() has returned false, this method returns a list of reasons why this module is not ready to start.
Returns
a list describing the reasons, or an empty list if there were no problems

 
boolean isReadyToStart () throws PepperModuleNotReadyException
 This method is called by the Pepper framework after initializing this object and directly before processing starts. Initializing means setting the PepperModuleProperties, temporary files, resources etc. It returns false or throws an exception if the PepperModule instance is not ready for any reason.
This method is also called, when Pepper is in self-test mode, to check if module is correctly instantiated.
The default implementation checks:

  • if a path to resource folder is given
  • if the MODULE_TYPE is not null
  • if the name is not null

When overriding this method, please call super.isReadyToStart() first and, in case a problem occurred, add it to the list getStartProblems().

Returns
false if the PepperModule instance is not ready for any reason, true otherwise.
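
The default checks and the advice to call super first can be sketched without any Pepper classes. Everything below is a stand-in written for illustration: `ModuleSketch`, `XmlImporterSketch`, the `String` fields and the extra `schemaLocation` setting are hypothetical and are not part of the Pepper API.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

/** Hypothetical stand-in for a module base class; not the Pepper API. */
class ModuleSketch {
    protected final String name;
    protected final String moduleType; // stand-in for MODULE_TYPE
    protected final String resources;  // stand-in for the resource folder URI
    private final List<String> startProblems = new ArrayList<>();

    ModuleSketch(String name, String moduleType, String resources) {
        this.name = name;
        this.moduleType = moduleType;
        this.resources = resources;
    }

    Collection<String> getStartProblems() {
        return startProblems;
    }

    /** Mirrors the three documented default checks. */
    boolean isReadyToStart() {
        boolean ready = true;
        if (resources == null) {
            startProblems.add("No path to a resource folder is given.");
            ready = false;
        }
        if (moduleType == null) {
            startProblems.add("The module type is null.");
            ready = false;
        }
        if (name == null) {
            startProblems.add("The name is null.");
            ready = false;
        }
        return ready;
    }
}

/** Hypothetical subclass that adds its own check after calling super. */
class XmlImporterSketch extends ModuleSketch {
    private final String schemaLocation; // assumption: an extra required setting

    XmlImporterSketch(String name, String moduleType, String resources, String schemaLocation) {
        super(name, moduleType, resources);
        this.schemaLocation = schemaLocation;
    }

    @Override
    boolean isReadyToStart() {
        boolean ready = super.isReadyToStart(); // run the default checks first
        if (schemaLocation == null) {
            getStartProblems().add("No schema location is given.");
            ready = false;
        }
        return ready;
    }
}
```

A caller would read `getStartProblems()` only after `isReadyToStart()` returned false, which matches the description of getStartProblems() above.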

 
ModuleController getModuleController ()
 Returns the container and controller object for the current module. The ModuleController object is a kind of communicator between a PepperModule and the Pepper framework.
Returns
the value of the 'Pepper Module Controller' container reference.

 
void setPepperModuleController (ModuleController newModuleController)
 Sets the container and controller object for the current module. The ModuleController object is a kind of communicator between a PepperModule and the Pepper framework. Also calls the inverse method ModuleController#setPepperModule_basic(PepperModule). Note: this method should only be called by the Pepper framework.
Parameters
value - the new value of the 'Pepper Module Controller' container reference.

 
void setPepperModuleController_basic (ModuleController newModuleController)
 Sets the container and controller object for the current module. The ModuleController object is a kind of communicator between a PepperModule and the Pepper framework. Note: this method should only be called by the Pepper framework.
Parameters
value - the new value of the 'Pepper Module Controller' container reference.

 
SCorpusGraph getCorpusGraph ()
 Returns the SCorpusGraph object which is filled, manipulated or exported by the current module. The SCorpusGraph object is contained in the salt project getSaltProject().
Returns
the value of the 'SCorpus Graph' attribute.

 
void setCorpusGraph (SCorpusGraph newSCorpusGraph)
 Sets the SCorpusGraph object which is filled, manipulated or exported by the current module. The SCorpusGraph object is contained in the salt project getSaltProject(). Note: This method should only be called by the Pepper framework.
Parameters
value - the new value of the 'SCorpus Graph' attribute.

 
void setIsMultithreaded (boolean isMultithreaded)
 Sets whether this PepperModule is able to run multithreaded. This method should only be called by the module itself.
Parameters
isThreaded - true, if module can run in multithread mode.

 
boolean isMultithreaded ()
 Returns whether this PepperModule is able to run multithreaded. The behavior should only be set by the module itself via calling setIsMultithreaded(boolean).
Returns
true, if module can run in multithread mode.

 
void done (Identifier id, DOCUMENT_STATUS result)
 This method is called by a PepperMapperController object to notify the PepperModule object, that the mapping for this object is done.
Parameters
identifier
result

 
void done (PepperMapperController controller)
 This method is called by a PepperMapperController object to notify the PepperModule object, that the mapping is done.
Parameters
controller - The object which is done with its job

 
void start (Identifier sElementId) throws PepperModuleException
 This method is called by method start(), if the method was not overridden by the current class. More...
 
PepperMapper createPepperMapper (Identifier sElementId)
 OVERRIDE THIS METHOD FOR CUSTOMIZED MAPPING. This method creates a customized PepperMapper object and returns it. You can do some additional initialization here. Things like setting the Identifier of the SDocument or SCorpus object and the URI resource are done by the framework (more precisely, in method start()). The parameter sElementId can be used to check whether the object to map is an SDocument or an SCorpus object, in case the mapper should be initialized differently for each.
Note: Override this method.
Parameters
sElementId - Identifier of the SCorpus or SDocument to be processed.
Returns
PepperMapper object to do the mapping task for object connected to given Identifier

 
void end () throws PepperModuleException
 Calls method start(Identifier) for every root SCorpus of SaltProject object.
 
void uncaughtException (Thread t, Throwable e)
 Method catches Uncaught exceptions thrown by PepperMapperImpl while running as Thread. More...
 
Double getProgress (String globalId)
 This method is invoked by the Pepper framework to get the current progress concerning the SDocument object corresponding to the given Identifier in percent. A valid return value must be between 0 and 1.
Note: In case you have overridden the method start(Identifier) or start(), please also override this method, because it accesses an internal list of all mappers, which is initialized in start(Identifier).
Parameters
globalID - identifier of the requested SDocument object; note that this is not the Identifier.

 
Double getProgress ()
 {@inheritDoc PepperModule::getProgress()}
 
List< Identifier > proposeImportOrder (SCorpusGraph sCorpusGraph)
 This method can be overridden to propose an import order for SDocument objects. Overriding this method is useful if the order matters in the specific mapping of this PepperModule. In this case, influencing the import order can decrease the processing time. If you do not want to influence the order, just return an empty list, or don't override this method.
In case you want to override this method, you can return a value for each passed SCorpusGraph.
OVERRIDE THIS METHOD FOR CUSTOMIZED MAPPING.
Parameters
sCorpusGraph - the SCorpusGraph object for which the order could be proposed
Returns
a list determining the import order of SDocument objects
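
As a rough illustration of proposing an order, the sketch below sorts document identifiers so that larger documents come first. This is only one conceivable policy and an assumption made here; the source does not say which order is beneficial. `ImportOrderSketch`, the `String` identifiers and the size map are hypothetical stand-ins for Identifier and SCorpusGraph.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch; plain strings stand in for Identifier objects. */
final class ImportOrderSketch {

    /**
     * Proposes an order for the given documents: largest first, ties broken
     * by identifier. An empty map yields an empty list, which corresponds to
     * "no proposal" in the description above.
     *
     * @param documentSizes assumed input: document identifier mapped to its size in bytes
     */
    static List<String> proposeImportOrder(Map<String, Long> documentSizes) {
        List<String> order = new ArrayList<>(documentSizes.keySet());
        order.sort((a, b) -> {
            int bySize = Long.compare(documentSizes.get(b), documentSizes.get(a)); // larger first
            return bySize != 0 ? bySize : a.compareTo(b);
        });
        return order;
    }
}
```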

 
String toString ()
 Returns a string representation of this object. More...
 
SelfTestDesc getSelfTestDesc ()
 This method is called by the Pepper framework to run an integration test for module. More...
 

Protected Member Functions

 PepperImporterImpl ()
 Creates a PepperModule of type MODULE_TYPE#IMPORTER. More...
 
 PepperImporterImpl (String name)
 Creates a PepperModule of type MODULE_TYPE#IMPORTER and sets its name to the passed one.
 
Boolean importCorpusStructureRec (URI currURI, SCorpus parent)
 Top-down traversal of the given file structure. More...
 
void readXMLResource (DefaultHandler2 contentHandler, URI documentLocation)
 Helper method to read an xml file with a DefaultHandler2 implementation given as contentHandler. More...
 
Collection< String > sampleFileContent (final URI corpusPath, final String... fileEndings)
 Returns {@value IsImportableUtil::NUMBER_OF_SAMPLED_LINES} lines of a sampled set of {@value IsImportableUtil::NUMBER_OF_SAMPLED_FILES} files having the ending specified by fileEndings recursively from specified corpus path. More...
 
- Protected Member Functions inherited from org.corpus_tools.pepper.impl.PepperModuleImpl
 PepperModuleImpl ()
 Creates a PepperModule object, which is either a MODULE_TYPE#IMPORTER, a MODULE_TYPE#MANIPULATOR or a MODULE_TYPE#EXPORTER. More...
 
 PepperModuleImpl (String name)
 Creates a PepperModule object, which is either a MODULE_TYPE#IMPORTER, a MODULE_TYPE#MANIPULATOR or a MODULE_TYPE#EXPORTER. More...
 
void setName (String name)
 Sets the name of this PepperModule. More...
 
void activate (ComponentContext componentContext)
 This method is called by OSGi framework and sets the component context, this class is running in. More...
 
Map< String, PepperMapperController > getMapperControllers ()
 Returns a threadsafe map of all PepperMapperController objects which are connected with a started PepperMapper corresponding to their. More...
 
ThreadGroup getMapperThreadGroup ()
 Returns a ThreadGroup where PepperMapper objects and the corresponding threads are supposed to run in. More...
 
void setMapperThreadGroup (ThreadGroup mapperThreadGroup)
 Sets a ThreadGroup where PepperMapper objects and the corresponding threads are supposed to run in. More...
 
Map< String, DocumentController > getDocumentId2DC ()
 Returns the map relating Identifier belonging to SDocument objects to their DocumentController container. More...
 

Protected Attributes

CorpusDesc corpusDesc
 TODO make docu.
 
- Protected Attributes inherited from org.corpus_tools.pepper.impl.PepperModuleImpl
Logger logger = LoggerFactory.getLogger("Pepper")
 
SaltProject saltProject = null
 Salt project which is processed by module.
 
URI resources = null
 TODO make docu.
 
URI temproraries = null
 TODO make docu.
 
String symbolicName = null
 TODO make docu.
 
ModuleController moduleController = null
 the controller object, which acts as bridge between Pepper framework and Pepper module.
 
SCorpusGraph sCorpusGraph = null
 The SCorpusGraph object which should be processed by this module.
 
boolean isMultithreaded = true
 

Additional Inherited Members

- Static Public Attributes inherited from org.corpus_tools.pepper.modules.PepperModule
static final String ENDING_FOLDER = "FOLDER"
 A string specifying a value for a folder as ending. More...
 
static final String ENDING_LEAF_FOLDER = "LEAF_FOLDER"
 A string specifying a value for a leaf folder as ending. More...
 
static final String ENDING_XML = "xml"
 Ending for an xml file. More...
 
static final String ENDING_TXT = "txt"
 Ending for a txt file. More...
 
static final String ENDING_TAB = "tab"
 Ending for a tab file. More...
 
static final String ENDING_ALL_FILES = "ALL_FILES"
 All kinds of file endings.
 
- Static Public Attributes inherited from org.corpus_tools.pepper.modules.PepperImporter
static final String NEGATIVE_FILE_EXTENSION_MARKER = "-"
 A character or character sequence to mark a file extension as not to be one of the imported ones.
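
One plausible reading of the marker is that an ending prefixed with `-` is excluded from the import while all other listed endings are accepted. The sketch below assumes exactly that interpretation; `EndingFilterSketch` and `isImported` are hypothetical names, and the constants are copied here only so the example is self-contained.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical filter illustrating a negative file extension marker. */
final class EndingFilterSketch {
    static final String NEGATIVE_FILE_EXTENSION_MARKER = "-";
    static final String ENDING_ALL_FILES = "ALL_FILES";

    private final Set<String> accepted = new HashSet<>();
    private final Set<String> ignored = new HashSet<>();

    EndingFilterSketch(Collection<String> endings) {
        for (String ending : endings) {
            if (ending.startsWith(NEGATIVE_FILE_EXTENSION_MARKER)) {
                // "-txt" means: files ending in "txt" are not imported
                ignored.add(ending.substring(NEGATIVE_FILE_EXTENSION_MARKER.length()));
            } else {
                accepted.add(ending);
            }
        }
    }

    /** Returns true if a file with this name would be imported under the assumed rules. */
    boolean isImported(String fileName) {
        int dot = fileName.lastIndexOf('.');
        String ending = dot < 0 ? "" : fileName.substring(dot + 1);
        if (ignored.contains(ending)) {
            return false; // an explicit exclusion wins over ALL_FILES
        }
        return accepted.contains(ENDING_ALL_FILES) || accepted.contains(ending);
    }
}
```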
 

Detailed Description

An importer in Pepper reads data from a format A and maps it to a Salt model. An importer must implement PepperImporter and can extend this class. We strongly recommend extending this class, since it contains a lot of helpful functions and methods controlling the workflow.

See also
PepperImporter
Author
Florian Zipser

Constructor & Destructor Documentation

◆ PepperImporterImpl()

org.corpus_tools.pepper.impl.PepperImporterImpl.PepperImporterImpl ( )
protected

Creates a PepperModule of type MODULE_TYPE#IMPORTER.

The name is set to "MyImporter".


We recommend using the constructor PepperImporterImpl#PepperImporterImpl(String) and passing a proper name.

Member Function Documentation

◆ getIgnoreEndings()

synchronized Collection<String> org.corpus_tools.pepper.impl.PepperImporterImpl.getIgnoreEndings ( )

Returns a collection of filenames, not to be imported.

{@inheritDoc #importIgnoreList} .

Returns

Implements org.corpus_tools.pepper.modules.PepperImporter.

◆ importCorpusStructureRec()

Boolean org.corpus_tools.pepper.impl.PepperImporterImpl.importCorpusStructureRec ( URI  currURI,
SCorpus  parent 
)
protected

Top-down traversal of the given file structure.

This method is called by importCorpusStructure(SCorpusGraph) and creates the corpus structure via a top-down traversal of the file structure. For each file found (real file or folder), the method setTypeOfResource(URI) is called to set the type of the resource. If the type is SALT_TYPE#SDOCUMENT, an SDocument object is created for the resource; if the type is SALT_TYPE#SCORPUS, an SCorpus object is created; if the type is null, the resource is ignored.

Parameters
currURI
parentsID
endings
Returns
true if the path contains documents, false otherwise
Exceptions
IOException
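
The traversal described above can be sketched with plain `java.nio` types. This is a minimal illustration and not the Pepper implementation: `CorpusTraversalSketch`, `ResourceType` (a stand-in for SALT_TYPE) and `typeOf` (a stand-in for setTypeOfResource(URI)) are hypothetical, and the two lists stand in for the SCorpus and SDocument objects that would be created.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch of a top-down traversal that classifies resources. */
final class CorpusTraversalSketch {
    enum ResourceType { CORPUS, DOCUMENT } // stand-in for SALT_TYPE

    final List<Path> corpora = new ArrayList<>();
    final List<Path> documents = new ArrayList<>();
    private final Set<String> documentEndings;

    CorpusTraversalSketch(Set<String> documentEndings) {
        this.documentEndings = documentEndings;
    }

    /** Assumed rule: folders are corpora, files with a known ending are documents, anything else is ignored (null). */
    ResourceType typeOf(Path resource) {
        if (Files.isDirectory(resource)) {
            return ResourceType.CORPUS;
        }
        String name = resource.getFileName().toString();
        int dot = name.lastIndexOf('.');
        String ending = dot < 0 ? "" : name.substring(dot + 1);
        return documentEndings.contains(ending) ? ResourceType.DOCUMENT : null;
    }

    /** Returns true if the path is, or contains, at least one document. */
    boolean traverse(Path current) {
        ResourceType type = typeOf(current);
        if (type == null) {
            return false; // ignored resource
        }
        if (type == ResourceType.DOCUMENT) {
            documents.add(current);
            return true;
        }
        corpora.add(current);
        boolean containsDocuments = false;
        try (DirectoryStream<Path> children = Files.newDirectoryStream(current)) {
            for (Path child : children) {
                containsDocuments |= traverse(child); // always descend, then accumulate
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return containsDocuments;
    }
}
```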

◆ readXMLResource()

void org.corpus_tools.pepper.impl.PepperImporterImpl.readXMLResource ( DefaultHandler2  contentHandler,
URI  documentLocation 
)
protected

Helper method to read an xml file with a DefaultHandler2 implementation given as contentHandler.

It is assumed, that the file encoding is set to UTF-8.

Parameters
contentHandler - DefaultHandler2 implementation
documentLocation - location of the XML file
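
A helper of this kind can be approximated with the JDK's SAX API. The sketch below is an assumption about how such a helper might look, not the actual Pepper code: `XmlReadSketch` and `readXml` are hypothetical names, and a `Path` is used instead of the URI type of the real signature. The file is opened explicitly as UTF-8, matching the stated encoding assumption.

```java
import java.io.IOException;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.ext.DefaultHandler2;

/** Hypothetical sketch: feed a UTF-8 XML file to a DefaultHandler2. */
final class XmlReadSketch {

    static void readXml(DefaultHandler2 contentHandler, Path documentLocation)
            throws IOException, SAXException, ParserConfigurationException {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        SAXParser parser = factory.newSAXParser();
        // Decode the file as UTF-8 ourselves instead of relying on auto-detection.
        try (Reader reader = Files.newBufferedReader(documentLocation, StandardCharsets.UTF_8)) {
            InputSource source = new InputSource(reader);
            // The system id lets the parser resolve relative references and report locations.
            source.setSystemId(documentLocation.toUri().toString());
            parser.parse(source, contentHandler);
        }
    }
}
```

A caller would subclass `DefaultHandler2`, override callbacks such as `startElement`, and pass the instance as `contentHandler`.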

◆ sampleFileContent()

Collection<String> org.corpus_tools.pepper.impl.PepperImporterImpl.sampleFileContent ( final URI  corpusPath,
final String...  fileEndings 
)
protected

Returns {@value IsImportableUtil::NUMBER_OF_SAMPLED_LINES} lines of a sampled set of {@value IsImportableUtil::NUMBER_OF_SAMPLED_FILES} files having the ending specified by fileEndings recursively from specified corpus path.

This method only delegates to IsImportableUtil#sampleFileContent(URI, int, int, String...). The class IsImportableUtil also contains further helper methods, in case this method is too imprecise.

Parameters
corpusPath - directory to be searched in
fileEndings - endings to be considered. If no endings are specified, all files are considered
Returns
numberOfLines lines of numberOfSampledFiles files
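
The idea of sampling the first lines of a few files can be sketched as follows. This is not IsImportableUtil: `SampleContentSketch` is a hypothetical name, the two limits are passed as parameters because their real values are not given here, files are taken in sorted order rather than sampled randomly, and returning one string per file is an assumption about the result shape.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical sketch: read the first lines of a limited number of matching files. */
final class SampleContentSketch {

    static Collection<String> sample(Path corpusPath, int numberOfSampledFiles,
                                     int numberOfLines, String... fileEndings) {
        List<String> endings = Arrays.asList(fileEndings);
        List<Path> files;
        try (Stream<Path> walk = Files.walk(corpusPath)) { // recursive search
            files = walk.filter(Files::isRegularFile)
                    .filter(p -> endings.isEmpty() || endings.contains(endingOf(p)))
                    .sorted() // deterministic choice instead of random sampling
                    .limit(numberOfSampledFiles)
                    .collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        List<String> sampled = new ArrayList<>();
        for (Path file : files) {
            sampled.add(firstLines(file, numberOfLines));
        }
        return sampled;
    }

    private static String endingOf(Path file) {
        String name = file.getFileName().toString();
        int dot = name.lastIndexOf('.');
        return dot < 0 ? "" : name.substring(dot + 1);
    }

    /** Reads at most numberOfLines lines, assuming UTF-8 content. */
    private static String firstLines(Path file, int numberOfLines) {
        StringBuilder content = new StringBuilder();
        try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            String line;
            for (int i = 0; i < numberOfLines && (line = reader.readLine()) != null; i++) {
                content.append(line).append('\n');
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return content.toString();
    }
}
```

An isImportable(URI) implementation could then inspect the returned strings, for example by matching them against a pattern typical of the supported format.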

◆ setCorpusPathResolver()

void org.corpus_tools.pepper.impl.PepperImporterImpl.setCorpusPathResolver ( CorpusPathResolver  corpusPathResolver)

Sets a CorpusPathResolver which is used by isImportable(URI).

With a CorpusPathResolver it is possible to share already-read lines of files between multiple importers. This saves the time needed to retrieve the content of the corpus path and to read the first x lines of the files.

Parameters
corpusPathResolver
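
The benefit of sharing read lines can be illustrated with a small cache. This sketch does not reproduce the CorpusPathResolver API, which is not shown here; `SharedSampleCacheSketch`, `sampleFor` and the reader function are hypothetical and only demonstrate that several importers asking for the same file ending trigger a single read.

```java
import java.util.Collection;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

/** Hypothetical cache shared between importers; not the CorpusPathResolver API. */
final class SharedSampleCacheSketch {
    private final Map<String, Collection<String>> cache = new ConcurrentHashMap<>();
    private final Function<String, Collection<String>> reader;
    private final AtomicInteger reads = new AtomicInteger();

    /** @param reader assumed expensive operation that samples lines for a file ending */
    SharedSampleCacheSketch(Function<String, Collection<String>> reader) {
        this.reader = reader;
    }

    /** Returns the sampled lines for an ending, reading them only on first request. */
    Collection<String> sampleFor(String ending) {
        return cache.computeIfAbsent(ending, e -> {
            reads.incrementAndGet();
            return reader.apply(e);
        });
    }

    /** Number of times the underlying reader was actually invoked. */
    int numberOfReads() {
        return reads.get();
    }
}
```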

◆ start()

void org.corpus_tools.pepper.impl.PepperImporterImpl.start ( ) throws PepperModuleException

Overrides the method PepperModuleImpl#start() to add the following, before PepperModuleImpl#start() is called.

  1. a check if corpus path exists

Reimplemented from org.corpus_tools.pepper.impl.PepperModuleImpl.
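
The pattern of checking a precondition before delegating to the parent's start() can be sketched as follows. The classes are hypothetical stand-ins, and `IllegalStateException` replaces PepperModuleException so that the example compiles without the Pepper library.

```java
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical stand-in for the module base class. */
class ModuleStartSketch {
    boolean started = false;

    void start() {
        started = true; // the real method would start the mapping here
    }
}

/** Hypothetical importer that verifies the corpus path before starting. */
class ImporterStartSketch extends ModuleStartSketch {
    private final Path corpusPath;

    ImporterStartSketch(Path corpusPath) {
        this.corpusPath = corpusPath;
    }

    @Override
    void start() {
        // 1. check that the corpus path exists
        if (corpusPath == null || !Files.exists(corpusPath)) {
            throw new IllegalStateException("Corpus path does not exist: " + corpusPath);
        }
        // then hand over to the inherited behavior
        super.start();
    }
}
```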