Pepper
3.3.3-SNAPSHOT
A highly extensible platform for conversion and manipulation of linguistic data.
|
Inherits org.corpus_tools.pepper.impl.PepperModuleImpl, and org.corpus_tools.pepper.modules.PepperImporter.
Inherited by org.corpus_tools.pepper.modules.coreModules.DoNothingImporter, org.corpus_tools.pepper.modules.coreModules.SaltXMLImporter, and org.corpus_tools.pepper.modules.coreModules.TextImporter.
Public Member Functions | |||||
List< FormatDesc > | getSupportedFormats () | ||||
{ PepperImporter::getSupportedFormats()} | |||||
FormatDesc | addSupportedFormat (String formatName, String formatVersion, URI formatReference) | ||||
{ PepperImporter::addSupportedFormat(String, String, URI)} | |||||
CorpusDesc | getCorpusDesc () | ||||
{ PepperImporter::getCorpusDefinition()} | |||||
void | setCorpusDesc (CorpusDesc newCorpusDefinition) | ||||
{ PepperImporter::setCorpusDefinition(CorpusDefinition)} | |||||
synchronized Map< Identifier, URI > | getIdentifier2ResourceTable () | ||||
{ PepperImporter::getIdentifier2ResourceTable()} | |||||
void | importCorpusStructure (SCorpusGraph corpusGraph) throws PepperModuleException | ||||
{ PepperImporter::importCorpusStructure(SCorpusGraph)} | |||||
void | start () throws PepperModuleException | ||||
Overrides the method PepperModuleImpl#start() to add the following, before PepperModuleImpl#start() is called. More... | |||||
synchronized Collection< String > | getDocumentEndings () | ||||
{ PepperImporter::getDocumentEndings()} | |||||
synchronized Collection< String > | getCorpusEndings () | ||||
{ PepperImporter::getCorpusEndings()} | |||||
SALT_TYPE | setTypeOfResource (URI resource) | ||||
{ PepperImporter::setTypeOfResource(URI)} | |||||
synchronized Collection< String > | getIgnoreEndings () | ||||
Returns a collection of filenames, not to be imported. More... | |||||
Double | isImportable (URI corpusPath) | ||||
{ PepperImporter::isImportable(URI)} | |||||
void | setCorpusPathResolver (CorpusPathResolver corpusPathResolver) | ||||
Sets a CorpusPathResolver which is used by isImportable(URI). More... | |||||
Public Member Functions inherited from org.corpus_tools.pepper.impl.PepperModuleImpl | |||||
PepperModuleDesc | getFingerprint () | ||||
Returns a PepperModuleDesc object, which is a kind of fingerprint of this PepperModule. This fingerprint contains, for instance, information such as the name and version of this module, or information about the supplier.
| |||||
String | getName () | ||||
Returns the name of this module. In most cases, the name describes the task of the module.
| |||||
String | getVersion () | ||||
Returns the version of this module.
| |||||
void | setVersion (String newVersion) | ||||
Sets the version of this module. The version is normally set internally; this method exists only for dependency injection by the module's project itself. It is never called by the Pepper framework.
| |||||
MODULE_TYPE | getModuleType () | ||||
Returns the type of this module.
| |||||
String | getDesc () | ||||
Returns a short description of this module. Please provide some information for the user about the task this module performs.
| |||||
void | setDesc (String desc) | ||||
Sets a short description of this module. Please provide some information for the user about the task this module performs.
| |||||
URI | getSupplierContact () | ||||
Returns a URI where more information about this module and contact information for the supplier can be found.
| |||||
void | setSupplierContact (URI supplierContact) | ||||
Sets a URI where more information about this module and contact information for the supplier can be found.
| |||||
URI | getSupplierHomepage () | ||||
Returns the URI to the homepage describing the functionality of the module. More... | |||||
void | setSupplierHomepage (URI hp) | ||||
Sets the URI to the homepage describing the functionality of the module. More... | |||||
PepperModuleProperties | getProperties () | ||||
Returns a PepperModuleProperties object containing properties to customize the behavior of this PepperModule.
| |||||
void | setProperties (PepperModuleProperties properties) | ||||
Sets the PepperModuleProperties object containing properties to customize the behavior of this PepperModule. Please make sure that this method is called in the constructor of your module. If not, a general PepperModuleProperties object is created and initialized by the Pepper framework. This means that when this method is called later, all properties for customizing the module will be overridden.
| |||||
SaltProject | getSaltProject () | ||||
Returns the SaltProject object, which is filled, manipulated or exported by the current module.
| |||||
synchronized void | setSaltProject (SaltProject newSaltProject) | ||||
Sets the SaltProject object, which is filled, manipulated or exported by the current module. Note: This method should only be called by the Pepper framework.
| |||||
URI | getResources () | ||||
Returns the path of the folder which might contain resources for a Pepper module. This is the folder that is delivered as part of the module's zip. Usually a Pepper module is a zip file containing a jar file and a folder having the same name as the jar file. In the default configuration, all files of the folder "./src/main/resources" are copied to the resource folder.
| |||||
void | setResources (URI newResources) | ||||
Sets the resource folder used by getResources(). This method should only be invoked by the Pepper framework. See the documentation of getResources() for more details.
| |||||
URI | getTemproraries () | ||||
TODO make docu. | |||||
void | setTemproraries (URI newTemproraries) | ||||
TODO make docu. | |||||
String | getSymbolicName () | ||||
Returns the symbolic name of this OSGi bundle.
| |||||
void | setSymbolicName (String newSymbolicName) | ||||
Sets the symbolic name of this OSGi bundle. This value is set automatically inside the activate method, which is implemented in the PepperModuleImpl class. If you want to manipulate that method, make sure to set the symbolic name, and make sure that it is set to the bundle's symbolic name.
| |||||
ComponentContext | getComponentContext () | ||||
Returns the ComponentContext of the OSGi environment the bundle was started in. More... | |||||
Collection< String > | getStartProblems () | ||||
If isReadyToStart() has returned false, this method returns a list of reasons why this module is not ready to start.
| |||||
boolean | isReadyToStart () throws PepperModuleNotReadyException | ||||
This method is called by the Pepper framework after initializing this object and directly before processing starts. Initializing means setting the PepperModuleProperties, temporary files, resources, etc. The method returns false or throws an exception if the PepperModule instance is not ready for any reason. It is also called when Pepper is in self-test mode, to check whether the module is correctly instantiated. The default implementation checks:
When overriding this method, please call super.isReadyToStart() first and, in case a problem occurred, add it to the list returned by getStartProblems().
| |||||
ModuleController | getModuleController () | ||||
Returns the container and controller object for the current module. The ModuleController object is a kind of communicator between a PepperModule and the Pepper framework.
| |||||
void | setPepperModuleController (ModuleController newModuleController) | ||||
Sets the container and controller object for the current module. The ModuleController object is a kind of communicator between a PepperModule and the Pepper framework. Also calls the inverse method ModuleController#setPepperModule_basic(PepperModule). Note: This method should only be called by the Pepper framework.
| |||||
void | setPepperModuleController_basic (ModuleController newModuleController) | ||||
Sets the container and controller object for the current module. The ModuleController object is a kind of communicator between a PepperModule and the Pepper framework. Note: This method should only be called by the Pepper framework.
| |||||
SCorpusGraph | getCorpusGraph () | ||||
Returns the SCorpusGraph object which is filled, manipulated or exported by the current module. The SCorpusGraph object is contained in the Salt project returned by getSaltProject().
| |||||
void | setCorpusGraph (SCorpusGraph newSCorpusGraph) | ||||
Sets the SCorpusGraph object which is filled, manipulated or exported by the current module. The SCorpusGraph object is contained in the Salt project returned by getSaltProject(). Note: This method should only be called by the Pepper framework.
| |||||
void | setIsMultithreaded (boolean isMultithreaded) | ||||
Sets whether this PepperModule is able to run multithreaded. This method should only be called by the module itself.
| |||||
boolean | isMultithreaded () | ||||
Returns whether this PepperModule is able to run multithreaded. The behavior should only be set by the module itself by calling setIsMultithreaded(boolean).
| |||||
void | start () throws PepperModuleException | ||||
Starts the conversion process. This method is the main method of a Pepper module. If this method is not overridden, it will call start(Identifier) for each SDocument and SCorpus object contained in the set SCorpusGraph. This is done in a multithreaded way by default. Note: If your module should not run in multithreaded mode, call setIsMultithreaded(boolean). | |||||
void | done (Identifier id, DOCUMENT_STATUS result) | ||||
This method is called by a PepperMapperController object to notify the PepperModule object, that the mapping for this object is done.
| |||||
void | done (PepperMapperController controller) | ||||
This method is called by a PepperMapperController object to notify the PepperModule object, that the mapping is done.
| |||||
void | start (Identifier sElementId) throws PepperModuleException | ||||
This method is called by method start(), if the method was not overridden by the current class. More... | |||||
PepperMapper | createPepperMapper (Identifier sElementId) | ||||
OVERRIDE THIS METHOD FOR CUSTOMIZED MAPPING. This method creates a customized PepperMapper object and returns it. Here you can do some additional initialization. Things like setting the Identifier of the SDocument or SCorpus object and the URI resource are done by the framework (more precisely, in method start()). The parameter sElementId can be used if the mapper should be initialized differently depending on whether the object to map is an SDocument or an SCorpus object. Note: Override this method.
| |||||
void | end () throws PepperModuleException | ||||
Calls method start(Identifier) for every root SCorpus of SaltProject object. | |||||
void | uncaughtException (Thread t, Throwable e) | ||||
Method catches Uncaught exceptions thrown by PepperMapperImpl while running as Thread. More... | |||||
Double | getProgress (String globalId) | ||||
This method is invoked by the Pepper framework to get the current progress, in percent, for the SDocument object corresponding to the given Identifier. A valid return value must be between 0 and 1. Note: If you have overridden the method start(Identifier) or start(), please also override this method, because it accesses an internal list of all mappers, which is initialized in start(Identifier).
| |||||
Double | getProgress () | ||||
{ PepperModule::getProgress()} | |||||
List< Identifier > | proposeImportOrder (SCorpusGraph sCorpusGraph) | ||||
This method can be overridden to propose an import order for SDocument objects. Overriding this method is useful if the order matters in the specific mapping of this PepperModule. In this case, influencing the import order can decrease the processing time. If you do not want to influence the order, just return an empty list, or don't override this method. If you override this method, you can return a value for each passed SCorpusGraph. OVERRIDE THIS METHOD FOR CUSTOMIZED MAPPING.
| |||||
String | toString () | ||||
Returns a string representation of this object. More... | |||||
SelfTestDesc | getSelfTestDesc () | ||||
This method is called by the Pepper framework to run an integration test for module. More... | |||||
Protected Member Functions | |
PepperImporterImpl () | |
Creates a PepperModule of type MODULE_TYPE#IMPORTER. More... | |
PepperImporterImpl (String name) | |
Creates a PepperModule of type MODULE_TYPE#IMPORTER and sets its name to the passed one. | |
Boolean | importCorpusStructureRec (URI currURI, SCorpus parent) |
Top down traversal in file given structure. More... | |
void | readXMLResource (DefaultHandler2 contentHandler, URI documentLocation) |
Helper method to read an xml file with a DefaultHandler2 implementation given as contentHandler. More... | |
Collection< String > | sampleFileContent (final URI corpusPath, final String... fileEndings) |
Returns { IsImportableUtil::NUMBER_OF_SAMPLED_LINES} lines of a sampled set of { IsImportableUtil::NUMBER_OF_SAMPLED_FILES} files having the ending specified by fileEndings recursively from specified corpus path. More... | |
Protected Member Functions inherited from org.corpus_tools.pepper.impl.PepperModuleImpl | |
PepperModuleImpl () | |
Creates a PepperModule object, which is either a MODULE_TYPE#IMPORTER, a MODULE_TYPE#MANIPULATOR or a MODULE_TYPE#EXPORTER. More... | |
PepperModuleImpl (String name) | |
Creates a PepperModule object, which is either a MODULE_TYPE#IMPORTER, a MODULE_TYPE#MANIPULATOR or a MODULE_TYPE#EXPORTER. More... | |
void | setName (String name) |
Sets the name of this PepperModule. More... | |
void | activate (ComponentContext componentContext) |
This method is called by OSGi framework and sets the component context, this class is running in. More... | |
Map< String, PepperMapperController > | getMapperControllers () |
Returns a threadsafe map of all PepperMapperController objects which are connected with a started PepperMapper corresponding to their. More... | |
ThreadGroup | getMapperThreadGroup () |
Returns a ThreadGroup where PepperMapper objects and the corresponding threads are supposed to run in. More... | |
void | setMapperThreadGroup (ThreadGroup mapperThreadGroup) |
Sets a ThreadGroup where PepperMapper objects and the corresponding threads are supposed to run in. More... | |
Map< String, DocumentController > | getDocumentId2DC () |
Returns the map relating Identifier belonging to SDocument objects to their DocumentController container. More... | |
Protected Attributes | |
CorpusDesc | corpusDesc |
TODO make docu. | |
Protected Attributes inherited from org.corpus_tools.pepper.impl.PepperModuleImpl | |
Logger | logger = LoggerFactory.getLogger("Pepper") |
SaltProject | saltProject = null |
Salt project which is processed by module. | |
URI | resources = null |
TODO make docu. | |
URI | temproraries = null |
TODO make docu. | |
String | symbolicName = null |
TODO make docu. | |
ModuleController | moduleController = null |
the controller object, which acts as bridge between Pepper framework and Pepper module. | |
SCorpusGraph | sCorpusGraph = null |
The SCorpusGraph object which should be processed by this module. | |
boolean | isMultithreaded = true |
Additional Inherited Members | |
Static Public Attributes inherited from org.corpus_tools.pepper.modules.PepperModule | |
static final String | ENDING_FOLDER = "FOLDER" |
A string specifying a value for a folder as ending. More... | |
static final String | ENDING_LEAF_FOLDER = "LEAF_FOLDER" |
A string specifying a value for a leaf folder as ending. More... | |
static final String | ENDING_XML = "xml" |
Ending for an xml file. More... | |
static final String | ENDING_TXT = "txt" |
Ending for a txt file. More... | |
static final String | ENDING_TAB = "tab" |
Ending for a tab file. More... | |
static final String | ENDING_ALL_FILES = "ALL_FILES" |
All kinds of file endings. | |
Static Public Attributes inherited from org.corpus_tools.pepper.modules.PepperImporter | |
static final String | NEGATIVE_FILE_EXTENSION_MARKER = "-" |
A character or character sequence to mark a file extension as not to be one of the imported ones. | |
An importer in Pepper reads data from a format A and maps its data to a Salt model. An importer must implement PepperImporter and can extend this class. We strongly recommend extending this class, since it contains many helpful functions and methods controlling the workflow.
|
protected |
Creates a PepperModule of type MODULE_TYPE#IMPORTER.
The name is set to "MyImporter".
We recommend using the constructor PepperImporterImpl#PepperImporterImpl(String) and passing a proper name.
synchronized Collection<String> org.corpus_tools.pepper.impl.PepperImporterImpl.getIgnoreEndings | ( | ) |
Returns a collection of filenames, not to be imported.
{ #importIgnoreList} .
Implements org.corpus_tools.pepper.modules.PepperImporter.
|
protected |
Top down traversal in file given structure.
This method is called by importCorpusStructure(SCorpusGraph) and creates the corpus structure via a top-down traversal of the file structure. For each resource found (file or folder), the method setTypeOfResource(URI) is called to set the type of the resource. If the type is SALT_TYPE#SDOCUMENT, an SDocument object is created for the resource; if the type is SALT_TYPE#SCORPUS, an SCorpus object is created; if the type is null, the resource is ignored.
currURI | |
parentsID | |
endings |
IOException |
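The traversal described above can be sketched without any Pepper dependency. The following is only an illustration under stated assumptions: `CorpusStructureSketch`, `ResourceType`, `typeOf` and `traverse` are hypothetical stand-ins (for SALT_TYPE, setTypeOfResource(URI) and importCorpusStructureRec respectively), folders are assumed to map to corpora, and files with a listed ending are assumed to map to documents. The real method creates SCorpus and SDocument objects instead of strings.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;

/** Stand-alone sketch; none of these names belong to the Pepper API. */
final class CorpusStructureSketch {
    /** Hypothetical stand-in for SALT_TYPE. */
    enum ResourceType { CORPUS, DOCUMENT }

    /** Hypothetical stand-in for setTypeOfResource(URI); null means "ignore". */
    static ResourceType typeOf(Path resource, Set<String> documentEndings) {
        if (Files.isDirectory(resource)) {
            return ResourceType.CORPUS; // assumption: every folder is a corpus
        }
        String name = resource.getFileName().toString();
        int dot = name.lastIndexOf('.');
        String ending = dot < 0 ? "" : name.substring(dot + 1);
        return documentEndings.contains(ending) ? ResourceType.DOCUMENT : null;
    }

    /** Top-down traversal: adds one entry per corpus and document found. */
    static void traverse(Path current, String parentId, Set<String> documentEndings,
                         List<String> structure) throws IOException {
        ResourceType type = typeOf(current, documentEndings);
        if (type == null) {
            return; // resources without a type are ignored
        }
        String id = parentId + "/" + current.getFileName();
        structure.add(type + " " + id);
        if (type == ResourceType.CORPUS) {
            List<Path> children = new ArrayList<>();
            try (DirectoryStream<Path> stream = Files.newDirectoryStream(current)) {
                stream.forEach(children::add);
            }
            Collections.sort(children); // deterministic order for the sketch
            for (Path child : children) {
                traverse(child, id, documentEndings, structure);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createDirectory(Files.createTempDirectory("sketch").resolve("corpus"));
        Files.createFile(root.resolve("doc1.txt"));
        Files.createFile(root.resolve("notes.log")); // ignored: ending not listed
        List<String> structure = new ArrayList<>();
        traverse(root, "", Set.of("txt"), structure);
        structure.forEach(System.out::println);
    }
}
```

Returning null for unknown resources mirrors the behavior stated above, where a resource whose type is null is skipped.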
|
protected |
Helper method to read an xml file with a DefaultHandler2 implementation given as contentHandler.
It is assumed, that the file encoding is set to UTF-8.
contentHandler | DefaultHandler2 implementation |
documentLocation | location of the xml-file |
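For orientation, reading an XML file with a DefaultHandler2 and a fixed UTF-8 encoding can be done with the JDK's SAX classes alone. This is a minimal sketch, not the actual body of readXMLResource: `XmlReadSketch`, `readXml` and `ElementCollector` are hypothetical names, and a `Path` is used in place of the URI parameter.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.ext.DefaultHandler2;

/** Stand-alone sketch; the class and method names are illustrative only. */
final class XmlReadSketch {
    /** Parses the file as UTF-8 and passes all SAX events to contentHandler. */
    static void readXml(DefaultHandler2 contentHandler, Path documentLocation)
            throws IOException, SAXException, ParserConfigurationException {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true); // needed so that localName is filled in
        SAXParser parser = factory.newSAXParser();
        try (InputStream in = Files.newInputStream(documentLocation);
             Reader reader = new InputStreamReader(in, StandardCharsets.UTF_8)) {
            InputSource source = new InputSource(reader);
            source.setEncoding("UTF-8");
            parser.parse(source, contentHandler);
        }
    }

    /** Example handler that collects local element names in document order. */
    static final class ElementCollector extends DefaultHandler2 {
        final List<String> names = new ArrayList<>();

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            names.add(localName);
        }
    }

    public static void main(String[] args) throws Exception {
        Path file = Files.createTempFile("sketch", ".xml");
        Files.write(file, "<text><tok>a</tok><tok>b</tok></text>".getBytes(StandardCharsets.UTF_8));
        ElementCollector handler = new ElementCollector();
        readXml(handler, file);
        System.out.println(handler.names);
    }
}
```

In an importer, a mapper would typically supply its own DefaultHandler2 subclass that builds Salt objects from the SAX events instead of collecting names.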
|
protected |
Returns IsImportableUtil::NUMBER_OF_SAMPLED_LINES lines of a sampled set of IsImportableUtil::NUMBER_OF_SAMPLED_FILES files having the ending specified by fileEndings, recursively from the specified corpus path.
This method only delegates to IsImportableUtil#sampleFileContent(URI, int, int, String...). The class IsImportableUtil also contains further helper methods, in case this method is too imprecise.
corpusPath | directory to be searched in |
fileEndings | endings to be considered. If no endings specified, all files are considered |
numberOfLines lines of numberOfSampledFiles files
void org.corpus_tools.pepper.impl.PepperImporterImpl.setCorpusPathResolver | ( | CorpusPathResolver | corpusPathResolver | ) |
Sets a CorpusPathResolver which is used by isImportable(URI).
With a CorpusPathResolver, lines that have already been read from files can be shared between multiple importers. This saves the time needed to retrieve the content of the corpus path and to read the first x lines of the files.
corpusPathResolver |
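The idea of sharing read lines between importers can be illustrated with a small cache. This is a sketch under assumptions, since the actual interface of CorpusPathResolver is not shown here: `SharedLineSampler`, `firstLines` and `diskReads` are hypothetical names, and the cache is keyed by file path.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical cache of the first lines of files, shared by several importers. */
final class SharedLineSampler {
    private final int maxLines;
    private final Map<Path, List<String>> cache = new ConcurrentHashMap<>();
    private final AtomicInteger diskReads = new AtomicInteger();

    SharedLineSampler(int maxLines) {
        this.maxLines = maxLines;
    }

    /** Returns at most maxLines lines of the file; the file is read only once. */
    List<String> firstLines(Path file) {
        return cache.computeIfAbsent(file, f -> {
            diskReads.incrementAndGet();
            List<String> lines = new ArrayList<>();
            try (BufferedReader reader = Files.newBufferedReader(f, StandardCharsets.UTF_8)) {
                String line;
                while (lines.size() < maxLines && (line = reader.readLine()) != null) {
                    lines.add(line);
                }
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
            return Collections.unmodifiableList(lines);
        });
    }

    /** Number of times a file was actually opened, for demonstration only. */
    int diskReads() {
        return diskReads.get();
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("sketch", ".txt");
        Files.write(file, List.of("line 1", "line 2", "line 3"), StandardCharsets.UTF_8);
        SharedLineSampler sampler = new SharedLineSampler(2);
        System.out.println(sampler.firstLines(file)); // first importer triggers the read
        System.out.println(sampler.firstLines(file)); // second importer hits the cache
        System.out.println(sampler.diskReads());
    }
}
```

If every importer's isImportable(URI) check were handed the same sampler instance, each sampled file would be opened once regardless of how many importers inspect it.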
void org.corpus_tools.pepper.impl.PepperImporterImpl.start | ( | ) | throws PepperModuleException |
Overrides the method PepperModuleImpl#start() to add the following, before PepperModuleImpl#start() is called.
Implements org.corpus_tools.pepper.modules.PepperModule.