Pepper 3.7.0
A highly extensible platform for conversion and
org.corpus_tools.pepper.impl.PepperImporterImpl Class Reference abstract

Inherits org.corpus_tools.pepper.impl.PepperModuleImpl, and PepperImporter.

Public Member Functions

List< FormatDesc > getSupportedFormats ()
 {@inheritDoc PepperImporter::getSupportedFormats()}
 
FormatDesc addSupportedFormat (String formatName, String formatVersion, URI formatReference)
 {@inheritDoc PepperImporter::addSupportedFormat(String, String, URI)}
 
CorpusDesc getCorpusDesc ()
 {@inheritDoc PepperImporter::getCorpusDefinition()}
 
void setCorpusDesc (CorpusDesc newCorpusDefinition)
 {@inheritDoc PepperImporter::setCorpusDefinition(CorpusDefinition)}
 
synchronized Map< Identifier, URI > getIdentifier2ResourceTable ()
 {@inheritDoc PepperImporter::getIdentifier2ResourceTable()}
 
void importCorpusStructure (SCorpusGraph corpusGraph) throws PepperModuleException
 {@inheritDoc PepperImporter::importCorpusStructure(SCorpusGraph)}
 
void start () throws PepperModuleException
 Overrides the method PepperModuleImpl#start() to check that the corpus path exists before PepperModuleImpl#start() is called.
 
synchronized Collection< String > getDocumentEndings ()
 {@inheritDoc PepperImporter::getDocumentEndings()}
 
synchronized Collection< String > getCorpusEndings ()
 {@inheritDoc PepperImporter::getCorpusEndings()}
 
SALT_TYPE setTypeOfResource (URI resource)
 {@inheritDoc PepperImporter::setTypeOfResource(URI)}
 
synchronized Collection< String > getIgnoreEndings ()
 Returns a collection of filenames, not to be imported.
 
Double isImportable (URI corpusPath)
 {@inheritDoc PepperImporter::isImportable(URI)}
 
void setCorpusPathResolver (CorpusPathResolver corpusPathResolver)
 Sets a CorpusPathResolver which is used by isImportable(URI).
 
- Public Member Functions inherited from org.corpus_tools.pepper.impl.PepperModuleImpl
PepperModuleDesc getFingerprint ()
 
String getName ()
 
String getVersion ()
 
void setVersion (String newVersion)
 
MODULE_TYPE getModuleType ()
 
String getDesc ()
 
void setDesc (String desc)
 
URI getSupplierContact ()
 
void setSupplierContact (URI supplierContact)
 
URI getSupplierHomepage ()
 
void setSupplierHomepage (URI hp)
 
PepperModuleProperties getProperties ()
 
void setProperties (PepperModuleProperties properties)
 
SaltProject getSaltProject ()
 
synchronized void setSaltProject (SaltProject newSaltProject)
 
URI getResources ()
 
void setResources (URI newResources)
 
URI getTemproraries ()
 
void setTemproraries (URI newTemproraries)
 
String getSymbolicName ()
 
void setSymbolicName (String newSymbolicName)
 
ComponentContext getComponentContext ()
 Returns the ComponentContext of the OSGi environment the bundle was started in.
 
Collection< String > getStartProblems ()
 
boolean isReadyToStart () throws PepperModuleNotReadyException
 
ModuleController getModuleController ()
 
void setPepperModuleController (ModuleController newModuleController)
 
void setPepperModuleController_basic (ModuleController newModuleController)
 
SCorpusGraph getCorpusGraph ()
 
void setCorpusGraph (SCorpusGraph newSCorpusGraph)
 
void setIsMultithreaded (boolean isMultithreaded)
 
boolean isMultithreaded ()
 
void done (Identifier id, DOCUMENT_STATUS result)
 
void done (PepperMapperController controller)
 
void start (Identifier sElementId) throws PepperModuleException
 This method is called by method start(), if the method was not overridden by the current class.
 
PepperMapper createPepperMapper (Identifier sElementId)
 
void end () throws PepperModuleException
 Calls method start(Identifier) for every root SCorpus of SaltProject object.
 
void uncaughtException (Thread t, Throwable e)
 Catches uncaught exceptions thrown by PepperMapperImpl while running as a thread.
 
Double getProgress (String globalId)
 
Double getProgress ()
 {@inheritDoc PepperModule::getProgress()}
 
List< Identifier > proposeImportOrder (SCorpusGraph sCorpusGraph)
 
String toString ()
 Returns a string representation of this object.
 
SelfTestDesc getSelfTestDesc ()
 

Protected Member Functions

 PepperImporterImpl ()
 Creates a PepperModule of type MODULE_TYPE#IMPORTER.
 
 PepperImporterImpl (String name)
 Creates a PepperModule of type MODULE_TYPE#IMPORTER and sets its name to the passed one.
 
Boolean importCorpusStructureRec (URI currURI, SCorpus parent)
 Top-down traversal of the given file structure.
 
void readXMLResource (DefaultHandler2 contentHandler, URI documentLocation)
 Helper method to read an XML file with a DefaultHandler2 implementation given as contentHandler.
 
Collection< String > sampleFileContent (final URI corpusPath, final String... fileEndings)
 Returns {@value IsImportableUtil::NUMBER_OF_SAMPLED_LINES} lines of a sampled set of {@value IsImportableUtil::NUMBER_OF_SAMPLED_FILES} files having the ending specified by fileEndings recursively from specified corpus path.
 
- Protected Member Functions inherited from org.corpus_tools.pepper.impl.PepperModuleImpl
 PepperModuleImpl ()
 Creates a PepperModule object, which is either a MODULE_TYPE#IMPORTER, a MODULE_TYPE#MANIPULATOR or a MODULE_TYPE#EXPORTER.
 
 PepperModuleImpl (String name)
 Creates a PepperModule object, which is either a MODULE_TYPE#IMPORTER, a MODULE_TYPE#MANIPULATOR or a MODULE_TYPE#EXPORTER.
 
void setName (String name)
 Sets the name of this PepperModule.
 
void activate (ComponentContext componentContext)
 This method is called by the OSGi framework and sets the component context this class is running in.
 
Map< String, PepperMapperController > getMapperControllers ()
 Returns a threadsafe map of all PepperMapperController objects which are connected with a started PepperMapper corresponding to their.
 
ThreadGroup getMapperThreadGroup ()
 Returns a ThreadGroup where PepperMapper objects and the corresponding threads are supposed to run in.
 
void setMapperThreadGroup (ThreadGroup mapperThreadGroup)
 Sets a ThreadGroup where PepperMapper objects and the corresponding threads are supposed to run in.
 
Map< String, DocumentController > getDocumentId2DC ()
 Returns the map relating Identifier belonging to SDocument objects to their DocumentController container.
 

Protected Attributes

CorpusDesc corpusDesc
 TODO make docu.
 
- Protected Attributes inherited from org.corpus_tools.pepper.impl.PepperModuleImpl
Logger logger = LoggerFactory.getLogger("Pepper")
 
SaltProject saltProject = null
 Salt project which is processed by module.
 
URI resources = null
 TODO make docu.
 
URI temproraries = null
 TODO make docu.
 
String symbolicName = null
 TODO make docu.
 
ModuleController moduleController = null
 The controller object, which acts as a bridge between the Pepper framework and a Pepper module.
 
SCorpusGraph sCorpusGraph = null
 The SCorpusGraph object which should be processed by this module.
 
boolean isMultithreaded = true
 

Detailed Description

An importer in Pepper reads data from a format A and maps it to a Salt model. An importer must implement PepperImporter and can extend this class. We strongly recommend extending this class, since it contains many helpful functions and methods for controlling the workflow.

See also
PepperImporter
Author
Florian Zipser

Constructor & Destructor Documentation

◆ PepperImporterImpl()

org.corpus_tools.pepper.impl.PepperImporterImpl.PepperImporterImpl ( )
protected

Creates a PepperModule of type MODULE_TYPE#IMPORTER.

The name is set to "MyImporter".


We recommend using the constructor PepperImporterImpl#PepperImporterImpl(String) and passing a proper name.

Member Function Documentation

◆ getIgnoreEndings()

synchronized Collection< String > org.corpus_tools.pepper.impl.PepperImporterImpl.getIgnoreEndings ( )

Returns a collection of filenames, not to be imported.

{@inheritDoc #importIgnoreList} .

Returns

◆ importCorpusStructureRec()

Boolean org.corpus_tools.pepper.impl.PepperImporterImpl.importCorpusStructureRec ( URI  currURI,
SCorpus  parent 
)
protected

Top-down traversal of the given file structure.

This method is called by importCorpusStructure(SCorpusGraph) and creates the corpus structure via a top-down traversal of the file structure. For each resource found (file or folder), the method setTypeOfResource(URI) is called to determine the type of the resource. If the type is SALT_TYPE#SDOCUMENT, an SDocument object is created for the resource; if the type is SALT_TYPE#SCORPUS, an SCorpus object is created; if the type is null, the resource is ignored.

Parameters
currURI
parentsID
endings
Returns
returns true if the path contains documents, false otherwise
Exceptions
IOException
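
The traversal described above can be sketched without any Pepper dependency. This is only an illustration under stated assumptions: ResourceType, typeOf and traverse are hypothetical stand-ins for SALT_TYPE, setTypeOfResource(URI) and importCorpusStructureRec, and the rule "directories are corpora, files with a known ending are documents" is an assumed classification, not necessarily the one Pepper applies.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class CorpusStructureSketch {
    /** Hypothetical stand-in for SALT_TYPE; not the Pepper enum. */
    enum ResourceType { CORPUS, DOCUMENT }

    /** Stand-in for setTypeOfResource(URI): returns null for resources to ignore. */
    static ResourceType typeOf(Path resource, Set<String> documentEndings) {
        if (Files.isDirectory(resource)) {
            return ResourceType.CORPUS;
        }
        String name = resource.getFileName().toString();
        int dot = name.lastIndexOf('.');
        String ending = dot >= 0 ? name.substring(dot + 1) : "";
        return documentEndings.contains(ending) ? ResourceType.DOCUMENT : null;
    }

    /**
     * Visits curr and, for directories, all of its children. Records every corpus
     * and document in 'found' and returns true if the subtree holds a document.
     */
    static boolean traverse(Path curr, Set<String> documentEndings, List<String> found)
            throws IOException {
        ResourceType type = typeOf(curr, documentEndings);
        if (type == null) {
            return false; // ignored resource
        }
        if (type == ResourceType.DOCUMENT) {
            found.add("DOCUMENT " + curr.getFileName());
            return true;
        }
        found.add("CORPUS " + curr.getFileName());
        boolean containsDocuments = false;
        try (DirectoryStream<Path> children = Files.newDirectoryStream(curr)) {
            for (Path child : children) {
                // '|=' does not short-circuit, so every child is still visited
                containsDocuments |= traverse(child, documentEndings, found);
            }
        }
        return containsDocuments;
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("corpus");
        Files.createDirectory(root.resolve("sub"));
        Files.writeString(root.resolve("sub").resolve("doc1.xml"), "<a/>");
        Files.writeString(root.resolve("notes.bak"), "ignored");
        List<String> found = new ArrayList<>();
        boolean hasDocuments = traverse(root, Set.of("xml"), found);
        System.out.println(hasDocuments + " " + found);
    }
}
```

In this sketch the file notes.bak is classified as null and therefore never appears in the collected structure.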

◆ readXMLResource()

void org.corpus_tools.pepper.impl.PepperImporterImpl.readXMLResource ( DefaultHandler2  contentHandler,
URI  documentLocation 
)
protected

Helper method to read an xml file with a DefaultHandler2 implementation given as contentHandler.

It is assumed that the file is encoded in UTF-8.

Parameters
contentHandler  DefaultHandler2 implementation
documentLocation  location of the XML file
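
As a rough illustration of what such a helper does, the following sketch parses XML with a DefaultHandler2 using only the JDK's SAX API and declares UTF-8 as the input encoding. It is not the Pepper implementation: ReadXmlSketch, ElementCollector and readXml are hypothetical names, and the sketch reads from an InputStream instead of a URI to stay self-contained.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.ext.DefaultHandler2;

public class ReadXmlSketch {
    /** Example handler: collects element names and character data in document order. */
    static class ElementCollector extends DefaultHandler2 {
        final List<String> elements = new ArrayList<>();
        final StringBuilder text = new StringBuilder();

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            elements.add(qName);
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length);
        }
    }

    /** Parses the stream with the given handler, declaring UTF-8 as the encoding. */
    static void readXml(DefaultHandler2 contentHandler, InputStream in) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        InputSource source = new InputSource(in);
        source.setEncoding("UTF-8"); // assumption stated in the documentation above
        parser.parse(source, contentHandler);
    }

    public static void main(String[] args) throws Exception {
        byte[] xml = "<corpus><doc>text</doc></corpus>".getBytes(StandardCharsets.UTF_8);
        ElementCollector handler = new ElementCollector();
        readXml(handler, new ByteArrayInputStream(xml));
        System.out.println(handler.elements + " " + handler.text);
    }
}
```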

◆ sampleFileContent()

Collection< String > org.corpus_tools.pepper.impl.PepperImporterImpl.sampleFileContent ( final URI  corpusPath,
final String...  fileEndings 
)
protected

Returns {@value IsImportableUtil::NUMBER_OF_SAMPLED_LINES} lines of a sampled set of {@value IsImportableUtil::NUMBER_OF_SAMPLED_FILES} files having the ending specified by fileEndings recursively from specified corpus path.

This method only delegates to IsImportableUtil#sampleFileContent(URI, int, int, String...). The class IsImportableUtil also contains further helper methods, in case this method is too imprecise.

Parameters
corpusPath  directory to be searched in
fileEndings  endings to be considered. If no endings are specified, all files are considered
Returns
numberOfLines lines of numberOfSampledFiles files
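
The idea of sampling file content can be sketched with plain java.nio. The code below is a minimal sketch, not IsImportableUtil: the names sample and hasEnding are hypothetical, it takes the first files in sorted order rather than a random sample, and it returns every line as a separate string, all of which are assumptions about details the documentation does not state.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class SampleContentSketch {
    /** Reads up to numberOfLines lines from up to numberOfFiles matching files. */
    static Collection<String> sample(Path corpusPath, int numberOfFiles, int numberOfLines,
            String... fileEndings) throws IOException {
        List<Path> candidates;
        try (Stream<Path> paths = Files.walk(corpusPath)) { // recursive walk
            candidates = paths.filter(Files::isRegularFile)
                    .filter(p -> hasEnding(p, fileEndings))
                    .sorted()
                    .limit(numberOfFiles)
                    .collect(Collectors.toList());
        }
        List<String> lines = new ArrayList<>();
        for (Path file : candidates) {
            try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
                String line;
                int read = 0;
                while (read < numberOfLines && (line = reader.readLine()) != null) {
                    lines.add(line);
                    read++;
                }
            }
        }
        return lines;
    }

    /** With no endings given, every file matches. */
    static boolean hasEnding(Path file, String... fileEndings) {
        if (fileEndings.length == 0) {
            return true;
        }
        String name = file.getFileName().toString();
        for (String ending : fileEndings) {
            if (name.endsWith("." + ending)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("corpus");
        Files.writeString(root.resolve("a.txt"), "one\ntwo\nthree\n");
        Files.writeString(root.resolve("b.xml"), "<a/>\n");
        System.out.println(sample(root, 10, 2, "txt"));
    }
}
```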

◆ setCorpusPathResolver()

void org.corpus_tools.pepper.impl.PepperImporterImpl.setCorpusPathResolver ( CorpusPathResolver  corpusPathResolver)

Sets a CorpusPathResolver which is used by isImportable(URI).

A CorpusPathResolver makes it possible to share the lines read from files between multiple importers. This saves time, because retrieving the content of the corpus path and reading the first x lines of the files need not be repeated for each importer.

Parameters
corpusPathResolver
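
The benefit of sharing read lines can be illustrated with a small cache. This is a hedged sketch of the general idea only: SampleCache, linesFor and readCount are hypothetical names and do not reflect the actual CorpusPathResolver API, and caching per file ending is an assumption made for the example.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

public class SharedSampleCacheSketch {
    /** Hypothetical stand-in for a shared resolver: caches sampled lines per file ending. */
    static class SampleCache {
        private final Map<String, List<String>> linesByEnding = new ConcurrentHashMap<>();
        private final Function<String, List<String>> reader;
        private final AtomicInteger reads = new AtomicInteger();

        SampleCache(Function<String, List<String>> reader) {
            this.reader = reader;
        }

        /** Returns cached lines, calling the (expensive) reader only on the first request. */
        List<String> linesFor(String ending) {
            return linesByEnding.computeIfAbsent(ending, e -> {
                reads.incrementAndGet();
                return reader.apply(e);
            });
        }

        int readCount() {
            return reads.get();
        }
    }

    public static void main(String[] args) {
        // The lambda stands in for walking the corpus path and reading files.
        SampleCache shared = new SampleCache(ending -> List.of("first line of a ." + ending + " file"));
        shared.linesFor("xml"); // first importer asks: lines are read
        shared.linesFor("xml"); // second importer asks: lines come from the cache
        System.out.println("reads: " + shared.readCount());
    }
}
```

Passing the same cache instance to several importers is what avoids the repeated reads in this sketch.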

◆ start()

void org.corpus_tools.pepper.impl.PepperImporterImpl.start ( ) throws PepperModuleException

Overrides the method PepperModuleImpl#start() to add the following, before PepperModuleImpl#start() is called.

  1. a check if corpus path exists

Reimplemented from org.corpus_tools.pepper.impl.PepperModuleImpl.
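
The pattern described for start(), a precondition check followed by a call to the parent implementation, can be sketched as follows. BaseModule and CheckingImporter are hypothetical stand-ins for PepperModuleImpl and PepperImporterImpl, and throwing IllegalStateException is an assumption; the exception the real method raises is not stated here.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;

public class StartCheckSketch {
    /** Hypothetical stand-in for the parent module class. */
    static class BaseModule {
        boolean started = false;

        void start() {
            started = true;
        }
    }

    /** Stand-in importer that validates its corpus path before delegating. */
    static class CheckingImporter extends BaseModule {
        private final File corpusPath;

        CheckingImporter(File corpusPath) {
            this.corpusPath = corpusPath;
        }

        @Override
        void start() {
            // 1. check that the corpus path exists
            if (corpusPath == null || !corpusPath.exists()) {
                throw new IllegalStateException("Corpus path does not exist: " + corpusPath);
            }
            // then run the parent's start logic
            super.start();
        }
    }

    public static void main(String[] args) throws IOException {
        File existing = Files.createTempDirectory("corpus").toFile();
        CheckingImporter importer = new CheckingImporter(existing);
        importer.start();
        System.out.println("started: " + importer.started);
    }
}
```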