Pepper  3.6.0
A highly extensible platform for conversion and manipulation of linguistic data.
org.corpus_tools.pepper.impl.CorpusPathResolver Class Reference

Public Member Functions

 CorpusPathResolver (final URI corpusPath) throws FileNotFoundException
 
Collection< String > sampleFileContent (final String... fileEndings)
 Returns the first NUMBER_OF_SAMPLED_LINES (10) lines of each file in a sample of NUMBER_OF_SAMPLED_FILES (20) files that have one of the endings specified by fileEndings, searched recursively from the corpus path.
 
Collection< String > sampleFileContent (int numberOfSampledFiles, int numberOfSampledLines, final String... fileEndings)
 Returns the first numberOfSampledLines lines of each file in a sample of numberOfSampledFiles files that have one of the endings specified by fileEndings, searched recursively from the corpus path.
 

Static Public Attributes

static final int NUMBER_OF_SAMPLED_FILES = 20
 The number of files which are read for sampling when invoking findAppropriateImporters(URI).
 
static final int NUMBER_OF_SAMPLED_LINES = 10
 The number of lines in a file which are read for sampling when invoking findAppropriateImporters(URI).
 

Protected Member Functions

void setCorpusPath (final URI corpusPath) throws FileNotFoundException
 
Multimap< String, File > groupFilesByEnding (final URI corpusPath) throws FileNotFoundException
 Groups files by their file ending into a multimap.
 
Collection< FileContent > getXFilesWithExtension (int numOfFiles, int numOfLinesToRead, final String fileEnding)
 
Collection< File > sampleFiles (final Collection< File > files, int numberOfSampledFiles)
 Creates a sample of numberOfSampledFiles files from the passed collection of files.
 
String readFirstLines (final File file, final int numOfLinesToRead)
 Reads the first numOfLinesToRead lines of the passed file and returns them as a String.
 

Protected Attributes

Multimap< String, File > unreadFilesGroupedByExtension
 
Multimap< String, FileContent > readFilesGroupedByExtension
 

Member Function Documentation

◆ groupFilesByEnding()

Multimap<String, File> org.corpus_tools.pepper.impl.CorpusPathResolver.groupFilesByEnding ( final URI  corpusPath) throws FileNotFoundException
protected

Groups files by their file ending into a multimap.

The key is the ending.

Parameters
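corpusPath: the corpus path whose files are grouped
Returns
a multimap whose keys are file endings and whose values are the files with that ending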
Exceptions
FileNotFoundException
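The grouping described above can be illustrated with a dependency-free sketch. It assumes that the ending is the text after the last '.' of the file name (files without a dot are grouped under the empty string), that the corpus path is traversed recursively, and that a missing path raises FileNotFoundException. The documented method returns a Multimap; to avoid external dependencies, this sketch uses a java.util Map of lists instead. The class name GroupByEndingSketch is hypothetical and not part of Pepper.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class GroupByEndingSketch {

    /** Assumption: the ending is the text after the last '.', or "" if the name has no dot. */
    static String endingOf(File file) {
        String name = file.getName();
        int dot = name.lastIndexOf('.');
        return dot < 0 ? "" : name.substring(dot + 1);
    }

    /** Recursively collects all regular files below root and groups them by ending (sorted keys). */
    static Map<String, List<File>> groupFilesByEnding(Path root) throws FileNotFoundException {
        if (!Files.exists(root)) {
            throw new FileNotFoundException("Corpus path does not exist: " + root);
        }
        try (Stream<Path> paths = Files.walk(root)) {
            return paths.filter(Files::isRegularFile)
                    .map(Path::toFile)
                    .collect(Collectors.groupingBy(GroupByEndingSketch::endingOf,
                            TreeMap::new, Collectors.toList()));
        } catch (IOException e) {
            // Errors while walking the tree are rethrown unchecked in this sketch.
            throw new UncheckedIOException(e);
        }
    }
}
```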

◆ readFirstLines()

String org.corpus_tools.pepper.impl.CorpusPathResolver.readFirstLines ( final File  file,
final int  numOfLinesToRead 
)
protected

Reads the first numOfLinesToRead lines of the passed file and returns them as a String.

Parameters
file: the file to be read
numOfLinesToRead: number of lines to read
Returns
the first numOfLinesToRead lines of the file
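A minimal sketch of reading the first lines of a file with the JDK's BufferedReader. The UTF-8 encoding and joining the lines with '\n' are assumptions; Pepper's readFirstLines may use a different charset or separator. The class ReadFirstLinesSketch is hypothetical.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.StringJoiner;

public class ReadFirstLinesSketch {

    /** Reads at most numOfLinesToRead lines from the start of file and joins them with '\n'. */
    static String readFirstLines(File file, int numOfLinesToRead) {
        StringJoiner joiner = new StringJoiner("\n");
        try (BufferedReader reader = Files.newBufferedReader(file.toPath(), StandardCharsets.UTF_8)) {
            String line;
            int read = 0;
            // Stop after the requested number of lines or at end of file, whichever comes first.
            while (read < numOfLinesToRead && (line = reader.readLine()) != null) {
                joiner.add(line);
                read++;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return joiner.toString();
    }
}
```

Files shorter than numOfLinesToRead simply yield all of their lines.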

◆ sampleFileContent() [1/2]

Collection<String> org.corpus_tools.pepper.impl.CorpusPathResolver.sampleFileContent ( final String...  fileEndings)

Returns the first NUMBER_OF_SAMPLED_LINES (10) lines of each file in a sample of NUMBER_OF_SAMPLED_FILES (20) files that have one of the endings specified by fileEndings, searched recursively from the corpus path.

Parameters
fileEndings: endings to be considered. If no endings are specified, all files are considered.
Returns
the first NUMBER_OF_SAMPLED_LINES (10) lines of NUMBER_OF_SAMPLED_FILES (20) sampled files
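The rule that an empty fileEndings argument selects all files can be expressed as a small predicate. This sketch is hypothetical (EndingFilterSketch is not a Pepper class) and assumes that endings may be passed with or without a leading dot and are matched case-sensitively against the end of the file name.

```java
import java.io.File;

public class EndingFilterSketch {

    /** Returns true if file should be considered; no endings at all means "accept every file". */
    static boolean hasAcceptedEnding(File file, String... fileEndings) {
        if (fileEndings == null || fileEndings.length == 0) {
            return true;
        }
        String name = file.getName();
        for (String ending : fileEndings) {
            // Assumption: accept both "xml" and ".xml" as the same ending.
            String normalized = ending.startsWith(".") ? ending.substring(1) : ending;
            if (name.endsWith("." + normalized)) {
                return true;
            }
        }
        return false;
    }
}
```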

◆ sampleFileContent() [2/2]

Collection<String> org.corpus_tools.pepper.impl.CorpusPathResolver.sampleFileContent ( int  numberOfSampledFiles,
int  numberOfSampledLines,
final String...  fileEndings 
)

Returns the first numberOfSampledLines lines of each file in a sample of numberOfSampledFiles files that have one of the endings specified by fileEndings, searched recursively from the corpus path.

Parameters
numberOfSampledFiles: number of files to be read
numberOfSampledLines: number of lines to be read from each file
fileEndings: endings to be considered. If no endings are specified, all files are considered.
Returns
the first numberOfSampledLines lines of numberOfSampledFiles sampled files
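The two overloads can be pictured as one pipeline: walk the corpus path, filter by ending, pick numberOfSampledFiles files, and read the first numberOfSampledLines lines of each. In this hypothetical sketch (SampleFileContentSketch is not a Pepper class), the one-argument overload delegates to the full one using the documented defaults, and each returned String holds the lines of one file; both are assumptions. To stay deterministic, it takes the first files in sorted path order instead of drawing a random sample.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.StringJoiner;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class SampleFileContentSketch {

    // Defaults taken from the documented constants.
    static final int NUMBER_OF_SAMPLED_FILES = 20;
    static final int NUMBER_OF_SAMPLED_LINES = 10;

    /** Short overload: assumed to delegate with the documented defaults. */
    static Collection<String> sampleFileContent(Path corpusPath, String... fileEndings) {
        return sampleFileContent(corpusPath, NUMBER_OF_SAMPLED_FILES, NUMBER_OF_SAMPLED_LINES, fileEndings);
    }

    static Collection<String> sampleFileContent(Path corpusPath, int numberOfSampledFiles,
            int numberOfSampledLines, String... fileEndings) {
        List<Path> chosen;
        try (Stream<Path> paths = Files.walk(corpusPath)) {
            chosen = paths.filter(Files::isRegularFile)
                    .filter(p -> matches(p, fileEndings))
                    .sorted()                     // deterministic order instead of random sampling
                    .limit(numberOfSampledFiles)
                    .collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        List<String> contents = new ArrayList<>();
        for (Path file : chosen) {
            contents.add(firstLines(file, numberOfSampledLines)); // one entry per file
        }
        return contents;
    }

    /** An empty endings array accepts every file. */
    private static boolean matches(Path file, String... fileEndings) {
        if (fileEndings.length == 0) {
            return true;
        }
        String name = file.getFileName().toString();
        for (String ending : fileEndings) {
            if (name.endsWith("." + ending)) {
                return true;
            }
        }
        return false;
    }

    /** Reads up to n lines (UTF-8 assumed) and joins them with '\n'. */
    private static String firstLines(Path file, int n) {
        StringJoiner joiner = new StringJoiner("\n");
        try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            String line;
            for (int i = 0; i < n && (line = reader.readLine()) != null; i++) {
                joiner.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return joiner.toString();
    }
}
```

Because the short overload only fills in the two counts, callers that need smaller or larger samples can switch to the full overload without changing how endings are specified.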

◆ sampleFiles()

Collection<File> org.corpus_tools.pepper.impl.CorpusPathResolver.sampleFiles ( final Collection< File >  files,
int  numberOfSampledFiles 
)
protected

Creates a sample of numberOfSampledFiles files from the passed collection of files.

Parameters
files: the files to sample from
numberOfSampledFiles: number of files to be sampled
Returns
a collection containing the sampled files
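Sampling a fixed number of files from a collection can be done by shuffling a copy and keeping a prefix. This is a sketch under assumptions: sampling is without replacement, all files are returned when fewer than numberOfSampledFiles are available, and the extra Random parameter is hypothetical, added only to make results reproducible (the documented sampleFiles takes just the collection and the count). SampleFilesSketch is not a Pepper class.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.List;
import java.util.Random;

public class SampleFilesSketch {

    /** Returns a random subset of at most numberOfSampledFiles files, without duplicates. */
    static Collection<File> sampleFiles(Collection<File> files, int numberOfSampledFiles, Random random) {
        List<File> copy = new ArrayList<>(files); // never mutate the caller's collection
        if (copy.size() <= numberOfSampledFiles) {
            return copy;                          // fewer files than requested: take them all
        }
        Collections.shuffle(copy, random);
        return new ArrayList<>(copy.subList(0, numberOfSampledFiles));
    }
}
```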