General information
TCFModules is part of the SaltNPepper project. This Pepper module converts data provided in the TCF format to Salt, a graph-based data model. TCF is a common XML data exchange format developed within the WebLicht architecture. The module was developed in cooperation between the Clarin-D center at Universität Stuttgart and the Humboldt-Universität zu Berlin.
Purpose
Since Pepper is a converter framework for multiple linguistic formats, TCFModules allows users to convert data from the TCF format to a range of other linguistic formats. One aim of this project was to enable the conversion of TCF data to the ANNIS format (RelANNIS). ANNIS is a search and visualisation tool for multi-layered linguistic corpora. A list of the data formats Salt can be mapped to can be found here.
Usage
To use TCFModules in Pepper, add one of the following lines to your workflow file (*.pepperParams):
<importerParams moduleName="TCFImporter" sourcePath="CORPUS_LOCATION"/>
or
<importerParams formatName="tcf" formatVersion="0.4" sourcePath="CORPUS_LOCATION"/>
Documentation
The TCF format provides several layers of linguistic annotations. Each TCF layer is mapped to its own layer in Salt. These layers contain either nodes (e.g. POS annotation), edges (e.g. dependency annotation) or both (e.g. coreference annotation), together with their annotations ...
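To make the idea of annotation layers more concrete, here is a minimal Python sketch that reads a token layer and a POS layer from a small TCF-like document and joins them by token ID. It is only an illustration, not the module's actual implementation. The sample document is heavily simplified, and its namespaces, element names and the "penntb" tagset label are assumptions based on common TCF 0.4 examples. The function name read_layers is hypothetical.

```python
import xml.etree.ElementTree as ET

# Simplified TCF-like sample (assumption: element names and namespaces
# follow common TCF 0.4 examples; real files carry more layers and metadata).
SAMPLE_TCF = """<D-Spin xmlns="http://www.dspin.de/data" version="0.4">
  <TextCorpus xmlns="http://www.dspin.de/data/textcorpus" lang="en">
    <text>Pepper converts data</text>
    <tokens>
      <token ID="t1">Pepper</token>
      <token ID="t2">converts</token>
      <token ID="t3">data</token>
    </tokens>
    <POStags tagset="penntb">
      <tag tokenIDs="t1">NNP</tag>
      <tag tokenIDs="t2">VBZ</tag>
      <tag tokenIDs="t3">NNS</tag>
    </POStags>
  </TextCorpus>
</D-Spin>"""

# Namespace prefix in ElementTree's "{uri}localname" notation.
TC = "{http://www.dspin.de/data/textcorpus}"


def read_layers(tcf_xml):
    """Return (tokens, pos_layer): token ID -> text and token ID -> POS tag."""
    root = ET.fromstring(tcf_xml)
    corpus = root.find(TC + "TextCorpus")

    # Token layer: these would become nodes in a graph-based model.
    tokens = {tok.get("ID"): tok.text for tok in corpus.iter(TC + "token")}

    # POS layer: annotations that point back to tokens via their IDs.
    pos_layer = {}
    for tag in corpus.find(TC + "POStags").iter(TC + "tag"):
        for token_id in tag.get("tokenIDs").split():
            pos_layer[token_id] = tag.text
    return tokens, pos_layer


tokens, pos_layer = read_layers(SAMPLE_TCF)
for token_id, text in tokens.items():
    print(token_id, text, pos_layer.get(token_id))
```

Each layer is kept in its own dictionary, which loosely mirrors the idea that every TCF layer ends up in a layer of its own while still referring to the shared tokens.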
Full documentation will be available here soon.
License
This project is an open source project under the Apache License, Version 2.0. You can download the sources here.
Contact, questions, bug reports, etc.
If you run into any trouble using TCFModules, don't hesitate to contact us: saltnpepper@lists.hu-berlin.de
Funders
This project was funded by the Clarin-D project and realized at the department of corpus linguistics and morphology of the Humboldt-Universität zu Berlin.