TCFModules

Convert your TCF data to Salt

General information

TCFModules is part of the SaltNPepper project. This Pepper module converts data in the TCF format to Salt, a graph-based data model. TCF is an XML-based data exchange format developed within the WebLicht architecture. The module was developed in cooperation between the Clarin-D center at Universität Stuttgart and Humboldt-Universität zu Berlin.

Purpose

Because Pepper is a converter framework for multiple linguistic formats, TCFModules lets users convert data from the TCF format to a range of other linguistic formats. One aim of this project was to enable the conversion of TCF data to the ANNIS format (RelANNIS). ANNIS is a search and visualisation tool for multi-layered linguistic corpora. A list of the data formats that Salt can be mapped to can be found here.

Usage

To use TCFModules in Pepper, add one of the following lines to your workflow file (*.pepperParams), replacing CORPUS_LOCATION with the path to your TCF data:
<importerParams moduleName="TCFImporter" sourcePath="CORPUS_LOCATION"/>

or

<importerParams formatName="tcf" formatVersion="0.4" sourcePath="CORPUS_LOCATION"/>

Documentation

The TCF format provides several layers of linguistic annotations. Each TCF layer is mapped to its own layer in Salt. These layers contain nodes (e.g. POS annotation), edges (e.g. dependency annotation), or both (e.g. coreference annotation), together with their annotations ...
Full documentation will be available here soon.
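
To make the layer mapping described above more concrete, here is a minimal Python sketch. It is not the module's implementation (TCFModules is a Pepper module and works on Salt objects); it only illustrates the idea of turning token-level layers into annotated nodes and relational layers into annotated edges. The sample input is a simplified, TCF-like fragment: the element and attribute names are assumed to follow TCF 0.4 and should be checked against the format specification, XML namespaces are omitted, and the function name tcf_to_graph and the dictionary layout are hypothetical.

```python
import xml.etree.ElementTree as ET

# Simplified TCF-like input. Assumption: element/attribute names are modelled
# on TCF 0.4; namespaces and the surrounding wrapper elements are left out.
SAMPLE = """
<TextCorpus lang="en">
  <tokens>
    <token ID="t1">Dogs</token>
    <token ID="t2">bark</token>
  </tokens>
  <POStags tagset="penn">
    <tag tokenIDs="t1">NNS</tag>
    <tag tokenIDs="t2">VBP</tag>
  </POStags>
  <depparsing>
    <parse>
      <dependency func="nsubj" depIDs="t1" govIDs="t2"/>
    </parse>
  </depparsing>
</TextCorpus>
"""


def tcf_to_graph(xml_text):
    """Build a toy graph (nodes, edges, layers) from a TCF-like document."""
    root = ET.fromstring(xml_text)
    nodes, edges = {}, []
    layers = {"tokens": [], "POStags": [], "depparsing": []}

    # Token layer: one node per token.
    for tok in root.iter("token"):
        nodes[tok.get("ID")] = {"text": tok.text, "annotations": {}}
        layers["tokens"].append(tok.get("ID"))

    # POS layer: annotations attached to existing token nodes.
    pos_layer = root.find("POStags")
    if pos_layer is not None:
        for tag in pos_layer.iter("tag"):
            for token_id in tag.get("tokenIDs").split():
                nodes[token_id]["annotations"]["pos"] = tag.text
                layers["POStags"].append(token_id)

    # Dependency layer: one annotated edge from governor to dependent.
    for dep in root.iter("dependency"):
        edges.append({
            "source": dep.get("govIDs"),
            "target": dep.get("depIDs"),
            "annotations": {"func": dep.get("func")},
        })
        layers["depparsing"].append(len(edges) - 1)

    return nodes, edges, layers


if __name__ == "__main__":
    nodes, edges, layers = tcf_to_graph(SAMPLE)
    print(nodes)
    print(edges)
    print(layers)
```

In this sketch each layer only records which nodes or edges belong to it; in Salt, layers, nodes, edges and annotations are proper model objects, so the dictionaries here are only a stand-in.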

License

This project is open source and released under the Apache License, Version 2.0. You can download the sources here.

Contact, questions, bug reports, etc.

If you run into any trouble using TCFModules, don't hesitate to contact us: saltnpepper@lists.hu-berlin.de

Funders

This project was funded by the Clarin-D project and carried out at the department of corpus linguistics and morphology of Humboldt-Universität zu Berlin.