Pepper 3.6.0
A highly extensible platform for conversion and manipulation of linguistic data.
org.corpus_tools.pepper.cli.XMLTagExtractor Class Reference

This class is a helper class for developing PepperModules.

Inherits DefaultHandler2.

Public Member Functions

void setXmlResource (URI resource) throws FileNotFoundException
 Sets the xml file to be parsed.
 
URI getXmlResource ()
 Returns the xml file to be parsed.
 
void setJavaResource (URI resource) throws FileNotFoundException
 Sets the output location for the generated Java files.
 
URI getJavaResource ()
 Returns the output location for the generated Java files.
 
void extract ()
 Extracts the xml vocabulary from the xml resource and generates the Java interface and class.
 
void startElement (String uri, String localName, String qName, Attributes attributes) throws SAXException
 SAX callback invoked for each start tag of the parsed xml file.

Static Public Member Functions

static void main (String[] args)
 Runs the extractor from the command line: java org.corpus_tools.pepper.cli.XMLTagExtractor -i XML_FILE -o OUTPUT_PATH
 

Static Public Attributes

static final String PREFIX_NAMESPACE = "NS_"
 Prefix for fields holding xml namespace prefixes.
 
static final String PREFIX_NAMESPACE_VALUE = "NS_VALUE_"
 Prefix for fields holding xml namespace URIs.
 
static final String PREFIX_ELEMENT = "TAG_"
 Prefix for fields holding xml element (tag) names.
 
static final String PREFIX_ATTRIBUTE = "ATT_"
 Prefix for fields holding xml attribute names.
 
static final String ARG_INPUT = "-i"
 Command-line argument specifying the input xml file.
 
static final String ARG_OUTPUT = "-o"
 Command-line argument specifying the output path.
 

Detailed Description

This class is a helper class for developing PepperModules.

The XMLTagExtractor generates a dictionary of the xml vocabulary used in a source file: its xml element names, xml namespaces and attribute names. From this dictionary it generates a Java interface and a Java class. The interface contains the xml namespace declarations and the xml element and attribute names as fields (public static final Strings). The generated Java class implements that interface and extends DefaultHandler2, so that it can read xml files that follow the generated dictionary.
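The extraction step can be pictured as a single SAX pass over the sample file. The following is a minimal sketch, not Pepper's actual implementation: the class `VocabularyCollector` and its `collect` method are hypothetical, and the sketch only gathers element and attribute names (namespace handling is omitted) using the JDK's built-in SAX parser and `DefaultHandler2`.

```java
import java.io.StringReader;
import java.util.Set;
import java.util.TreeSet;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.ext.DefaultHandler2;

// Hypothetical sketch: gathers element and attribute names from an xml document.
// Not Pepper's implementation; namespace declarations are ignored here.
class VocabularyCollector extends DefaultHandler2 {
    final Set<String> elements = new TreeSet<>();   // sorted for stable output
    final Set<String> attributes = new TreeSet<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        elements.add(qName); // qualified name as written, e.g. "token"
        for (int i = 0; i < atts.getLength(); i++) {
            attributes.add(atts.getQName(i)); // e.g. "pos" or "xml:lang"
        }
    }

    static VocabularyCollector collect(String xml) {
        VocabularyCollector collector = new VocabularyCollector();
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            parser.parse(new InputSource(new StringReader(xml)), collector);
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse xml", e);
        }
        return collector;
    }

    public static void main(String[] args) {
        VocabularyCollector c = collect("<sentence xml:lang=\"en\"><token pos=\"VBZ\">Is</token>"
                + "<token pos=\"DT\" lemma=\"this\">this</token><token>example</token></sentence>");
        System.out.println(c.elements);   // [sentence, token]
        System.out.println(c.attributes); // [lemma, pos, xml:lang]
    }
}
```

Sorted sets keep the output stable across runs, which is one plausible design choice; the real tool may order its fields differently (the example interface in this section is not alphabetical).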
This class is helpful when creating PepperImporter or PepperExporter classes that consume or produce xml formats. In that case, a sample xml file containing most, or ideally all, of the elements can be used to extract all element names as keys for the implementation.
For instance, the following xml file:

<sentence xml:lang="en">
  <token pos="VBZ">Is</token>
  <token pos="DT" lemma="this">this</token>
  <token>example</token>
</sentence>

results in the following interface:

public interface INTERFACE_NAME {
        public static final String TAG_TOKEN = "token";
        public static final String TAG_SENTENCE = "sentence";
        public static final String ATT_LEMMA = "lemma";
        public static final String ATT_XML_LANG = "xml:lang";
        public static final String ATT_POS = "pos";
}

where INTERFACE_NAME is the name of the xml file, and in the following class:

import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.ext.DefaultHandler2;

public class INTERFACE_NAMEReader extends DefaultHandler2 implements INTERFACE_NAME {
        public void startElement(String uri, String localName, String qName, Attributes attributes) throws SAXException {
                if (TAG_TOKEN.equals(qName)) {
                        // handle <token>
                } else if (TAG_SENTENCE.equals(qName)) {
                        // handle <sentence>
                }
        }
}
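A generated reader is an ordinary SAX handler, so once its empty branches are filled in it can be handed to a `SAXParser`. The sketch below assumes a hypothetical dictionary `SampleDictionary` standing in for a generated INTERFACE_NAME, and fills the branches with simple counters purely for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.ext.DefaultHandler2;

// Hand-completed version of a generated reader; the counters are illustrative only.
class SampleDictionaryReader extends DefaultHandler2 implements SampleDictionary {
    int sentences = 0;
    int taggedTokens = 0;   // tokens carrying a pos attribute
    int untaggedTokens = 0; // tokens without a pos attribute

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        if (TAG_TOKEN.equals(qName)) {
            if (attributes.getValue(ATT_POS) != null) {
                taggedTokens++;
            } else {
                untaggedTokens++;
            }
        } else if (TAG_SENTENCE.equals(qName)) {
            sentences++;
        }
    }

    static SampleDictionaryReader read(String xml) {
        SampleDictionaryReader reader = new SampleDictionaryReader();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), reader);
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse xml", e);
        }
        return reader;
    }

    public static void main(String[] args) {
        SampleDictionaryReader r = read("<sentence><token pos=\"VBZ\">Is</token><token>example</token></sentence>");
        // prints "1 sentence(s), 1 tagged, 1 untagged"
        System.out.println(r.sentences + " sentence(s), " + r.taggedTokens + " tagged, " + r.untaggedTokens + " untagged");
    }
}

// Hypothetical stand-in for a generated interface (fields are implicitly public static final).
interface SampleDictionary {
    String TAG_TOKEN = "token";
    String TAG_SENTENCE = "sentence";
    String ATT_POS = "pos";
}
```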


Using as a library:

XMLTagExtractor extractor = new XMLTagExtractor();
extractor.setXmlResource(input);   // URI of the sample xml file
extractor.setJavaResource(output); // URI of the output location for the generated files
extractor.extract();


Running it from the command line:

java org.corpus_tools.pepper.cli.XMLTagExtractor -i XML_FILE -o OUTPUT_PATH

Author
Florian Zipser

Member Function Documentation

◆ main()

static void org.corpus_tools.pepper.cli.XMLTagExtractor.main ( String[]  args)
static

Runs the extractor from the command line: java org.corpus_tools.pepper.cli.XMLTagExtractor -i XML_FILE -o OUTPUT_PATH

Parameters
args  command-line arguments of the form -i XML_FILE -o OUTPUT_PATH
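The command-line contract `-i XML_FILE -o OUTPUT_PATH` maps onto the `ARG_INPUT` and `ARG_OUTPUT` constants. How `main` actually processes its arguments is not documented here; the following hypothetical `ExtractorArgs.parse` helper is only a sketch of one way to read the two flags and reject malformed calls.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of parsing "-i XML_FILE -o OUTPUT_PATH";
// the real main() may handle errors and extra arguments differently.
class ExtractorArgs {
    static final String ARG_INPUT = "-i";  // same value as XMLTagExtractor.ARG_INPUT
    static final String ARG_OUTPUT = "-o"; // same value as XMLTagExtractor.ARG_OUTPUT

    static Map<String, String> parse(String[] args) {
        Map<String, String> result = new HashMap<>();
        for (int i = 0; i < args.length; i++) {
            String flag = args[i];
            if (ARG_INPUT.equals(flag) || ARG_OUTPUT.equals(flag)) {
                if (i + 1 >= args.length) {
                    throw new IllegalArgumentException("missing value after " + flag);
                }
                result.put(flag, args[++i]); // consume the value that follows the flag
            } else {
                throw new IllegalArgumentException("unknown argument: " + flag);
            }
        }
        if (!result.containsKey(ARG_INPUT) || !result.containsKey(ARG_OUTPUT)) {
            throw new IllegalArgumentException("usage: -i XML_FILE -o OUTPUT_PATH");
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> parsed = parse(new String[] {"-i", "sample.xml", "-o", "generated"});
        System.out.println(parsed.get(ARG_INPUT) + " -> " + parsed.get(ARG_OUTPUT)); // sample.xml -> generated
    }
}
```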

◆ setJavaResource()

void org.corpus_tools.pepper.cli.XMLTagExtractor.setJavaResource ( URI  resource) throws FileNotFoundException

Sets the output location for the generated Java files.

Exceptions
FileNotFoundException

◆ setXmlResource()

void org.corpus_tools.pepper.cli.XMLTagExtractor.setXmlResource ( URI  resource) throws FileNotFoundException

Sets the xml file to be parsed.

Exceptions
FileNotFoundException

Member Data Documentation

◆ PREFIX_ATTRIBUTE

final String org.corpus_tools.pepper.cli.XMLTagExtractor.PREFIX_ATTRIBUTE = "ATT_"
static

Prefix for fields holding xml attribute names.

For instance, the attribute pos in <token pos="..."> results in the field:
ATT_POS
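The documented examples (`token` → `TAG_TOKEN`, `pos` → `ATT_POS`, `xml:lang` → `ATT_XML_LANG`, `myns` → `NS_MYNS`) suggest that a field name is the prefix followed by the upper-cased xml name, with characters such as `:` turned into `_`. The helper below is a hypothetical reconstruction of that rule; how the real extractor treats other characters (for example `-` or `.`) is an assumption.

```java
import java.util.Locale;

// Hypothetical reconstruction of the field naming rule, inferred from the documented examples.
class FieldNames {
    static final String PREFIX_ELEMENT = "TAG_";
    static final String PREFIX_ATTRIBUTE = "ATT_";
    static final String PREFIX_NAMESPACE = "NS_";

    static String toFieldName(String prefix, String xmlName) {
        // Upper-case the name, then replace every character outside A-Z, 0-9 and '_'
        // (e.g. ':' or '-') with '_'. The handling of '-' and '.' is an assumption.
        String body = xmlName.toUpperCase(Locale.ROOT).replaceAll("[^A-Z0-9_]", "_");
        return prefix + body;
    }

    public static void main(String[] args) {
        System.out.println(toFieldName(PREFIX_ATTRIBUTE, "xml:lang")); // ATT_XML_LANG
    }
}
```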

◆ PREFIX_ELEMENT

final String org.corpus_tools.pepper.cli.XMLTagExtractor.PREFIX_ELEMENT = "TAG_"
static

Prefix for fields holding xml element (tag) names.

For instance, the element <token> results in the field:
TAG_TOKEN

◆ PREFIX_NAMESPACE

final String org.corpus_tools.pepper.cli.XMLTagExtractor.PREFIX_NAMESPACE = "NS_"
static

Prefix for fields holding xml namespace prefixes.

For instance, the namespace prefix myns in <myns:token xmlns:myns="..."> results in the field:
NS_MYNS
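A namespace-aware SAX parser does not report namespace declarations as ordinary attributes by default; they arrive through the `startPrefixMapping` callback, which `DefaultHandler2` inherits from `DefaultHandler`. The following is a minimal sketch, assuming that approach (the class `NamespaceCollector` is hypothetical), of collecting the prefix-to-URI pairs from which `NS_` and `NS_VALUE_` fields could be built.

```java
import java.io.StringReader;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.ext.DefaultHandler2;

// Hypothetical sketch: records each xmlns:prefix="uri" declaration. Not Pepper's actual code.
class NamespaceCollector extends DefaultHandler2 {
    final Map<String, String> prefixToUri = new TreeMap<>();

    @Override
    public void startPrefixMapping(String prefix, String uri) {
        // Reported before the start tag of the element that declares the mapping.
        if (!prefix.isEmpty()) { // skip the default namespace (xmlns="...")
            prefixToUri.put(prefix, uri);
        }
    }

    static Map<String, String> collect(String xml) {
        NamespaceCollector collector = new NamespaceCollector();
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true); // required for startPrefixMapping callbacks
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)), collector);
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse xml", e);
        }
        return collector.prefixToUri;
    }

    public static void main(String[] args) {
        System.out.println(collect("<myns:token xmlns:myns=\"https://ns.de\">x</myns:token>")); // {myns=https://ns.de}
    }
}
```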

◆ PREFIX_NAMESPACE_VALUE

final String org.corpus_tools.pepper.cli.XMLTagExtractor.PREFIX_NAMESPACE_VALUE = "NS_VALUE_"
static

Prefix for fields holding xml namespace URIs.

For instance, the namespace declaration in <myns:token xmlns:myns="https://ns.de"> results in the field:
NS_VALUE_MYNS="https://ns.de"
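Given collected prefix-to-URI pairs, the namespace part of a generated interface could be emitted as plain source text. This hypothetical `NamespaceFieldWriter` is only a sketch: the value of `NS_VALUE_MYNS` follows the documented example, while giving `NS_MYNS` the prefix itself as its value is an assumption.

```java
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of emitting namespace fields for a generated interface.
// Assumption: NS_<PREFIX> holds the prefix itself; NS_VALUE_<PREFIX> holds the URI.
class NamespaceFieldWriter {

    static String declarations(Map<String, String> prefixToUri) {
        StringBuilder source = new StringBuilder();
        // Copy into a TreeMap so the fields come out in a stable, sorted order.
        for (Map.Entry<String, String> entry : new TreeMap<>(prefixToUri).entrySet()) {
            String suffix = entry.getKey().toUpperCase(Locale.ROOT).replaceAll("[^A-Z0-9_]", "_");
            // No escaping: assumes prefixes and URIs contain no quotes or backslashes.
            source.append("public static final String NS_").append(suffix)
                  .append(" = \"").append(entry.getKey()).append("\";\n");
            source.append("public static final String NS_VALUE_").append(suffix)
                  .append(" = \"").append(entry.getValue()).append("\";\n");
        }
        return source.toString();
    }

    public static void main(String[] args) {
        System.out.print(declarations(Map.of("myns", "https://ns.de")));
    }
}
```

For the example namespace, this prints one line declaring `NS_MYNS` and one declaring `NS_VALUE_MYNS = "https://ns.de"`.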