Pepper
3.6.0
A highly extensible platform for conversion and manipulation of linguistic data.
org.corpus_tools.pepper.cli.XMLTagExtractor Class Reference
This class is a helper class for developing PepperModules.
Inherits DefaultHandler2.
Public Member Functions
void setXmlResource (URI resource) throws FileNotFoundException
    Sets the XML file to be parsed.
URI getXmlResource ()
    Returns the XML file to be parsed.
void setJavaResource (URI resource) throws FileNotFoundException
    Sets the Java output resource, i.e. where the generated Java code is written.
URI getJavaResource ()
    Returns the Java output resource.
void extract ()
    Extracts the XML vocabulary from the XML resource and generates the Java interface and class.
void startElement (String uri, String localName, String qName, Attributes attributes) throws SAXException
Static Public Member Functions
static void main (String[] args)
    Command-line entry point: java XMLTagExtractor.class -i XML_FILE -o OUTPUT_PATH
Static Public Attributes
static final String PREFIX_NAMESPACE = "NS_"
    Prefix for fields that hold an XML namespace prefix.
static final String PREFIX_NAMESPACE_VALUE = "NS_VALUE_"
    Prefix for fields that hold an XML namespace.
static final String PREFIX_ELEMENT = "TAG_"
    Prefix for fields that hold an XML tag name.
static final String PREFIX_ATTRIBUTE = "ATT_"
    Prefix for fields that hold an XML attribute name.
static final String ARG_INPUT = "-i"
    Command-line argument that specifies the input file.
static final String ARG_OUTPUT = "-o"
    Command-line argument that specifies the output file.
This class is a helper class for developing PepperModules.
The XMLTagExtractor generates a dictionary of the XML vocabulary found in a source file. The dictionary consists of the XML tag names, XML namespaces and attribute names. From this dictionary, the extractor generates a Java interface and a Java class. The interface contains the XML namespace declarations and the XML element and attribute names as fields (public static final Strings). The generated Java class implements that interface and extends the DefaultHandler2 class, so that it can read an XML file that follows the generated dictionary.
This class is helpful when creating PepperImporter or PepperExporter classes that consume or produce XML formats. In that case, a sample XML file (containing most or, better, all of the elements) can be used to extract all element names as keys for the implementation.
For instance, the following xml file:
    <sentence xml:lang="en">
        <token pos="VBZ">Is</token>
        <token pos="DT" lemma="this">this</token>
        <token>example</token>
    </sentence>
will result in the following interface:
    public interface INTERFACE_NAME {
        public static final String TAG_TOKEN = "token";
        public static final String TAG_SENTENCE = "sentence";
        public static final String ATT_LEMMA = "lemma";
        public static final String ATT_XML_LANG = "xml:lang";
        public static final String ATT_POS = "pos";
    }
where INTERFACE_NAME is the name of the xml file.
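The naming scheme above can be illustrated with a minimal sketch. It is not the Pepper implementation: the class VocabularySketch and its helpers toFieldName, toDeclaration and collect are hypothetical, and the naming rule (prefix plus upper-cased name, with every character outside A-Z, 0-9 and '_' replaced by '_') is an assumption inferred from example fields such as ATT_XML_LANG.

```java
import java.io.StringReader;
import java.util.Locale;
import java.util.Set;
import java.util.TreeSet;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.ext.DefaultHandler2;

/** Hypothetical sketch of vocabulary collection; not the Pepper implementation. */
class VocabularySketch extends DefaultHandler2 {
    // Sorted sets keep the order of the printed fields deterministic.
    final Set<String> elements = new TreeSet<>();
    final Set<String> attributes = new TreeSet<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        // Record the qualified name of every element and attribute seen.
        elements.add(qName);
        for (int i = 0; i < atts.getLength(); i++) {
            attributes.add(atts.getQName(i));
        }
    }

    /** Assumed rule: prefix + upper-cased name, other characters become '_'. */
    static String toFieldName(String prefix, String xmlName) {
        return prefix + xmlName.toUpperCase(Locale.ROOT).replaceAll("[^A-Z0-9_]", "_");
    }

    /** Renders one field declaration in the style of the interface above. */
    static String toDeclaration(String prefix, String xmlName) {
        return "public static final String " + toFieldName(prefix, xmlName)
                + " = \"" + xmlName + "\";";
    }

    /** Parses the given XML text and returns the collected vocabulary. */
    static VocabularySketch collect(String xml) throws Exception {
        VocabularySketch handler = new VocabularySketch();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<sentence xml:lang=\"en\"><token pos=\"VBZ\">Is</token>"
                + "<token pos=\"DT\" lemma=\"this\">this</token>"
                + "<token>example</token></sentence>";
        VocabularySketch vocabulary = collect(xml);
        vocabulary.elements.forEach(n -> System.out.println(toDeclaration("TAG_", n)));
        vocabulary.attributes.forEach(n -> System.out.println(toDeclaration("ATT_", n)));
    }
}
```

The sketch prints its fields in alphabetical order, which need not match the order produced by the real extractor.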
and in the following class:
    public class INTERFACE_NAMEReader extends DefaultHandler2 implements INTERFACE_NAME {
        public void startElement(String uri, String localName, String qName, Attributes attributes) throws SAXException {
            if (TAG_TOKEN.equals(qName)) {
            } else if (TAG_SENTENCE.equals(qName)) {
            }
        }
    }
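The empty branches of the generated class are where module-specific logic goes. The following sketch shows one way to fill them in and run the reader with the standard SAX parser. The names SentenceVocabulary and SentenceReader, the helper read and the collected fields are hypothetical stand-ins for generated code, not output of the extractor.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.ext.DefaultHandler2;

/** Hypothetical stand-in for the generated interface. */
interface SentenceVocabulary {
    public static final String TAG_TOKEN = "token";
    public static final String TAG_SENTENCE = "sentence";
    public static final String ATT_POS = "pos";
}

/** Hypothetical stand-in for the generated reader, with the branches filled in. */
class SentenceReader extends DefaultHandler2 implements SentenceVocabulary {
    final List<String> posTags = new ArrayList<>();
    int sentences = 0;

    @Override
    public void startElement(String uri, String localName, String qName,
            Attributes attributes) throws SAXException {
        if (TAG_TOKEN.equals(qName)) {
            // getValue returns null when the attribute is absent.
            String pos = attributes.getValue(ATT_POS);
            posTags.add(pos == null ? "?" : pos);
        } else if (TAG_SENTENCE.equals(qName)) {
            sentences++;
        }
    }

    /** Parses the given XML text with a fresh reader and returns that reader. */
    static SentenceReader read(String xml) throws Exception {
        SentenceReader reader = new SentenceReader();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), reader);
        return reader;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<sentence xml:lang=\"en\"><token pos=\"VBZ\">Is</token>"
                + "<token pos=\"DT\" lemma=\"this\">this</token>"
                + "<token>example</token></sentence>";
        SentenceReader reader = read(xml);
        System.out.println(reader.sentences + " sentence(s), POS tags: " + reader.posTags);
    }
}
```

Comparing against the constants instead of string literals keeps the element and attribute names in one place, which is the purpose of the generated interface.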
Using as a library:
    XMLTagExtractor extractor = new XMLTagExtractor();
    extractor.setXmlResource(input);
    extractor.setJavaResource(output);
    extractor.extract();
Running the program from the command line:
java XMLTagExtractor.class -i XML_FILE -o OUTPUT_PATH
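How the two flags might be read from the argument array can be sketched as follows. This is an illustration under stated assumptions, not the code of XMLTagExtractor.main: the class ArgumentSketch, its parse method, the error message and the file names sample.xml and out are hypothetical. Only the flag values "-i" and "-o" come from the constants ARG_INPUT and ARG_OUTPUT.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of reading the -i and -o arguments; not the Pepper code. */
class ArgumentSketch {
    static final String ARG_INPUT = "-i";
    static final String ARG_OUTPUT = "-o";

    /** Maps each known flag to the value that follows it. */
    static Map<String, String> parse(String[] args) {
        Map<String, String> options = new HashMap<>();
        for (int i = 0; i < args.length; i++) {
            boolean isFlag = ARG_INPUT.equals(args[i]) || ARG_OUTPUT.equals(args[i]);
            if (isFlag && i + 1 < args.length) {
                options.put(args[i], args[i + 1]);
                i++; // skip the value that was just consumed
            }
        }
        // Both flags are treated as mandatory in this sketch.
        if (!options.containsKey(ARG_INPUT) || !options.containsKey(ARG_OUTPUT)) {
            throw new IllegalArgumentException("usage: -i XML_FILE -o OUTPUT_PATH");
        }
        return options;
    }

    public static void main(String[] args) {
        Map<String, String> options = parse(new String[] {"-i", "sample.xml", "-o", "out"});
        System.out.println("input:  " + options.get(ARG_INPUT));
        System.out.println("output: " + options.get(ARG_OUTPUT));
    }
}
```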
static void org.corpus_tools.pepper.cli.XMLTagExtractor.main (String[] args)
Command-line entry point: java XMLTagExtractor.class -i XML_FILE -o OUTPUT_PATH
Parameters
    args    -i XML_FILE -o OUTPUT_PATH
void org.corpus_tools.pepper.cli.XMLTagExtractor.setJavaResource (URI resource) throws FileNotFoundException
Sets the Java output resource, i.e. where the generated Java code is written.
Exceptions
    FileNotFoundException
void org.corpus_tools.pepper.cli.XMLTagExtractor.setXmlResource (URI resource) throws FileNotFoundException
Sets the XML file to be parsed.
Exceptions
    FileNotFoundException
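static final String org.corpus_tools.pepper.cli.XMLTagExtractor.PREFIX_ATTRIBUTE = "ATT_"
Prefix for fields that hold an XML attribute name.
For instance, the XML attribute pos in <token pos="..."> results in the field:
    ATT_POS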
static final String org.corpus_tools.pepper.cli.XMLTagExtractor.PREFIX_ELEMENT = "TAG_"
Prefix for fields that hold an XML tag name.
For instance, the XML tag <token> results in the field:
    TAG_TOKEN
static final String org.corpus_tools.pepper.cli.XMLTagExtractor.PREFIX_NAMESPACE = "NS_"
Prefix for fields that hold an XML namespace prefix.
For instance, the XML namespace prefix myns in <myns:token xmlns:myns="..."> results in the field:
    NS_MYNS
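static final String org.corpus_tools.pepper.cli.XMLTagExtractor.PREFIX_NAMESPACE_VALUE = "NS_VALUE_"
Prefix for fields that hold an XML namespace.
For instance, the XML namespace in <myns:token xmlns:myns="https://ns.de"> results in the field:
    NS_VALUE_MYNS="https://ns.de"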
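The two namespace fields can be illustrated with a namespace-aware SAX parse, which reports each declared prefix together with its namespace through startPrefixMapping. This is a minimal sketch, not the Pepper implementation: the class NamespaceSketch and its collect method are hypothetical, and the value "myns" printed for NS_MYNS is an assumption, since the documentation above only names that field.

```java
import java.io.StringReader;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.ext.DefaultHandler2;

/** Hypothetical sketch of collecting namespace declarations; not the Pepper code. */
class NamespaceSketch extends DefaultHandler2 {
    // Maps each declared prefix to its namespace, sorted by prefix.
    final Map<String, String> namespaces = new TreeMap<>();

    @Override
    public void startPrefixMapping(String prefix, String uri) {
        // Only called when the parser is namespace-aware.
        namespaces.put(prefix, uri);
    }

    /** Parses the given XML text and returns the prefix-to-namespace map. */
    static Map<String, String> collect(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        NamespaceSketch handler = new NamespaceSketch();
        factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
        return handler.namespaces;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<myns:token xmlns:myns=\"https://ns.de\">Is</myns:token>";
        for (Map.Entry<String, String> entry : collect(xml).entrySet()) {
            String suffix = entry.getKey().toUpperCase(Locale.ROOT);
            // Assumption: the NS_ field holds the prefix itself.
            System.out.println("NS_" + suffix + " = \"" + entry.getKey() + "\"");
            System.out.println("NS_VALUE_" + suffix + " = \"" + entry.getValue() + "\"");
        }
    }
}
```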