Exporting Results
To export search results, open the menu "More" between the Search and History buttons and select "Export":
Enter the query whose results you want to export in the AQL box as usual. You do not need to run the query first: you can enter it and export directly, without pressing Search. Several exporter modules can be selected from the Export tab shown below.
The SimpleTextExporter simply gives the text for all tokens in each search result, including context, in a one-row-per-hit format. The tokens covered by the match area are marked with square brackets and the results are numbered, as in the following example:
0. of the International Brotherhood of [Magicians] Wednesday , October 9 ,
1. Magic Month in the United [States] . Wikinews spoke with William
2. of the International Brotherhood of [Magicians] , about the current state
3. - " Scarne on Card [Tricks] " and " Scarne on
4. and " Scarne on Magic [Tricks] " . That started me
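If the exported text needs further processing, each line can be split back into its parts. The following is a minimal Python sketch, assuming exactly one bracketed match area per line and the number-dot prefix shown in the example above; `parse_hit` is a hypothetical helper name, not part of ANNIS, and lines with several match areas or with literal bracket tokens would need a more careful approach.

```python
import re

# One exported line: "<number>. <left context> [<match>] <right context>"
# Assumption: a single bracketed match area per line, as in the example above.
HIT_LINE = re.compile(r"^(\d+)\.\s+(.*?)\s*\[(.+?)\]\s*(.*)$")

def parse_hit(line):
    """Split one exported line into hit number, left context, match and right context."""
    m = HIT_LINE.match(line.strip())
    if m is None:
        return None  # line does not follow the assumed layout
    number, left, match, right = m.groups()
    return {"number": int(number), "left": left, "match": match, "right": right}

sample = "1. Magic Month in the United [States] . Wikinews spoke with William"
hit = parse_hit(sample)
print(hit["match"])
```

Returning `None` for lines that do not fit the pattern keeps the sketch from silently misreading output that was exported with different settings.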
The TokenExporter adds all annotations of each token separated by slashes (e.g. dogs/NNS/dog for a token dogs annotated with a part-of-speech NNS and a lemma dog).
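As an illustration of this format, a slash-separated item can be split into the token text and its annotation values. This is only a sketch: `split_token` is a hypothetical helper, the order of the annotation values depends on the corpus, and a token that itself contains a slash could not be split unambiguously this way.

```python
def split_token(item):
    """Split an item such as 'dogs/NNS/dog' into the token text and its annotation values."""
    # Assumption: the first field is the token text, the rest are annotation values.
    text, *annotations = item.split("/")
    return text, annotations

text, annotations = split_token("dogs/NNS/dog")
print(text, annotations)
```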
The GridExporter adds all annotations available for the span of retrieved tokens, with each annotation layer on a separate line. Annotations are separated by spaces and the hierarchical order of annotations is lost, though the span of tokens covered by each annotation may optionally be given in square brackets (to turn this off, use the optional parameter numbers=false in the ‘Parameters’ box). The user can specify the annotation layers to be exported in the additional ‘Annotation Keys’ box; annotation names should be separated by commas, as in the image above. Metadata annotations can also be exported by entering “metakeys=” followed by a comma-separated list of metadata names in the Parameters box. If nothing is specified, all available annotations and no metadata will be exported. Multiple options are separated by a semicolon, e.g. metakeys=type,docname;numbers=false in the Parameters box. An example output with token numbers and the part-of-speech (pos) and syntactic category (cat) annotations looks as follows:
0. tok of the International Brotherhood of Magicians Wednesday
pos IN[1-1] DT[2-2] NP[3-3] NP[4-4] IN[5-5] NPS[6-6] NP[7-7]
cat S[1-6] VP[1-6] NP[1-6] PP[1-6] NP[2-4] PP[5-6] NP[6-6] NP[7-12]
This means that the annotation cat="NP" applies to tokens 1-6 in the search result, and so on. Note that when annotation layers are specified and the reserved name 'tok' is not among them, the tokens themselves will not be exported (annotations only).
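To work with such output programmatically, an annotation line can be read back into values and token ranges. The Python sketch below assumes that token numbers are switched on and that annotation values contain no spaces; `parse_grid_line` is a hypothetical helper, and entries without a bracketed range (such as those on the tok line) are simply skipped.

```python
import re

# One annotation entry such as "NP[1-6]": a value followed by a token range.
ENTRY = re.compile(r"^(.*)\[(\d+)-(\d+)\]$")

def parse_grid_line(line):
    """Parse 'cat S[1-6] NP[2-4]' into (layer, [(value, start, end), ...])."""
    layer, *entries = line.split()
    spans = []
    for entry in entries:
        m = ENTRY.match(entry)
        if m:  # entries without a range are ignored
            spans.append((m.group(1), int(m.group(2)), int(m.group(3))))
    return layer, spans

layer, spans = parse_grid_line(
    "cat S[1-6] VP[1-6] NP[1-6] PP[1-6] NP[2-4] PP[5-6] NP[6-6] NP[7-12]"
)
print(layer, spans[:3])
```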
The CSVExporter produces output in a format usable by spreadsheet programs such as Excel or Calc. CSV files are also easy to read in R or Python scripts. Only the attributes of the search elements (#1, #2 etc. in AQL) are output, separated by tabs. The order and names of the attributes are declared in the first line of the export text, as in this example:
1_id	1_span	1_anno_const::cat	2_id	2_span	2_anno_GUM::claws5	2_anno_GUM::lemma	2_anno_GUM::penn_pos	2_anno_GUM::pos	2_anno_GUM::tok_func
salt:/GUM/GUM_interview_ants#const_0_39	the amazing adaptations	NP	salt:/GUM/GUM_interview_ants#tok_40	amazing	AJ0	amazing	JJ	JJ	amod
salt:/GUM/GUM_interview_ants#const_0_42	social insects	NP	salt:/GUM/GUM_interview_ants#tok_42	social	AJ0	social	JJ	JJ	amod
salt:/GUM/GUM_interview_ants#const_0_50	the extreme parasite pressure	NP	salt:/GUM/GUM_interview_ants#tok_51	extreme	AJ0	extreme	JJ	JJ	amod
The export shows the properties of an NP node dominating a token with the part-of-speech JJ. Since the token also has other attributes, such as the lemma and part of speech tags, these are also retrieved.
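Reading such an export in Python could look roughly like the sketch below. It uses an abbreviated stand-in for the exported text with only some of the columns from the example, and it assumes that fields are tab-separated and not quoted; in practice you would open the file you saved from the export instead of the in-memory string.

```python
import csv
import io

# Abbreviated stand-in for an exported file (a subset of the columns shown above).
export_text = (
    "1_id\t1_anno_const::cat\t2_span\t2_anno_GUM::lemma\t2_anno_GUM::penn_pos\n"
    "salt:/GUM/GUM_interview_ants#const_0_39\tNP\tamazing\tamazing\tJJ\n"
    "salt:/GUM/GUM_interview_ants#const_0_42\tNP\tsocial\tsocial\tJJ\n"
)

# The first line declares the column names, so DictReader uses it as the header.
# Assumption: fields are not quoted, hence QUOTE_NONE.
reader = csv.DictReader(io.StringIO(export_text), delimiter="\t", quoting=csv.QUOTE_NONE)
rows = list(reader)

adjectives = [row["2_span"] for row in rows]
print(adjectives)
```

Accessing columns by the names from the header line, rather than by position, keeps the script working if the query is changed and the column order shifts.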
It is also possible to output metadata annotations per hit using the CSVExporter. To do so, use the parameter metakeys=meta1,meta2 etc. For example, if your documents have a metadata annotation called 'genre', you may export it for each search result as a further column using metakeys=genre in the parameters box.
Most of these exporters do not work well when the corpus uses multiple segmentations, as in dialogues.
When they extract the spanned text for a match, they may have no way of knowing automatically which segmentation is the right one, and may display an empty span.
For the CSVExporter, you can set the segmentation parameter (e.g. segmentation=dipl) to the name of the segmentation, and it will be used as the source for the spanned text.
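When the same export is run repeatedly, the text for the Parameters box can be assembled by a small script. The sketch below follows the key=value form with semicolons between options and commas between list items, as described for the GridExporter above; `build_parameters` is a hypothetical helper, "true" as the counterpart of "false" is an assumption, and whether a given exporter accepts several combined options is also assumed rather than confirmed here.

```python
def build_parameters(**options):
    """Join options as key=value pairs separated by semicolons; lists are comma-joined."""
    parts = []
    for key, value in options.items():
        if isinstance(value, (list, tuple)):
            value = ",".join(value)
        elif isinstance(value, bool):
            # Assumption: boolean options are written as lowercase true/false.
            value = "true" if value else "false"
        parts.append(f"{key}={value}")
    return ";".join(parts)

params = build_parameters(metakeys=["type", "docname"], numbers=False)
print(params)
print(build_parameters(segmentation="dipl"))
```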
Note that exporting may be slow if the result set is large.