Corpus queries
Interface defining the REST API calls that ANNIS provides for querying the data.
All paths for this part of the service start with the "annis/query/" prefix.
Count matches of a query
Path(s)
GET
annis/query/search/count
Parameters
q
- The query in the ANNIS Query Language (AQL).
corpora
- A comma-separated list of corpus names.
Responses
Code 200
Produces an XML representation of the total number of matches and the number of documents that contain matches (application/xml):
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<matchAndDocumentCount>
<!-- the number of documents that contain matches -->
<documentCount>2</documentCount>
<!-- total number of matches -->
<matchCount>399</matchCount>
</matchAndDocumentCount>
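As an illustration of how a client might call this endpoint and read the response, here is a minimal Python sketch. The base URL is a placeholder assumption (the documentation above only specifies the "annis/query/" prefix), and the helper names `count_url` and `parse_count` are hypothetical, not part of ANNIS.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Assumption: placeholder base URL; replace it with your own service location.
BASE_URL = "http://localhost:5711/"


def count_url(aql, corpora):
    """Build the URL for annis/query/search/count (hypothetical helper)."""
    params = {"q": aql, "corpora": ",".join(corpora)}
    return BASE_URL + "annis/query/search/count?" + urlencode(params)


def parse_count(xml_bytes):
    """Return (match_count, document_count) from a matchAndDocumentCount document."""
    root = ET.fromstring(xml_bytes)
    return int(root.findtext("matchCount")), int(root.findtext("documentCount"))


# The sample response from the documentation above, without the comments.
sample = b"""<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<matchAndDocumentCount>
<documentCount>2</documentCount>
<matchCount>399</matchCount>
</matchAndDocumentCount>"""

print(count_url("tok", ["pcc2"]))
print(parse_count(sample))
```

The URL would then be fetched with any HTTP client using an `Accept: application/xml` header.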
Find matches for a given query
Path(s)
GET
annis/query/search/find
Parameters
q
- The query in the ANNIS Query Language (AQL).
corpora
- A comma-separated list of corpus names.
offset
- Optional offset from where to start the matches. Default is 0.
limit
- Optional limit on the number of returned matches. Set to -1 for unlimited. Default is -1.
order
- Optional order in which the results are sorted. Can be "normal", "random" or "inverted". "normal" is the default ordering, "inverted" reverses the default ordering, and "random" is a non-stable random ordering (you will get different results for the same offset and limit).
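The parameters above can be assembled into a request path as in the following Python sketch. The function name `find_query_string` is hypothetical, and the client-side validation is an illustrative choice; the service itself may treat invalid values differently.

```python
from urllib.parse import urlencode

# The three orderings named in the parameter description.
VALID_ORDERS = ("normal", "random", "inverted")


def find_query_string(aql, corpora, offset=0, limit=-1, order="normal"):
    """Build the path and query string for annis/query/search/find.

    Hypothetical helper: the defaults mirror the documented defaults
    (offset 0, limit -1 for unlimited, order "normal").
    """
    if order not in VALID_ORDERS:
        raise ValueError("order must be one of %s" % ", ".join(VALID_ORDERS))
    if offset < 0:
        raise ValueError("offset must not be negative")
    params = {
        "q": aql,
        "corpora": ",".join(corpora),
        "offset": offset,
        "limit": limit,
        "order": order,
    }
    return "annis/query/search/find?" + urlencode(params)


# Second page of ten matches across two corpora.
print(find_query_string("tok", ["pcc2", "tiger2"], offset=10, limit=10))
```

Because "random" is non-stable, paging with `offset` and `limit` is only meaningful with the "normal" or "inverted" ordering.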
Responses
Code 200
A list of the match identifiers for the query.
Can produce the MIME type application/xml in the following format:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<match-group>
<!-- each match is enclosed in a match tag -->
<match>
<!-- the first matched node of match 1 did not match an annotation -->
<anno></anno>
<!-- the second matched node of match 1 was a match on the 'tiger::pos' annotation-->
<anno>tiger::pos</anno>
<!-- ID of first matched node of match 1 -->
<id>salt:/pcc2/11299/#tok_1</id>
<!-- ID of second matched node of match 1 -->
<id>salt:/pcc2/11299/#tok_2</id>
</match>
<match>
<anno></anno>
<anno>tiger::pos</anno>
<!-- ID of first matched node of match 2 -->
<id>salt:/pcc2/11299/#tok_2</id>
<!-- ID of second matched node of match 2 -->
<id>salt:/pcc2/11299/#tok_3</id>
</match>
<!-- and so on -->
</match-group>
or the MIME type text/plain:
salt:/pcc2/11299/#tok_1 tiger::pos::salt:/pcc2/11299/#tok_2
salt:/pcc2/11299/#tok_2 tiger::pos::salt:/pcc2/11299/#tok_3
salt:/pcc2/11299/#tok_3 tiger::pos::salt:/pcc2/11299/#tok_4
In this format, there is one line per match, and the IDs within a line are separated by a space. An ID can be prefixed with the fully qualified annotation name, which is separated from the ID by '::'.
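The text/plain format can be split back into annotation names and node IDs with a few lines of Python. This is a sketch under one assumption not stated in the source: that node IDs never contain '::' or whitespace, so the last '::' in an entry always separates the annotation name from the ID. The function names are hypothetical.

```python
def parse_match_line(line):
    """Split one text/plain match line into (annotation, node_id) pairs.

    The annotation is None when the ID has no annotation prefix.
    Assumption: IDs contain neither '::' nor whitespace.
    """
    pairs = []
    for entry in line.split():
        # 'tiger::pos::salt:/...' -> ('tiger::pos', '::', 'salt:/...')
        anno, sep, node_id = entry.rpartition("::")
        pairs.append((anno if sep else None, node_id))
    return pairs


def parse_matches(text):
    """Parse a whole text/plain response, skipping blank lines."""
    return [parse_match_line(line) for line in text.splitlines() if line.strip()]


response = (
    "salt:/pcc2/11299/#tok_1 tiger::pos::salt:/pcc2/11299/#tok_2\n"
    "salt:/pcc2/11299/#tok_2 tiger::pos::salt:/pcc2/11299/#tok_3\n"
)
for match in parse_matches(response):
    print(match)
```

Splitting on the last '::' rather than the first matters because the qualified annotation name itself contains '::' (as in 'tiger::pos').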
Get a subgraph from a set of (matched) Salt IDs
Path(s)
POST
annis/query/search/subgraph
Request body
Consumes application/xml:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<match-group>
<!-- each match is enclosed in a match tag -->
<match>
<!-- the first matched node of match 1 did not match an annotation -->
<anno></anno>
<!-- the second matched node of match 1 was a match on the 'tiger::pos' annotation-->
<anno>tiger::pos</anno>
<!-- ID of first matched node of match 1 -->
<id>salt:/pcc2/11299/#tok_1</id>
<!-- ID of second matched node of match 1 -->
<id>salt:/pcc2/11299/#tok_2</id>
</match>
<match>
<anno></anno>
<anno>tiger::pos</anno>
<!-- ID of first matched node of match 2 -->
<id>salt:/pcc2/11299/#tok_2</id>
<!-- ID of second matched node of match 2 -->
<id>salt:/pcc2/11299/#tok_3</id>
</match>
<!-- and so on -->
</match-group>
or consumes text/plain:
salt:/pcc2/11299/#tok_1 tiger::pos::salt:/pcc2/11299/#tok_2
salt:/pcc2/11299/#tok_2 tiger::pos::salt:/pcc2/11299/#tok_3
salt:/pcc2/11299/#tok_3 tiger::pos::salt:/pcc2/11299/#tok_4
There is one line per match, and the IDs within a line are separated by a space. An ID can be prefixed with the fully qualified annotation name, which is separated from the ID by '::'.
Parameters
segmentation
- Optional parameter for the segmentation layer on which the context is applied. Leave empty for the token layer (which is the default).
left
- Optional parameter for the left context size. Default is 0.
right
- Optional parameter for the right context size. Default is 0.
filter
- Optional parameter with value "all" or "token". If "token", only tokens will be fetched. Default is "all".
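A subgraph request combines a match list in the body with the context parameters above. The Python sketch below only constructs the request object and does not send it. It assumes that the parameters are passed in the URL query string (the source does not say how they are transmitted), and both the base URL and the function name `subgraph_request` are hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumption: placeholder base URL; replace it with your own service location.
BASE_URL = "http://localhost:5711/"


def subgraph_request(match_lines, left=0, right=0, segmentation=None,
                     filter_mode="all"):
    """Prepare (but do not send) a POST to annis/query/search/subgraph.

    Assumption: the context parameters travel in the query string.
    The body uses the text/plain match format, one line per match.
    """
    if filter_mode not in ("all", "token"):
        raise ValueError('filter must be "all" or "token"')
    params = {"left": left, "right": right, "filter": filter_mode}
    if segmentation:
        # Omitted entirely when empty, which selects the token layer.
        params["segmentation"] = segmentation
    body = "\n".join(match_lines).encode("utf-8")
    return Request(
        BASE_URL + "annis/query/search/subgraph?" + urlencode(params),
        data=body,
        headers={"Content-Type": "text/plain", "Accept": "application/xml"},
        method="POST",
    )


req = subgraph_request(
    ["salt:/pcc2/11299/#tok_1 tiger::pos::salt:/pcc2/11299/#tok_2"],
    left=5,
    right=5,
)
print(req.get_method(), req.full_url)
```

Passing the prepared object to `urllib.request.urlopen` would send it; the response body is then the XMI document described under Responses.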
Responses
Code 200
Returns a representation of the Salt annotation graph in the EMF XMI format, with MIME type application/xml or application/xmi+xml.
Get the annotation graph of a complete document
Path(s)
GET
annis/query/graph/{top}/{doc}
{top} is the toplevel corpus name of the document and {doc} the document name.
Parameters
filternodeanno
- A comma-separated list of node annotations which are used as a filter for the graph. Only nodes having one of the annotations are included in the result.
Responses
Code 200
Returns a representation of the Salt annotation graph in the EMF XMI format, with MIME type application/xml or application/xmi+xml.
Get the content of a binary object for a specific document
Path(s)
GET
annis/query/corpora/{top}/{document}/binary
GET
annis/query/corpora/{top}/{document}/binary/{offset}/{length}
GET
annis/query/corpora/{top}/{document}/binary/{file}
GET
annis/query/corpora/{top}/{document}/binary/{file}/{offset}/{length}
Accepts any MIME type. The MIME type is used as an implicit argument to filter the files that match a given query.
There are several ways of selecting the binary data you want to receive. You can select the file by document only, using the {top} and {document} arguments (paths 1 and 2). This returns the first file that also matches the accepted MIME types of the request. Alternatively, the name of the file itself can be given as the path argument {file} (paths 3 and 4). You can also choose to get either the complete file (paths 1 and 3) or a chunk containing only a subset of the binary data (paths 2 and 4). In the latter case, you specify the {offset} and the {length} of the chunk (both in bytes).
- {top} - The toplevel corpus name.
- {document} - The name of the document that has the file. If you want the files for the toplevel corpus itself, use the name of the toplevel corpus as document name.
- {file} - File name/title to select.
- {offset} - Defines the offset at which the binary chunk starts (in bytes).
- {length} - Defines the length of the binary chunk (in bytes).
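The four path variants follow from two independent choices: whether a {file} is named, and whether a chunk is requested. A minimal Python sketch of that logic follows; the function name `binary_path` is hypothetical, and percent-encoding each path segment is an assumption about what the service expects for names with special characters.

```python
from urllib.parse import quote


def binary_path(top, document, file=None, offset=None, length=None):
    """Build one of the four binary path variants (hypothetical helper).

    file given?  chunk given?  -> variant
    no           no            -> 1: .../binary
    no           yes           -> 2: .../binary/{offset}/{length}
    yes          no            -> 3: .../binary/{file}
    yes          yes           -> 4: .../binary/{file}/{offset}/{length}
    """
    if (offset is None) != (length is None):
        raise ValueError("offset and length must be given together")
    segments = ["annis", "query", "corpora", top, document, "binary"]
    if file is not None:
        segments.append(file)
    if offset is not None:
        segments += [str(offset), str(length)]
    # Assumption: each segment is percent-encoded on its own.
    return "/".join(quote(segment, safe="") for segment in segments)


# First kilobyte of a named file; 'audio.ogg' is an illustrative file name.
print(binary_path("pcc2", "11299", file="audio.ogg", offset=0, length=1024))
```

When no file name is given, the `Accept` header of the request determines which file is selected, so a client would typically set it to a specific MIME type.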
Responses
Code 200
A binary stream that contains the file content. If path variant 2 or 4 is used, only a subset of the file is returned. Path variants 1 and 3 always return the complete file.