Corpus queries

Interface defining the REST API calls that ANNIS provides for querying the data.

All paths for this part of the service start with the "annis/query/" prefix.

Count matches of a query

GET annis/query/search/count

q - The query in the ANNIS Query Language (AQL)
corpora - A comma-separated list of corpus names

Produces an XML representation of the total matches and the number of documents that contain matches (application/xml):

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<matchAndDocumentCount>
  <!-- the number of documents that contain matches -->
  <documentCount>2</documentCount>
  <!-- total number of matches -->
  <matchCount>399</matchCount>
</matchAndDocumentCount>
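As a minimal sketch of how a client might use this call, the following Python builds the request URL and reads the two counters out of the XML response shown above. The service root `BASE_URL` and the helper names `count_url` and `parse_count` are assumptions made for illustration; they are not part of ANNIS.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Placeholder service root (an assumption); replace with a real ANNIS service address.
BASE_URL = "http://annis.example.org/"


def count_url(aql, corpora):
    """Build the URL for GET annis/query/search/count."""
    params = urlencode({"q": aql, "corpora": ",".join(corpora)})
    return BASE_URL + "annis/query/search/count?" + params


def parse_count(xml_text):
    """Return (match_count, document_count) from a matchAndDocumentCount document."""
    root = ET.fromstring(xml_text)
    return int(root.findtext("matchCount")), int(root.findtext("documentCount"))
```

The URL would be fetched with an `Accept: application/xml` header, and the response body passed to `parse_count`.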

Find matches for a given query

GET annis/query/search/find

q - The query in the ANNIS Query Language (AQL)
corpora - A comma-separated list of corpus names
offset - Optional offset from where to start the matches. Default is 0.
limit - Optional limit of the number of returned matches. Set to -1 if unlimited. Default is -1.
order - Optional order in which the results are sorted. Can be "normal", "random" or "inverted". "normal" is the default ordering, "inverted" reverses the default ordering, and "random" is a non-stable random ordering (you will get different results for the same offset and limit).
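The parameters above could be combined into a request URL roughly as follows. This is a sketch only: `BASE_URL` and `find_url` are hypothetical names, and the defaults simply mirror the ones documented in the parameter list.

```python
from urllib.parse import urlencode

# Placeholder service root (an assumption); replace with a real ANNIS service address.
BASE_URL = "http://annis.example.org/"

ALLOWED_ORDERS = ("normal", "random", "inverted")


def find_url(aql, corpora, offset=0, limit=-1, order="normal"):
    """Build the URL for GET annis/query/search/find."""
    if order not in ALLOWED_ORDERS:
        raise ValueError("order must be one of %s" % (ALLOWED_ORDERS,))
    params = urlencode({
        "q": aql,
        "corpora": ",".join(corpora),
        "offset": offset,
        "limit": limit,   # -1 means "no limit"
        "order": order,
    })
    return BASE_URL + "annis/query/search/find?" + params
```

Because "random" ordering is non-stable, paging through results with `offset` and `limit` is only meaningful with "normal" or "inverted".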

Returns a list of the match identifiers for the query.

Can produce the MIME type application/xml in the following format:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<match-group>
  <!-- each match is enclosed in a match tag -->
  <match>
    <!-- the first matched node of match 1 did not match an annotation -->
    <anno></anno>
    <!-- the second matched node of match 1 was a match on the 'tiger::pos' annotation-->
    <anno>tiger::pos</anno>
    <!-- ID of first matched node of match 1 -->
    <id>salt:/pcc2/11299/#tok_1</id>
    <!-- ID of second matched node of match 1 -->
    <id>salt:/pcc2/11299/#tok_2</id>
  </match>
  <match>
    <anno></anno>
    <anno>tiger::pos</anno>
    <!-- ID of first matched node of match 2 -->
    <id>salt:/pcc2/11299/#tok_2</id>
    <!-- ID of second matched node of match 2 -->
    <id>salt:/pcc2/11299/#tok_3</id>
  </match>
  <!-- and so on -->
</match-group>

or the MIME type text/plain:

salt:/pcc2/11299/#tok_1 tiger::pos::salt:/pcc2/11299/#tok_2
salt:/pcc2/11299/#tok_2 tiger::pos::salt:/pcc2/11299/#tok_3
salt:/pcc2/11299/#tok_3 tiger::pos::salt:/pcc2/11299/#tok_4

In this format, there is one line per match and each ID is separated by space. An ID can be prefixed by the fully qualified annotation name (which is separated with '::' from the ID).
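The line format described above can be split into its parts with a few lines of Python. This is an illustrative sketch: `parse_match_line` is a hypothetical helper, and it assumes that a node ID never contains '::' itself, which holds for the sample IDs shown here.

```python
def parse_match_line(line):
    """Split one text/plain match line into (annotation, node_id) pairs.

    The annotation is None when the node did not match an annotation.
    """
    pairs = []
    for entry in line.split():
        # Assumption: the ID contains no '::', so the last '::' separates
        # the optional qualified annotation name from the ID.
        anno, sep, node_id = entry.rpartition("::")
        pairs.append((anno if sep else None, node_id))
    return pairs


line = "salt:/pcc2/11299/#tok_1 tiger::pos::salt:/pcc2/11299/#tok_2"
for anno, node_id in parse_match_line(line):
    print(anno, node_id)
```

Splitting at the last '::' rather than the first keeps a qualified name such as 'tiger::pos' intact.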

Get a subgraph from a set of (matched) Salt IDs

POST annis/query/search/subgraph

Request body

Consumes application/xml:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<match-group>
  <!-- each match is enclosed in a match tag -->
  <match>
    <!-- the first matched node of match 1 did not match an annotation -->
    <anno></anno>
    <!-- the second matched node of match 1 was a match on the 'tiger::pos' annotation-->
    <anno>tiger::pos</anno>
    <!-- ID of first matched node of match 1 -->
    <id>salt:/pcc2/11299/#tok_1</id>
    <!-- ID of second matched node of match 1 -->
    <id>salt:/pcc2/11299/#tok_2</id>
  </match>
  <match>
    <anno></anno>
    <anno>tiger::pos</anno>
    <!-- ID of first matched node of match 2 -->
    <id>salt:/pcc2/11299/#tok_2</id>
    <!-- ID of second matched node of match 2 -->
    <id>salt:/pcc2/11299/#tok_3</id>
  </match>
  <!-- and so on -->
</match-group>

or consumes text/plain:

salt:/pcc2/11299/#tok_1 tiger::pos::salt:/pcc2/11299/#tok_2
salt:/pcc2/11299/#tok_2 tiger::pos::salt:/pcc2/11299/#tok_3
salt:/pcc2/11299/#tok_3 tiger::pos::salt:/pcc2/11299/#tok_4

There is one line per match and each ID is separated by a space. An ID can be prefixed by the fully qualified annotation name (which is separated with '::' from the ID).

segmentation - Optional parameter for the segmentation layer to which the context is applied. Leave empty for the token layer (which is the default).
left - Optional parameter for the left context size, default is 0.
right - Optional parameter for the right context size, default is 0.
filter - Optional parameter with value "all" or "token". If "token", only tokens are fetched. Default is "all".

Returns a representation of the Salt annotation graph in the EMF XMI format and with MIME type application/xml or application/xmi+xml.
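A subgraph request could be assembled as in the sketch below, which only constructs the request object and does not send it. Several things here are assumptions: the service root `BASE_URL`, the helper name `subgraph_request`, and in particular that the optional parameters are passed in the query string, which the description above does not state.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Placeholder service root (an assumption); replace with a real ANNIS service address.
BASE_URL = "http://annis.example.org/"


def subgraph_request(match_lines, left=0, right=0, segmentation=None, filter_mode="all"):
    """Build a POST request for annis/query/search/subgraph with a text/plain body."""
    params = {"left": left, "right": right, "filter": filter_mode}
    if segmentation:
        # Omitted entirely for the token layer, which is the default.
        params["segmentation"] = segmentation
    # Assumption: the optional parameters travel in the query string.
    url = BASE_URL + "annis/query/search/subgraph?" + urlencode(params)
    body = "\n".join(match_lines).encode("utf-8")  # one line per match
    headers = {"Content-Type": "text/plain", "Accept": "application/xml"}
    return Request(url, data=body, method="POST", headers=headers)
```

The resulting object can be passed to `urllib.request.urlopen`; the response body is the Salt graph in EMF XMI format.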

Get the annotation graph of a complete document

GET annis/query/graph/{top}/{doc}

{top} is the toplevel corpus name of the document and {doc} the document name.

filternodeanno - A comma-separated list of node annotations that are used as a filter for the graph. Only nodes having one of the annotations are included in the result.

Returns a representation of the Salt annotation graph in the EMF XMI format and with MIME type application/xml or application/xmi+xml.
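For illustration, the path variables and the optional filter could be filled in as follows. `BASE_URL` and `graph_url` are hypothetical names; percent-encoding each path segment is a precaution for names containing spaces or slashes, not something the description above prescribes.

```python
from urllib.parse import quote, urlencode

# Placeholder service root (an assumption); replace with a real ANNIS service address.
BASE_URL = "http://annis.example.org/"


def graph_url(top, doc, filter_node_annos=None):
    """Build the URL for GET annis/query/graph/{top}/{doc}."""
    # Encode each segment separately so that special characters in a
    # corpus or document name cannot change the path structure.
    url = BASE_URL + "annis/query/graph/" + quote(top, safe="") + "/" + quote(doc, safe="")
    if filter_node_annos:
        url += "?" + urlencode({"filternodeanno": ",".join(filter_node_annos)})
    return url
```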

Get the content of a binary object for a specific document

GET annis/query/corpora/{top}/{document}/binary
GET annis/query/corpora/{top}/{document}/binary/{offset}/{length}
GET annis/query/corpora/{top}/{document}/binary/{file}
GET annis/query/corpora/{top}/{document}/binary/{file}/{offset}/{length}

Accepts any MIME type. The MIME type is used as an implicit argument to filter the files that match a given query.

There are several ways to select the binary data you want to receive. You can select the file by document alone, using the {top} and {document} arguments (paths 1 and 2); this returns the first file that matches the accepted MIME types of the request. Alternatively, you can give the name of the file itself as the path argument {file} (paths 3 and 4). In either case you can request the complete file (paths 1 and 3) or a chunk containing only a subset of the binary data (paths 2 and 4). For a chunk, specify its {offset} and {length}, both in bytes.

{top} - The toplevel corpus name.
{document} - The name of the document that has the file. If you want the files for the toplevel corpus itself, use the name of the toplevel corpus as document name.
{file} - File name/title to select.
{offset} - Defines the offset at which the binary chunk starts (in bytes).
{length} - Defines the length of the binary chunk (in bytes).

Returns a binary stream that contains the file content. If path variant 2 or 4 is used, only a subset of the file is returned. Path variants 1 and 3 always return the complete file.
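The four path variants can be produced by one small helper, sketched here under the assumption of a placeholder service root `BASE_URL`; the function name `binary_url` is likewise hypothetical. The MIME type filter is not part of the path and would be sent separately as the Accept header of the request.

```python
from urllib.parse import quote

# Placeholder service root (an assumption); replace with a real ANNIS service address.
BASE_URL = "http://annis.example.org/"


def binary_url(top, document, file=None, offset=None, length=None):
    """Build one of the four annis/query/corpora/.../binary path variants."""
    if (offset is None) != (length is None):
        raise ValueError("offset and length must be given together")
    segments = [top, document, "binary"]
    if file is not None:
        segments.append(file)                        # variants 3 and 4
    if offset is not None:
        segments += [str(offset), str(length)]       # variants 2 and 4
    # Encode each segment separately so names cannot alter the path structure.
    return BASE_URL + "annis/query/corpora/" + "/".join(quote(s, safe="") for s in segments)
```

To address the files of the top-level corpus itself, pass the corpus name for both `top` and `document`.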

ANNIS Developer Guide
