Searching for Word Forms

To search for word forms in ANNIS, simply select a corpus (in this example the freely available GUM corpus) and enter a search string between double quotation marks, e.g.:

"do"

Note that the search is case-sensitive, so it will not find capitalized 'Do', for example at the beginning of a sentence. To find both forms, you can either search for one form OR the other using the pipe sign ( | ):

"do" | "Do"

or else you can use regular expressions, which must be surrounded by slashes ( / ) instead of quotation marks:

/[Dd]o/
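
For readers who think in code, the difference between the three queries above can be sketched in Python. This is only an illustration, not how ANNIS evaluates queries: the token list is invented, and the use of re.fullmatch reflects the assumption that a pattern has to match the whole word form rather than a part of it.

```python
import re

# Hypothetical token list, invented for this illustration.
tokens = ["Do", "you", "do", "that", "?", "I", "doubt", "it"]

# "do": exact, case-sensitive match on the word form.
exact = [t for t in tokens if t == "do"]

# "do" | "Do": either of the two forms.
either = [t for t in tokens if t in ("do", "Do")]

# /[Dd]o/: regular expression, assumed here to match the whole token,
# so "doubt" is not a hit.
pattern = re.compile(r"[Dd]o")
by_regex = [t for t in tokens if pattern.fullmatch(t)]

print(exact)
print(either)
print(by_regex)
```

Under these assumptions the second and third lists contain the same tokens, which is why the regular expression can stand in for the OR query.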

To look for a sequence of multiple word forms, enter your search terms separated by & and then specify that the relation between the elements is one of precedence, as signified by the period ( . ) operator:

"do" & "n't" & #1 . #2

The expression #1 . #2 signifies that the first element ("do") precedes the second element ("n't"). Alternatively, you can also place the operator directly between the search elements as a shortcut. The following shortcut query is equivalent to the one above:

"do" . "n't"

For indirect precedence (where other tokens may stand between the search terms), use the .* operator:

/[Dd]o/ & "n't" & "any" & #1 . #2 & #2 .* #3 

OR using shortcuts:

/[Dd]o/ . "n't" .* "any"

The queries above find sequences beginning with the token "Do" or "do", followed directly by "n't", which must be followed either directly or indirectly (.*) by "any". A range of allowed distances can also be specified numerically as follows:

/[Nn]ot/ & "all" & #1 .1,5 #2

OR:

/[Nn]ot/ .1,5 "all"

This means that the two words "not" (or "Not") and "all" may appear at a distance of 1 to 5 tokens from each other. The operator .* allows a distance of up to 50 tokens by default, so searching with .1,50 is the same as using .* itself. Greater distances (e.g. .1,100 for 'within 100 tokens') should always be specified explicitly.
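
The relationship between the three precedence operators can be made concrete with a small Python sketch. It is a simplified model under stated assumptions, not the ANNIS implementation: the helper find_pairs and the token list are hypothetical, a distance of 1 is taken to mean "directly adjacent", and .* is modelled as the range 1 to 50 described above.

```python
def find_pairs(tokens, first, second, min_dist=1, max_dist=1):
    """Return (i, j) index pairs where tokens[i] == first, tokens[j] == second,
    and the second token follows the first at a distance within the range."""
    pairs = []
    for i, tok in enumerate(tokens):
        if tok != first:
            continue
        # Only inspect positions inside the allowed distance window.
        last = min(i + max_dist, len(tokens) - 1)
        for j in range(i + min_dist, last + 1):
            if tokens[j] == second:
                pairs.append((i, j))
    return pairs


# Hypothetical token sequence, invented for this illustration.
tokens = ["not", "quite", "all", "of", "them", "do", "n't"]

direct = find_pairs(tokens, "do", "n't")            # like "do" . "n't"
adjacent_only = find_pairs(tokens, "not", "all")    # like "not" . "all"
ranged = find_pairs(tokens, "not", "all", 1, 5)     # like "not" .1,5 "all"
indirect = find_pairs(tokens, "not", "all", 1, 50)  # like "not" .* "all"

print(direct, adjacent_only, ranged, indirect)
```

In this model "not" and "all" are two tokens apart, so the direct operator finds nothing while both the 1 to 5 range and the indirect operator find the pair.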

Finally, we can add metadata restrictions to the query, which filter out documents not matching our definitions. Metadata attributes must be preceded by the prefix meta:: and may not be bound (i.e. they are not referred to as #1 etc. and the numbering of other elements ignores their existence):

"want" & "to" & #1 .1,5 #2 & meta::type="interview" 
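
The effect of the metadata restriction can be pictured as a filter on documents that is applied alongside the token search. The following Python sketch illustrates that idea only; the two documents, their metadata values, and the helper want_to_matches are invented, and ANNIS may evaluate the query quite differently internally.

```python
# Hypothetical mini-corpus: each document has metadata and a token list.
documents = [
    {"meta": {"type": "interview"}, "tokens": ["I", "want", "you", "to", "stay"]},
    {"meta": {"type": "news"}, "tokens": ["They", "want", "to", "leave"]},
]


def want_to_matches(tokens, max_dist=5):
    """Index pairs for "want" followed by "to" within 1..max_dist tokens."""
    return [
        (i, j)
        for i, first in enumerate(tokens) if first == "want"
        for j in range(i + 1, min(i + max_dist, len(tokens) - 1) + 1)
        if tokens[j] == "to"
    ]


# The metadata condition selects documents; it is not a numbered search
# element, so the token pair keeps the positions #1 and #2.
hits = {
    idx: want_to_matches(doc["tokens"])
    for idx, doc in enumerate(documents)
    if doc["meta"]["type"] == "interview"
}

print(hits)
```

The second document also contains "want" followed by "to", but it is excluded because its type is not "interview".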

To view metadata for a search result or for a corpus, press the "i" icon next to it in the result window or in the search form respectively.