LuceneCoreQuery.dtd
Core Lucene
Background
This DTD describes the XML syntax used to perform advanced searches using the core Lucene search engine. The motivation behind the XML query syntax is:
CoreParser.java is the Java class that encapsulates this parser behaviour.
<BooleanQuery> | Child of Query, Clause, CachedFilter
A BooleanQuery implements Boolean logic that controls how multiple Clauses are interpreted. Some clauses may represent optional query criteria while others represent mandatory criteria.
Example: Find articles about banks, preferably talking about mergers but nothing to do with "sumitomo"
    <BooleanQuery fieldName="contents">
        <Clause occurs="should">
            <TermQuery>merger</TermQuery>
        </Clause>
        <Clause occurs="mustnot">
            <TermQuery>sumitomo</TermQuery>
        </Clause>
        <Clause occurs="must">
            <TermQuery>bank</TermQuery>
        </Clause>
    </BooleanQuery>
Element's model: (Clause)+

<BooleanQuery>'s children
    Name      Cardinality
    Clause    At least one

<BooleanQuery>'s attributes
    Name                        Values         Default
    boost                                      1.0
    disableCoord                true, false    false
    fieldName
    minimumNumberShouldMatch                   0
@boost | Attribute of BooleanQuery
Optional boost for matches on this query. Values > 1 increase the score contribution of matches on this query.
Default value: 1.0
@fieldName | Attribute of BooleanQuery
fieldName can optionally be defined here as a default attribute used by all child elements.
@disableCoord | Attribute of BooleanQuery
The "coordination factor" rewards documents that contain more of the optional clauses in this list. Set this flag to true to turn the factor off.
Possible values: true, false - Default value: false
@minimumNumberShouldMatch | Attribute of BooleanQuery
The minimum number of optional clauses that must be present in any one document before it is considered a match.
Default value: 0
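The interaction between optional clauses and minimumNumberShouldMatch may be easier to see in a short sketch. The field name and terms below are hypothetical and chosen only for illustration. Under the attribute descriptions above, a document would need to contain at least two of the three optional terms to match.

```xml
<!-- Illustrative sketch: field name and terms are hypothetical -->
<BooleanQuery fieldName="contents" minimumNumberShouldMatch="2">
    <Clause occurs="should">
        <TermQuery>merger</TermQuery>
    </Clause>
    <Clause occurs="should">
        <TermQuery>takeover</TermQuery>
    </Clause>
    <Clause occurs="should">
        <TermQuery>acquisition</TermQuery>
    </Clause>
</BooleanQuery>
```

With the default value of 0, the same query would not require any particular number of the optional clauses to be present.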
<Clause> | Child of BooleanQuery
NOTE: The "Clause" tag has two modes of use. Inside a <BooleanQuery>, only "query" types can be child elements; inside a <BooleanFilter>, only "filter" types can be contained.
Element's model: (BooleanQuery | UserQuery | FilteredQuery | TermQuery | TermsQuery | MatchAllDocsQuery | ConstantScoreQuery | SpanOr | SpanNear | SpanOrTerms | SpanFirst | SpanNot | SpanTerm | RangeFilter | CachedFilter)

<Clause>'s children
    Name                  Cardinality
    BooleanQuery          One or none
    CachedFilter          One or none
    ConstantScoreQuery    One or none
    FilteredQuery         One or none
    MatchAllDocsQuery     One or none
    RangeFilter           One or none
    SpanFirst             One or none
    SpanNear              One or none
    SpanNot               One or none
    SpanOr                One or none
    SpanOrTerms           One or none
    SpanTerm              One or none
    TermQuery             One or none
    TermsQuery            One or none
    UserQuery             One or none

<Clause>'s attributes
    Name      Values                   Default
    occurs    should, must, mustnot    should

@occurs | Attribute of Clause
Controls whether the clause is optional (should), mandatory (must) or unacceptable (mustnot).
Possible values: should, must, mustnot - Default value: should
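Because BooleanQuery is itself a permitted child of Clause, clauses can be nested to group criteria. The following is a minimal sketch with hypothetical field name and terms: the inner BooleanQuery expresses "bank OR lender" as a single mandatory criterion, while the second clause rules out documents mentioning "sumitomo".

```xml
<!-- Illustrative sketch: field name and terms are hypothetical -->
<BooleanQuery fieldName="contents">
    <Clause occurs="must">
        <BooleanQuery>
            <Clause occurs="should">
                <TermQuery>bank</TermQuery>
            </Clause>
            <Clause occurs="should">
                <TermQuery>lender</TermQuery>
            </Clause>
        </BooleanQuery>
    </Clause>
    <Clause occurs="mustnot">
        <TermQuery>sumitomo</TermQuery>
    </Clause>
</BooleanQuery>
```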
<CachedFilter> | Child of ConstantScoreQuery, Clause, Filter
Caches any nested query or filter in an LRU (least recently used) cache. Cached queries, like filters, are turned into BitSets at a cost of 1 bit per document in the index, so the memory cost of each cached query/filter is numberOfDocsInIndex/8 bytes (about 1 MB for an index of 8 million documents). Queries that are cached as filters retain none of the scoring information associated with results; they keep just a Boolean yes/no record of which documents matched.
Example: Search for documents about banks from the last 10 years, caching the commonly used "last 10 years" filter as a BitSet in RAM to eliminate the cost of building this filter from disk for every query
    <FilteredQuery>
        <Query>
            <UserQuery>bank</UserQuery>
        </Query>
        <Filter>
            <CachedFilter>
                <RangeFilter fieldName="date" lowerTerm="19970101" upperTerm="20070101"/>
            </CachedFilter>
        </Filter>
    </FilteredQuery>
Element's model: (BooleanQuery | UserQuery | FilteredQuery | TermQuery | TermsQuery | MatchAllDocsQuery | ConstantScoreQuery | SpanOr | SpanNear | SpanOrTerms | SpanFirst | SpanNot | SpanTerm | RangeFilter | CachedFilter)

<CachedFilter>'s children
    Name                  Cardinality
    BooleanQuery          One or none
    CachedFilter          One or none
    ConstantScoreQuery    One or none
    FilteredQuery         One or none
    MatchAllDocsQuery     One or none
    RangeFilter           One or none
    SpanFirst             One or none
    SpanNear              One or none
    SpanNot               One or none
    SpanOr                One or none
    SpanOrTerms           One or none
    SpanTerm              One or none
    TermQuery             One or none
    TermsQuery            One or none
    UserQuery             One or none
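Since CachedFilter accepts queries as well as filters, a query can be cached and reused as a yes/no filter. The sketch below assumes a hypothetical "status" field; the nested TermQuery is cached as a BitSet and contributes no scoring information to the results of the outer query.

```xml
<!-- Illustrative sketch: field names and values are hypothetical -->
<FilteredQuery>
    <Query>
        <UserQuery>bank</UserQuery>
    </Query>
    <Filter>
        <CachedFilter>
            <TermQuery fieldName="status">published</TermQuery>
        </CachedFilter>
    </Filter>
</FilteredQuery>
```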
<UserQuery> | Child of Query, Clause, CachedFilter
Passes content directly through to the standard Lucene query parser; see "Lucene Query Syntax".
Example: Search for documents about John Smith or John Doe using standard Lucene query syntax
    <UserQuery>"John Smith" OR "John Doe"</UserQuery>

<UserQuery>'s attributes
    Name     Values    Default
    boost              1.0

@boost | Attribute of UserQuery
Optional boost for matches on this query. Values > 1 increase the score contribution of matches on this query.
Default value: 1.0
<MatchAllDocsQuery/> | Child of Query, Clause, CachedFilter
A query which is used to match all documents. This has a couple of uses:
Example: Effectively use a Filter as a query
    <FilteredQuery>
        <Query>
            <MatchAllDocsQuery/>
        </Query>
        <Filter>
            <RangeFilter fieldName="date" lowerTerm="19870409" upperTerm="19870412"/>
        </Filter>
    </FilteredQuery>
This element is always empty.
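A further sketch, resting on the assumption (standard Lucene behaviour, not stated in this DTD) that a BooleanQuery made up only of mustnot clauses matches nothing: pairing MatchAllDocsQuery as a mandatory clause with a mustnot clause selects every document except those containing the excluded term. The field name and term are hypothetical.

```xml
<!-- Illustrative sketch: field name and term are hypothetical -->
<BooleanQuery fieldName="contents">
    <Clause occurs="must">
        <MatchAllDocsQuery/>
    </Clause>
    <Clause occurs="mustnot">
        <TermQuery>sumitomo</TermQuery>
    </Clause>
</BooleanQuery>
```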
<TermQuery> | Child of Query, Clause, CachedFilter
A single term query. No analysis is done of the child text.
Example: Match on a primary key
    <TermQuery fieldName="primaryKey">13424</TermQuery>

<TermQuery>'s attributes
    Name         Values    Default
    boost                  1.0
    fieldName

@boost | Attribute of TermQuery
Optional boost for matches on this query. Values > 1 increase the score contribution of matches on this query.
Default value: 1.0
@fieldName | Attribute of TermQuery
fieldName must either be defined here or be inherited from the most immediate parent XML element that defines a "fieldName" attribute.
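The fieldName inheritance rule can be sketched as follows. Field names and terms are hypothetical. The first TermQuery declares no fieldName and so takes "contents" from the enclosing BooleanQuery; the second declares its own and also carries a boost.

```xml
<!-- Illustrative sketch: field names and terms are hypothetical -->
<BooleanQuery fieldName="contents">
    <Clause occurs="must">
        <!-- no fieldName here, so "contents" is inherited from BooleanQuery -->
        <TermQuery>bank</TermQuery>
    </Clause>
    <Clause occurs="should">
        <!-- fieldName is defined locally, so the inherited value is not used -->
        <TermQuery fieldName="title" boost="2.0">bank</TermQuery>
    </Clause>
</BooleanQuery>
```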
<TermsQuery> | Child of Query, Clause, CachedFilter
The equivalent of a BooleanQuery with multiple optional TermQuery clauses. Child text is analyzed using a field-specific choice of Analyzer to produce a set of terms that are ORed together in Boolean logic. Unlike the UserQuery element, this does not parse any special characters to control fuzzy/phrase/boolean logic, and as such it cannot produce a query parse error for any user input.
Example: Match on text from a database description (which may contain characters that are illegal in the standard Lucene query syntax used in the UserQuery tag)
    <TermsQuery fieldName="description">Smith &amp; Sons (Ltd) : incorporated 1982</TermsQuery>

<TermsQuery>'s attributes
    Name                        Values         Default
    boost                                      1.0
    disableCoord                true, false    false
    fieldName
    minimumNumberShouldMatch                   0

@boost | Attribute of TermsQuery
Optional boost for matches on this query. Values > 1 increase the score contribution of matches on this query.
Default value: 1.0
@fieldName | Attribute of TermsQuery
fieldName must either be defined here or be inherited from the most immediate parent XML element that defines a "fieldName" attribute.
@disableCoord | Attribute of TermsQuery
The "coordination factor" rewards documents that contain more of the terms in this list. Set this flag to true to turn the factor off.
Possible values: true, false - Default value: false
@minimumNumberShouldMatch | Attribute of TermsQuery
The minimum number of terms that must be present in any one document before it is considered a match.
Default value: 0
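As a brief sketch of the attributes above, the following TermsQuery uses a hypothetical field name and text. Assuming the analyzer for that field yields three terms, a document would need to contain at least two of them to match, and the coordination factor is switched off so that matching more terms earns no extra reward from that factor.

```xml
<!-- Illustrative sketch: field name and text are hypothetical -->
<TermsQuery fieldName="description" minimumNumberShouldMatch="2" disableCoord="true">retail bank merger</TermsQuery>
```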
<FilteredQuery> | Child of Query, Clause, CachedFilter
Runs a Query and restricts the results to those query matches that also match the Filter element.
Example: Find all documents about Lucene that have a status of "published"
    <FilteredQuery>
        <Query>
            <UserQuery>Lucene</UserQuery>
        </Query>
        <Filter>
            <TermsFilter fieldName="status">published</TermsFilter>
        </Filter>
    </FilteredQuery>
Element's model:

<FilteredQuery>'s children
    Name      Cardinality
    Filter    Only one
    Query     Only one

<FilteredQuery>'s attributes
    Name     Values    Default
    boost              1.0

@boost | Attribute of FilteredQuery
Optional boost for matches on this query. Values > 1 increase the score contribution of matches on this query.
Default value: 1.0
<Query> | Child of FilteredQuery
Used to identify a nested Query element inside another container element. NOT a top-level query tag.
Element's model: (BooleanQuery | UserQuery | FilteredQuery | TermQuery | TermsQuery | MatchAllDocsQuery | ConstantScoreQuery | SpanOr | SpanNear | SpanOrTerms | SpanFirst | SpanNot | SpanTerm)

<Query>'s children
    Name                  Cardinality
    BooleanQuery          One or none
    ConstantScoreQuery    One or none
    FilteredQuery         One or none
    MatchAllDocsQuery     One or none
    SpanFirst             One or none
    SpanNear              One or none
    SpanNot               One or none
    SpanOr                One or none
    SpanOrTerms           One or none
    SpanTerm              One or none
    TermQuery             One or none
    TermsQuery            One or none
    UserQuery             One or none

<Filter> | Child of FilteredQuery
The choice of Filter that MUST also be matched.
Element's model:

<Filter>'s children
    Name            Cardinality
    CachedFilter    One or none
    RangeFilter     One or none
<RangeFilter/> | Child of ConstantScoreQuery, Clause, CachedFilter, Filter
Filter used to limit query results to documents matching a range of field values.
Example: Search for documents about banks from the last 10 years
    <FilteredQuery>
        <Query>
            <UserQuery>bank</UserQuery>
        </Query>
        <Filter>
            <RangeFilter fieldName="date" lowerTerm="19970101" upperTerm="20070101"/>
        </Filter>
    </FilteredQuery>

<RangeFilter>'s attributes
    Name            Values         Default
    fieldName
    includeLower    true, false    true
    includeUpper    true, false    true
    lowerTerm
    upperTerm

This element is always empty.
@fieldName | Attribute of RangeFilter
fieldName must either be defined here or be inherited from the most immediate parent XML element that defines a "fieldName" attribute.
@lowerTerm | Attribute of RangeFilter
The lower-most term value for this field (must be <= upperTerm).
Required
@upperTerm | Attribute of RangeFilter
The upper-most term value for this field (must be >= lowerTerm).
Required
@includeLower | Attribute of RangeFilter
Controls whether the lowerTerm in the range is part of the allowed set of values.
Possible values: true, false - Default value: true
@includeUpper | Attribute of RangeFilter
Controls whether the upperTerm in the range is part of the allowed set of values.
Possible values: true, false - Default value: true
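The includeLower and includeUpper flags make it possible to express half-open ranges. The sketch below is illustrative only and assumes a hypothetical "date" field holding yyyyMMdd strings, as in the examples above. It keeps documents dated on or after 20060101 and strictly before 20070101.

```xml
<!-- Illustrative sketch: field name and date values are hypothetical -->
<FilteredQuery>
    <Query>
        <UserQuery>bank</UserQuery>
    </Query>
    <Filter>
        <RangeFilter fieldName="date" lowerTerm="20060101" upperTerm="20070101" includeLower="true" includeUpper="false"/>
    </Filter>
</FilteredQuery>
```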
<SpanTerm> | Child of SpanNear, Include, Query, Clause, SpanOr, SpanFirst, Exclude, CachedFilter
A single term used in a SpanQuery. These clauses are the building blocks for more complex "span" queries which test word proximity.
Example: Find documents using terms close to each other about mining and accidents
    <SpanNear slop="8" inOrder="false" fieldName="text">
        <SpanOr>
            <SpanTerm>killed</SpanTerm>
            <SpanTerm>died</SpanTerm>
            <SpanTerm>dead</SpanTerm>
        </SpanOr>
        <SpanOr>
            <SpanTerm>miner</SpanTerm>
            <SpanTerm>mining</SpanTerm>
            <SpanTerm>miners</SpanTerm>
        </SpanOr>
    </SpanNear>

<SpanTerm>'s attributes
    Name         Values    Default
    fieldName

@fieldName | Attribute of SpanTerm
fieldName must either be defined here or be inherited from the most immediate parent XML element that defines a "fieldName" attribute.
Required
<SpanOrTerms> | Child of SpanNear, Include, Query, Clause, SpanOr, SpanFirst, Exclude, CachedFilter
A field-specific analyzer is used here to parse the child text provided in this tag. The SpanTerms produced are ORed in terms of Boolean logic.
Example: Use SpanOrTerms as a more convenient/succinct way of expressing multiple choices of SpanTerms. This example looks for reports using words describing a fatality near to references to miners
    <SpanNear slop="8" inOrder="false" fieldName="text">
        <SpanOrTerms>killed died death dead deaths</SpanOrTerms>
        <SpanOrTerms>miner mining miners</SpanOrTerms>
    </SpanNear>

<SpanOrTerms>'s attributes
    Name         Values    Default
    fieldName

@fieldName | Attribute of SpanOrTerms
fieldName must either be defined here or be inherited from the most immediate parent XML element that defines a "fieldName" attribute.
Required
<SpanOr> | Child of SpanNear, Include, Query, Clause, SpanFirst, Exclude, CachedFilter
Takes any number of child queries from the Span family.
Example: Find documents using terms close to each other about mining and accidents
    <SpanNear slop="8" inOrder="false" fieldName="text">
        <SpanOr>
            <SpanTerm>killed</SpanTerm>
            <SpanTerm>died</SpanTerm>
            <SpanTerm>dead</SpanTerm>
        </SpanOr>
        <SpanOr>
            <SpanTerm>miner</SpanTerm>
            <SpanTerm>mining</SpanTerm>
            <SpanTerm>miners</SpanTerm>
        </SpanOr>
    </SpanNear>
Element's model: (SpanOr | SpanNear | SpanOrTerms | SpanFirst | SpanNot | SpanTerm)*

<SpanOr>'s children
    Name           Cardinality
    SpanFirst      Any number
    SpanNear       Any number
    SpanNot        Any number
    SpanOr         Any number
    SpanOrTerms    Any number
    SpanTerm       Any number
<SpanNear> | Child of Include, Query, Clause, SpanOr, SpanFirst, Exclude, CachedFilter
Takes any number of child queries from the Span family and tests for proximity.
Element's model: (SpanOr | SpanNear | SpanOrTerms | SpanFirst | SpanNot | SpanTerm)*

<SpanNear>'s children
    Name           Cardinality
    SpanFirst      Any number
    SpanNear       Any number
    SpanNot        Any number
    SpanOr         Any number
    SpanOrTerms    Any number
    SpanTerm       Any number

<SpanNear>'s attributes
    Name       Values         Default
    inOrder    true, false    true
    slop

@slop | Attribute of SpanNear
Defines the maximum distance between Span elements, where distance is expressed as a number of words, not a byte offset.
Example: Find documents using terms within 8 words of each other talking about mining and accidents
    <SpanNear slop="8" inOrder="false" fieldName="text">
        <SpanOr>
            <SpanTerm>killed</SpanTerm>
            <SpanTerm>died</SpanTerm>
            <SpanTerm>dead</SpanTerm>
        </SpanOr>
        <SpanOr>
            <SpanTerm>miner</SpanTerm>
            <SpanTerm>mining</SpanTerm>
            <SpanTerm>miners</SpanTerm>
        </SpanOr>
    </SpanNear>
Required
@inOrder | Attribute of SpanNear
Controls whether matching terms have to appear in the order listed or can be reversed.
Possible values: true, false - Default value: true
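To contrast with the unordered examples above, here is a minimal sketch of an ordered proximity test. The field name and terms are hypothetical. With inOrder="true", a match for the first child has to appear before a match for the second, and slop="3" limits how far apart the two may be.

```xml
<!-- Illustrative sketch: field name and terms are hypothetical -->
<SpanNear slop="3" inOrder="true" fieldName="text">
    <SpanTerm>chief</SpanTerm>
    <SpanOrTerms>executive officer</SpanOrTerms>
</SpanNear>
```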
<SpanFirst> | Child of SpanNear, Include, Query, Clause, SpanOr, Exclude, CachedFilter
Looks for a SpanQuery match occurring near the beginning of a document.
Example: Find letters where the first 50 words talk about a resignation:
    <SpanFirst end="50">
        <SpanOrTerms fieldName="text">resigning resign leave</SpanOrTerms>
    </SpanFirst>
Element's model: (SpanOr | SpanNear | SpanOrTerms | SpanFirst | SpanNot | SpanTerm)

<SpanFirst>'s children
    Name           Cardinality
    SpanFirst      One or none
    SpanNear       One or none
    SpanNot        One or none
    SpanOr         One or none
    SpanOrTerms    One or none
    SpanTerm       One or none

<SpanFirst>'s attributes
    Name     Values    Default
    boost              1.0
    end

@end | Attribute of SpanFirst
Controls the end of the region considered in a document's field (expressed as a word number, not a byte offset).
Required
@boost | Attribute of SpanFirst
Optional boost for matches on this query. Values > 1 increase the score contribution of matches on this query.
Default value: 1.0
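SpanFirst can wrap any single query from the Span family, not only SpanOrTerms. The following sketch, with a hypothetical field name and terms, looks for "annual" followed closely by "report" within the first 20 words of the field and boosts such matches.

```xml
<!-- Illustrative sketch: field name and terms are hypothetical -->
<SpanFirst end="20" boost="2.0">
    <SpanNear slop="2" inOrder="true" fieldName="text">
        <SpanTerm>annual</SpanTerm>
        <SpanTerm>report</SpanTerm>
    </SpanNear>
</SpanFirst>
```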
<SpanNot> | Child of SpanNear, Include, Query, Clause, SpanOr, SpanFirst, Exclude, CachedFilter
Finds documents matching one SpanQuery, excluding matches that overlap a match of another SpanQuery.
Example: Find documents talking about social services, excluding matches where the word "public" falls within the matched span
    <SpanNot fieldName="text">
        <Include>
            <SpanNear slop="2" inOrder="true">
                <SpanTerm>social</SpanTerm>
                <SpanTerm>services</SpanTerm>
            </SpanNear>
        </Include>
        <Exclude>
            <SpanTerm>public</SpanTerm>
        </Exclude>
    </SpanNot>
Element's model:

<SpanNot>'s children
    Name       Cardinality
    Exclude    Only one
    Include    Only one

<Include> | Child of SpanNot
The SpanQuery to find.
Element's model: (SpanOr | SpanNear | SpanOrTerms | SpanFirst | SpanNot | SpanTerm)

<Include>'s children
    Name           Cardinality
    SpanFirst      One or none
    SpanNear       One or none
    SpanNot        One or none
    SpanOr         One or none
    SpanOrTerms    One or none
    SpanTerm       One or none

<Exclude> | Child of SpanNot
The SpanQuery to be avoided.
Element's model: (SpanOr | SpanNear | SpanOrTerms | SpanFirst | SpanNot | SpanTerm)

<Exclude>'s children
    Name           Cardinality
    SpanFirst      One or none
    SpanNear       One or none
    SpanNot        One or none
    SpanOr         One or none
    SpanOrTerms    One or none
    SpanTerm       One or none
<ConstantScoreQuery> | Child of Query, Clause, CachedFilter
A utility tag to wrap any filter as a query.
Example: Find all documents from the last 10 years
    <ConstantScoreQuery>
        <RangeFilter fieldName="date" lowerTerm="19970101" upperTerm="20070101"/>
    </ConstantScoreQuery>
Element's model: (RangeFilter | CachedFilter)*

<ConstantScoreQuery>'s children
    Name            Cardinality
    CachedFilter    Any number
    RangeFilter     Any number

<ConstantScoreQuery>'s attributes
    Name     Values    Default
    boost              1.0

@boost | Attribute of ConstantScoreQuery
Optional boost for matches on this query. Values > 1 increase the score contribution of matches on this query.
Default value: 1.0
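Because ConstantScoreQuery is a permitted child of Clause, a filter can take part in Boolean logic as an optional, boosted criterion. The sketch below uses hypothetical field names, terms and dates: documents must mention "bank", and those falling in the cached date range are preferred.

```xml
<!-- Illustrative sketch: field names, terms and dates are hypothetical -->
<BooleanQuery fieldName="contents">
    <Clause occurs="must">
        <TermQuery>bank</TermQuery>
    </Clause>
    <Clause occurs="should">
        <ConstantScoreQuery boost="2.0">
            <CachedFilter>
                <RangeFilter fieldName="date" lowerTerm="20060101" upperTerm="20070101"/>
            </CachedFilter>
        </ConstantScoreQuery>
    </Clause>
</BooleanQuery>
```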