Subject: svn commit: r787424 [6/11] - in /lucene/java/trunk: ./ contrib/ contrib/analyzers/ contrib/analyzers/src/java/org/apache/lucene/analysis/br/ contrib/analyzers/src/java/org/apache/lucene/analysis/cjk/ contrib/analyzers/src/java/org/apache/lucene/analysi...
Date: Mon, 22 Jun 2009 22:19:02 -0000
To: java-commits@lucene.apache.org
From: mikemccand@apache.org
Reply-To: java-dev@lucene.apache.org
Message-Id: <20090622221906.29C272388907@eris.apache.org>

Modified: lucene/java/trunk/contrib/xml-query-parser/docs/LuceneCoreQuery.dtd.html
URL: http://svn.apache.org/viewvc/lucene/java/trunk/contrib/xml-query-parser/docs/LuceneCoreQuery.dtd.html?rev=787424&r1=787423&r2=787424&view=diff
==============================================================================
--- lucene/java/trunk/contrib/xml-query-parser/docs/LuceneCoreQuery.dtd.html (original)
+++ lucene/java/trunk/contrib/xml-query-parser/docs/LuceneCoreQuery.dtd.html Mon Jun 22 22:18:56 2009
@@ -1,719 +1,719 @@
- - - - -Core Lucene - -


Core Lucene

-

Background

-This DTD describes the XML syntax used to perform advanced searches using the core Lucene search engine. The motivation behind the XML query syntax is: -
-  1. To open up Lucene functionality to clients other than Java
-  2. To offer a form of expressing queries that can easily be:
-     • Persisted for logging/auditing purposes
-     • Changed by editing text query templates (XSLT) without requiring a recompile/redeploy of applications
-     • Serialized across networks (without requiring Java bytecode for Query logic deployed on clients)
-  3. To provide a shorthand way of expressing query logic which echoes the logical tree structure of query objects more closely than procedural Java query construction code does
-  4. To bridge the growing gap between Lucene query/filtering functionality and the set of functionality accessible through the standard Lucene QueryParser syntax
-  5. To provide a simply extensible syntax that does not require complex parser skills such as knowledge of JavaCC syntax

Syntax overview

-Search syntax consists of two types of elements: -
-  • Queries
-  • Filters

Queries

-The root of any XML search must be a Query type element used to select content. -Queries typically score matches on documents using a number of different factors in order to provide relevant results first. -One common example of a query tag is the UserQuery element which uses the standard -Lucene QueryParser to parse Google-style search syntax provided by end users.

Filters

-Unlike Queries, Filters are not used to select or score content - they are simply used to filter Query output (see FilteredQuery for an example use of query filtering). -Because Filters simply offer a yes/no decision for each document in the index their output can be efficiently cached in memory as a Bitset for -subsequent reuse (see CachedFilter tag).

Nesting elements

-Many of the elements can nest other elements to produce queries/filters of an arbitrary depth and complexity. -The BooleanQuery element is one such example which provides a means for combining other queries (including other BooleanQueries) using Boolean -logic to determine mandatory or optional elements.

Advanced topics

-

Advanced positional testing - span queries

-The SpanQuery class of queries allows for complex positional tests which look not only for certain combinations of words but also for particular -positions of those words in relation to each other and within the documents containing them.

CoreParser.java is the Java class that encapsulates this parser behaviour.


- -
-<BooleanQuery> -Child of Clause, CachedFilter, Query -
-

BooleanQueries implement Boolean logic which controls how multiple Clauses should be interpreted. -Some clauses may represent optional Query criteria while others represent mandatory criteria.

Example: Find articles about banks, preferably talking about mergers but nothing to do with "sumitomo" -

	          
-            <BooleanQuery fieldName="contents">
-	             <Clause occurs="should">
-		              <TermQuery>merger</TermQuery>
-	             </Clause>
-	             <Clause occurs="mustnot">
-		              <TermQuery>sumitomo</TermQuery>
-	             </Clause>
-	             <Clause occurs="must">
-		              <TermQuery>bank</TermQuery>
-	             </Clause>
-            </BooleanQuery>
-
-	         

-
-<BooleanQuery>'s children
-  Name   | Cardinality
-  Clause | At least one
-
-<BooleanQuery>'s attributes
-  Name                     | Values      | Default
-  boost                    |             | 1.0
-  disableCoord             | true, false | false
-  fieldName                |             |
-  minimumNumberShouldMatch |             | 0
-Element's model:

(Clause)+

-
-@boost -Attribute of BooleanQuery -
-

Optional boost for matches on this query. Values > 1

Default value: 1.0

-
-@fieldName -Attribute of BooleanQuery -
-

fieldName can optionally be defined here as a default attribute used by all child elements

-
-@disableCoord -Attribute of BooleanQuery -
-

The "Coordination factor" rewards documents that contain more of the optional clauses in this list. This flag can be used to turn off this factor.

Possible values: true, false - Default value: false

-
-@minimumNumberShouldMatch -Attribute of BooleanQuery -
-

The minimum number of optional clauses that should be present in any one document before it is considered to be a match.

Default value: 0

-
-<Clause> -Child of BooleanQuery -
-

NOTE: "Clause" tag has 2 modes of use - inside <BooleanQuery> in which case only "query" types can be -child elements - while in a <BooleanFilter> clause only "filter" types can be contained.

-
-<Clause>'s children
-  Name               | Cardinality
-  BooleanQuery       | One or none
-  CachedFilter       | One or none
-  ConstantScoreQuery | One or none
-  FilteredQuery      | One or none
-  MatchAllDocsQuery  | One or none
-  RangeFilter        | One or none
-  SpanFirst          | One or none
-  SpanNear           | One or none
-  SpanNot            | One or none
-  SpanOr             | One or none
-  SpanOrTerms        | One or none
-  SpanTerm           | One or none
-  TermQuery          | One or none
-  TermsQuery         | One or none
-  UserQuery          | One or none
-
-<Clause>'s attributes
-  Name   | Values                | Default
-  occurs | should, must, mustnot | should
-Element's model:

(BooleanQuery | UserQuery | FilteredQuery | TermQuery | TermsQuery | MatchAllDocsQuery | ConstantScoreQuery | SpanOr | SpanNear | SpanOrTerms | SpanFirst | SpanNot | SpanTerm | RangeFilter | CachedFilter)

-
-@occurs -Attribute of Clause -
-

Controls if the clause is optional (should), mandatory (must) or unacceptable (mustnot)

Possible values: should, must, mustnot - Default value: should

-
-<CachedFilter> -Child of ConstantScoreQuery, Clause, Filter -
-

Caches any nested query or filter in an LRU (Least recently used) Cache. Cached queries, like filters, are turned into -Bitsets at a cost of 1 bit per document in the index. The memory cost of a cached query/filter is therefore numberOfDocsinIndex/8 bytes. -Queries that are cached as filters obviously retain none of the scoring information associated with results - they retain just -a Boolean yes/no record of which documents matched.

Example: Search for documents about banks from the last 10 years - caching the commonly-used "last 10 year" filter as a BitSet in -RAM to eliminate the cost of building this filter from disk for every query -

	          
-            <FilteredQuery>
-               <Query>
-                  <UserQuery>bank</UserQuery>
-               </Query>	
-               <Filter>
-                  <CachedFilter>
-                     <RangeFilter fieldName="date" lowerTerm="19970101" upperTerm="20070101"/>
-                  </CachedFilter>
-               </Filter>	
-            </FilteredQuery>
-	         

-
-<CachedFilter>'s children
-  Name               | Cardinality
-  BooleanQuery       | One or none
-  CachedFilter       | One or none
-  ConstantScoreQuery | One or none
-  FilteredQuery      | One or none
-  MatchAllDocsQuery  | One or none
-  RangeFilter        | One or none
-  SpanFirst          | One or none
-  SpanNear           | One or none
-  SpanNot            | One or none
-  SpanOr             | One or none
-  SpanOrTerms        | One or none
-  SpanTerm           | One or none
-  TermQuery          | One or none
-  TermsQuery         | One or none
-  UserQuery          | One or none
-Element's model:

(BooleanQuery | UserQuery | FilteredQuery | TermQuery | TermsQuery | MatchAllDocsQuery | ConstantScoreQuery | SpanOr | SpanNear | SpanOrTerms | SpanFirst | SpanNot | SpanTerm | RangeFilter | CachedFilter)

-
-<UserQuery> -Child of Clause, CachedFilter, Query -
-

Passes content directly through to the standard Lucene QueryParser (see "Lucene Query Syntax").

Example: Search for documents about John Smith or John Doe using standard Lucene query syntax -

	          
-               <UserQuery>"John Smith" OR "John Doe"</UserQuery>
-	         

-
-<UserQuery>'s attributes
-  Name      | Values | Default
-  boost     |        | 1.0
-  fieldName |        |
- -
-@boost -Attribute of UserQuery -
-

Optional boost for matches on this query. Values > 1

Default value: 1.0

-
-@fieldName -Attribute of UserQuery -
-

fieldName can optionally be defined here to change the default field used in the QueryParser

-
-<MatchAllDocsQuery/> -Child of Clause, CachedFilter, Query -
-

A query which is used to match all documents. This has a couple of uses: -

-  1. as a Clause in a BooleanQuery whose only other clause
-     is a "mustnot" match (Lucene requires at least one positive clause), and
-  2. in a FilteredQuery where a Filter tag is effectively being
-     used to select content rather than in its usual role of filtering the results of a query.

Example: Effectively use a Filter as a query -

	          
-               <FilteredQuery>
-                 <Query>
-                    <MatchAllDocsQuery/>
-                 </Query>
-                 <Filter>
-                     <RangeFilter fieldName="date" lowerTerm="19870409" upperTerm="19870412"/>
-                 </Filter>	
-               </FilteredQuery>	         
-	       

This element is always empty.

-
-<TermQuery> -Child of Clause, CachedFilter, Query -
-

a single term query - no analysis is done of the child text

Example: Match on a primary key -

	          
-               <TermQuery fieldName="primaryKey">13424</TermQuery>
-	       

-
-<TermQuery>'s attributes
-  Name      | Values | Default
-  boost     |        | 1.0
-  fieldName |        |
- -
-@boost -Attribute of TermQuery -
-

Optional boost for matches on this query. Values > 1

Default value: 1.0

-
-@fieldName -Attribute of TermQuery -
-

fieldName must be defined here or is taken from the most immediate parent XML element that defines a "fieldName" attribute

-
-<TermsQuery> -Child of Clause, CachedFilter, Query -
-

The equivalent of a BooleanQuery with multiple optional TermQuery clauses. -Child text is analyzed using a field-specific choice of Analyzer to produce a set of terms that are ORed together in Boolean logic. -Unlike the UserQuery element, this does not parse any special characters to control fuzzy/phrase/boolean logic and as such is incapable -of producing a Query parse error given any user input.

Example: Match on text from a database description (which may contain characters that -are illegal in the standard Lucene Query syntax used in the UserQuery tag) -

	          
-               <TermsQuery fieldName="description">Smith & Sons (Ltd) : incorporated 1982</TermsQuery>
-	       

-
-<TermsQuery>'s attributes
-  Name                     | Values      | Default
-  boost                    |             | 1.0
-  disableCoord             | true, false | false
-  fieldName                |             |
-  minimumNumberShouldMatch |             | 0
- -
-@boost -Attribute of TermsQuery -
-

Optional boost for matches on this query. Values > 1

Default value: 1.0

-
-@fieldName -Attribute of TermsQuery -
-

fieldName must be defined here or is taken from the most immediate parent XML element that defines a "fieldName" attribute

-
-@disableCoord -Attribute of TermsQuery -
-

The "Coordination factor" rewards documents that contain more of the terms in this list. This flag can be used to turn off this factor.

Possible values: true, false - Default value: false

-
-@minimumNumberShouldMatch -Attribute of TermsQuery -
-

The minimum number of terms that should be present in any one document before it is considered to be a match.

Default value: 0

-
-<FilteredQuery> -Child of Clause, CachedFilter, Query -
-

Runs a Query and filters results to only those query matches that also match the Filter element.

Example: Find all documents about Lucene that have a status of "published" -

	          
-               <FilteredQuery>
-                 <Query>
-                    <UserQuery>Lucene</UserQuery>
-                 </Query>
-                 <Filter>
-                     <TermsFilter fieldName="status">published</TermsFilter>
-                 </Filter>	
-               </FilteredQuery>	         
-	       

-
-<FilteredQuery>'s children
-  Name   | Cardinality
-  Filter | Only one
-  Query  | Only one
-
-<FilteredQuery>'s attributes
-  Name  | Values | Default
-  boost |        | 1.0
-Element's model:

(Query, Filter)

-
-@boost -Attribute of FilteredQuery -
-

Optional boost for matches on this query. Values > 1

Default value: 1.0

-
-<Query> -Child of FilteredQuery -
-

Used to identify a nested Query element inside another container element. NOT a top-level query tag

-
-<Query>'s children
-  Name               | Cardinality
-  BooleanQuery       | One or none
-  ConstantScoreQuery | One or none
-  FilteredQuery      | One or none
-  MatchAllDocsQuery  | One or none
-  SpanFirst          | One or none
-  SpanNear           | One or none
-  SpanNot            | One or none
-  SpanOr             | One or none
-  SpanOrTerms        | One or none
-  SpanTerm           | One or none
-  TermQuery          | One or none
-  TermsQuery         | One or none
-  UserQuery          | One or none
-Element's model:

(BooleanQuery | UserQuery | FilteredQuery | TermQuery | TermsQuery | MatchAllDocsQuery | ConstantScoreQuery | SpanOr | SpanNear | SpanOrTerms | SpanFirst | SpanNot | SpanTerm)

-
-<Filter> -Child of FilteredQuery -
-

The choice of Filter that MUST also be matched

-
-<Filter>'s children
-  Name         | Cardinality
-  CachedFilter | One or none
-  RangeFilter  | One or none
-Element's model:

(RangeFilter | CachedFilter)

-
-<RangeFilter/> -Child of ConstantScoreQuery, Clause, CachedFilter, Filter -
-

Filter used to limit query results to documents matching a range of field values

Example: Search for documents about banks from the last 10 years -

	          
-            <FilteredQuery>
-               <Query>
-                  <UserQuery>bank</UserQuery>
-               </Query>	
-               <Filter>
-                     <RangeFilter fieldName="date" lowerTerm="19970101" upperTerm="20070101"/>
-               </Filter>	
-            </FilteredQuery>
-	         

-<RangeFilter>'s attributes
-  Name         | Values      | Default
-  fieldName    |             |
-  includeLower | true, false | true
-  includeUpper | true, false | true
-  lowerTerm    |             |
-  upperTerm    |             |
-

This element is always empty.

-
-@fieldName -Attribute of RangeFilter -
-

fieldName must be defined here or is taken from the most immediate parent XML element that defines a "fieldName" attribute

-
-@lowerTerm -Attribute of RangeFilter -
-

The lower-most term value for this field (must be <= upperTerm)

Required

-
-@upperTerm -Attribute of RangeFilter -
-

The upper-most term value for this field (must be >= lowerTerm)

Required

-
-@includeLower -Attribute of RangeFilter -
-

Controls if the lowerTerm in the range is part of the allowed set of values

Possible values: true, false - Default value: true

-
-@includeUpper -Attribute of RangeFilter -
-

Controls if the upperTerm in the range is part of the allowed set of values

Possible values: true, false - Default value: true

-
-<SpanTerm> -Child of SpanOr, SpanFirst, Exclude, Clause, Include, CachedFilter, SpanNear, Query -
-

A single term used in a SpanQuery. These clauses are the building blocks for more complex "span" queries which test word proximity

Example: Find documents using terms close to each other about mining and accidents -

-	      <SpanNear slop="8" inOrder="false" fieldName="text">		
-			<SpanOr>
-				<SpanTerm>killed</SpanTerm>
-				<SpanTerm>died</SpanTerm>
-				<SpanTerm>dead</SpanTerm>
-			</SpanOr>
-			<SpanOr>
-				<SpanTerm>miner</SpanTerm>
-				<SpanTerm>mining</SpanTerm>
-				<SpanTerm>miners</SpanTerm>
-			</SpanOr>
-	      </SpanNear>
-	      

-
-<SpanTerm>'s attributes
-  Name      | Values | Default
-  fieldName |        |
- -
-@fieldName -Attribute of SpanTerm -
-

fieldName must be defined here or is taken from the most immediate parent XML element that defines a "fieldName" attribute

Required

-
-<SpanOrTerms> -Child of SpanOr, SpanFirst, Exclude, Clause, Include, CachedFilter, SpanNear, Query -
-

A field-specific analyzer is used here to parse the child text provided in this tag. The SpanTerms produced are ORed in terms of Boolean logic

Example: Use SpanOrTerms as a more convenient/succinct way of expressing multiple choices of SpanTerms. This example looks for reports -using words describing a fatality near to references to miners -

-	      <SpanNear slop="8" inOrder="false" fieldName="text">		
-			<SpanOrTerms>killed died death dead deaths</SpanOrTerms>
-			<SpanOrTerms>miner mining miners</SpanOrTerms>
-	      </SpanNear>
-	      

-
-<SpanOrTerms>'s attributes
-  Name      | Values | Default
-  fieldName |        |
- -
-@fieldName -Attribute of SpanOrTerms -
-

fieldName must be defined here or is taken from the most immediate parent XML element that defines a "fieldName" attribute

Required

-
-<SpanOr> -Child of SpanFirst, Exclude, Clause, Include, CachedFilter, SpanNear, Query -
-

Takes any number of child queries from the Span family

Example: Find documents using terms close to each other about mining and accidents -

-	      <SpanNear slop="8" inOrder="false" fieldName="text">		
-			<SpanOr>
-				<SpanTerm>killed</SpanTerm>
-				<SpanTerm>died</SpanTerm>
-				<SpanTerm>dead</SpanTerm>
-			</SpanOr>
-			<SpanOr>
-				<SpanTerm>miner</SpanTerm>
-				<SpanTerm>mining</SpanTerm>
-				<SpanTerm>miners</SpanTerm>
-			</SpanOr>
-	      </SpanNear>
-	      

-
-<SpanOr>'s children
-  Name        | Cardinality
-  SpanFirst   | Any number
-  SpanNear    | Any number
-  SpanNot     | Any number
-  SpanOr      | Any number
-  SpanOrTerms | Any number
-  SpanTerm    | Any number
-Element's model:

(SpanOr | SpanNear | SpanOrTerms | SpanFirst | SpanNot | SpanTerm)*

-
-<SpanNear> -Child of SpanOr, SpanFirst, Exclude, Clause, Include, CachedFilter, Query -
-

Takes any number of child queries from the Span family and tests for proximity

-
-<SpanNear>'s children
-  Name        | Cardinality
-  SpanFirst   | Any number
-  SpanNear    | Any number
-  SpanNot     | Any number
-  SpanOr      | Any number
-  SpanOrTerms | Any number
-  SpanTerm    | Any number
-
-<SpanNear>'s attributes
-  Name    | Values      | Default
-  inOrder | true, false | true
-  slop    |             |
-Element's model:

(SpanOr | SpanNear | SpanOrTerms | SpanFirst | SpanNot | SpanTerm)*

-
-@slop -Attribute of SpanNear -
-

defines the maximum distance between Span elements where distance is expressed as word number, not byte offset

Example: Find documents using terms within 8 words of each other talking about mining and accidents -

-	      <SpanNear slop="8" inOrder="false" fieldName="text">		
-			<SpanOr>
-				<SpanTerm>killed</SpanTerm>
-				<SpanTerm>died</SpanTerm>
-				<SpanTerm>dead</SpanTerm>
-			</SpanOr>
-			<SpanOr>
-				<SpanTerm>miner</SpanTerm>
-				<SpanTerm>mining</SpanTerm>
-				<SpanTerm>miners</SpanTerm>
-			</SpanOr>
-	      </SpanNear>
-	      

Required

-
-@inOrder -Attribute of SpanNear -
-

Controls if matching terms have to appear in the order listed or can be reversed

Possible values: true, false - Default value: true

-
-<SpanFirst> -Child of SpanOr, Exclude, Clause, Include, CachedFilter, SpanNear, Query -
-

Looks for a SpanQuery match occurring near the beginning of a document

Example: Find letters where the first 50 words talk about a resignation: -

	          
-	         <SpanFirst end="50">
-	               <SpanOrTerms fieldName="text">resigning resign leave</SpanOrTerms>
-	         </SpanFirst>
-	         

-
-<SpanFirst>'s children
-  Name        | Cardinality
-  SpanFirst   | One or none
-  SpanNear    | One or none
-  SpanNot     | One or none
-  SpanOr      | One or none
-  SpanOrTerms | One or none
-  SpanTerm    | One or none
-
-<SpanFirst>'s attributes
-  Name  | Values | Default
-  boost |        | 1.0
-  end   |        |
-Element's model:

(SpanOr | SpanNear | SpanOrTerms | SpanFirst | SpanNot | SpanTerm)

-
-@end -Attribute of SpanFirst -
-

Controls the end of the region considered in a document's field (expressed in word number, not byte offset)

Required

-
-@boost -Attribute of SpanFirst -
-

Optional boost for matches on this query. Values > 1

Default value: 1.0

-
-<SpanNot> -Child of SpanOr, SpanFirst, Exclude, Clause, Include, CachedFilter, SpanNear, Query -
-

Finds documents matching a SpanQuery but not if matching another SpanQuery

Example: Find documents talking about social services but not containing the word "public" -

-          <SpanNot fieldName="text">
-             <Include>
-                <SpanNear slop="2" inOrder="true">		
-                     <SpanTerm>social</SpanTerm>
-                     <SpanTerm>services</SpanTerm>
-                </SpanNear>				
-             </Include>
-             <Exclude>
-                <SpanTerm>public</SpanTerm>
-             </Exclude>
-          </SpanNot>
-	      

-
-<SpanNot>'s children
-  Name    | Cardinality
-  Exclude | Only one
-  Include | Only one
-Element's model:

(Include, Exclude)

-
-<Include> -Child of SpanNot -
-

The SpanQuery to find

-
-<Include>'s children
-  Name        | Cardinality
-  SpanFirst   | One or none
-  SpanNear    | One or none
-  SpanNot     | One or none
-  SpanOr      | One or none
-  SpanOrTerms | One or none
-  SpanTerm    | One or none
-Element's model:

(SpanOr | SpanNear | SpanOrTerms | SpanFirst | SpanNot | SpanTerm)

-
-<Exclude> -Child of SpanNot -
-

The SpanQuery to be avoided

-
-<Exclude>'s children
-  Name        | Cardinality
-  SpanFirst   | One or none
-  SpanNear    | One or none
-  SpanNot     | One or none
-  SpanOr      | One or none
-  SpanOrTerms | One or none
-  SpanTerm    | One or none
-Element's model:

(SpanOr | SpanNear | SpanOrTerms | SpanFirst | SpanNot | SpanTerm)

-
-<ConstantScoreQuery> -Child of Clause, CachedFilter, Query -
-

a utility tag to wrap any filter as a query

Example: Find all documents from the last 10 years -

-     <ConstantScoreQuery>
-           <RangeFilter fieldName="date" lowerTerm="19970101" upperTerm="20070101"/>
-     </ConstantScoreQuery>	
-	

-
-<ConstantScoreQuery>'s children
-  Name         | Cardinality
-  CachedFilter | Any number
-  RangeFilter  | Any number
-
-<ConstantScoreQuery>'s attributes
-  Name  | Values | Default
-  boost |        | 1.0
-Element's model:

(RangeFilter | CachedFilter)*

-
-@boost -Attribute of ConstantScoreQuery -
-

Optional boost for matches on this query. Values > 1

Default value: 1.0

+ + + + +Core Lucene + +


Core Lucene

+

Background

+This DTD describes the XML syntax used to perform advanced searches using the core Lucene search engine. The motivation behind the XML query syntax is: +
+  1. To open up Lucene functionality to clients other than Java
+  2. To offer a form of expressing queries that can easily be:
+     • Persisted for logging/auditing purposes
+     • Changed by editing text query templates (XSLT) without requiring a recompile/redeploy of applications
+     • Serialized across networks (without requiring Java bytecode for Query logic deployed on clients)
+  3. To provide a shorthand way of expressing query logic which echoes the logical tree structure of query objects more closely than procedural Java query construction code does
+  4. To bridge the growing gap between Lucene query/filtering functionality and the set of functionality accessible through the standard Lucene QueryParser syntax
+  5. To provide a simply extensible syntax that does not require complex parser skills such as knowledge of JavaCC syntax

Syntax overview

+Search syntax consists of two types of elements: +
+  • Queries
+  • Filters

Queries

+The root of any XML search must be a Query type element used to select content. +Queries typically score matches on documents using a number of different factors in order to provide relevant results first. +One common example of a query tag is the UserQuery element which uses the standard +Lucene QueryParser to parse Google-style search syntax provided by end users.

Filters

+Unlike Queries, Filters are not used to select or score content - they are simply used to filter Query output (see FilteredQuery for an example use of query filtering). +Because Filters simply offer a yes/no decision for each document in the index their output can be efficiently cached in memory as a Bitset for +subsequent reuse (see CachedFilter tag).

Nesting elements

+Many of the elements can nest other elements to produce queries/filters of an arbitrary depth and complexity. +The BooleanQuery element is one such example which provides a means for combining other queries (including other BooleanQueries) using Boolean +logic to determine mandatory or optional elements.

Advanced topics

+

Advanced positional testing - span queries

+The SpanQuery class of queries allows for complex positional tests which look not only for certain combinations of words but also for particular +positions of those words in relation to each other and within the documents containing them.

CoreParser.java is the Java class that encapsulates this parser behaviour.


+ +
+<BooleanQuery> +Child of Clause, CachedFilter, Query +
+

BooleanQueries implement Boolean logic which controls how multiple Clauses should be interpreted. +Some clauses may represent optional Query criteria while others represent mandatory criteria.

Example: Find articles about banks, preferably talking about mergers but nothing to do with "sumitomo" +

	          
+            <BooleanQuery fieldName="contents">
+	             <Clause occurs="should">
+		              <TermQuery>merger</TermQuery>
+	             </Clause>
+	             <Clause occurs="mustnot">
+		              <TermQuery>sumitomo</TermQuery>
+	             </Clause>
+	             <Clause occurs="must">
+		              <TermQuery>bank</TermQuery>
+	             </Clause>
+            </BooleanQuery>
+
+	         

+
+<BooleanQuery>'s children
+  Name   | Cardinality
+  Clause | At least one
+
+<BooleanQuery>'s attributes
+  Name                     | Values      | Default
+  boost                    |             | 1.0
+  disableCoord             | true, false | false
+  fieldName                |             |
+  minimumNumberShouldMatch |             | 0
+Element's model:

(Clause)+

+
+@boost +Attribute of BooleanQuery +
+

Optional boost for matches on this query. Values > 1

Default value: 1.0

+
+@fieldName +Attribute of BooleanQuery +
+

fieldName can optionally be defined here as a default attribute used by all child elements

+
+@disableCoord +Attribute of BooleanQuery +
+

The "Coordination factor" rewards documents that contain more of the optional clauses in this list. This flag can be used to turn off this factor.

Possible values: true, false - Default value: false

+
+@minimumNumberShouldMatch +Attribute of BooleanQuery +
+

The minimum number of optional clauses that should be present in any one document before it is considered to be a match.

Default value: 0
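
The way the clause types and minimumNumberShouldMatch combine can be sketched as a small decision function. This is an illustrative sketch only, not code from the Lucene source: the class and method names are hypothetical, and the rule that a query without mandatory clauses needs at least one optional match is an assumption drawn from the note elsewhere in this document that Lucene requires at least one positive clause.

```java
/** Hypothetical sketch of how clause counts decide whether a document matches. */
final class BooleanMatchSketch {

    /**
     * @param mustTotal                number of "must" clauses in the query
     * @param mustMatched              how many of them the document satisfies
     * @param shouldMatched            how many "should" clauses the document satisfies
     * @param mustNotMatched           how many "mustnot" clauses the document satisfies
     * @param minimumNumberShouldMatch value of the attribute (default 0)
     */
    static boolean matches(int mustTotal, int mustMatched, int shouldMatched,
                           int mustNotMatched, int minimumNumberShouldMatch) {
        if (mustNotMatched > 0) {
            return false; // any prohibited clause rules the document out
        }
        if (mustMatched < mustTotal) {
            return false; // every mandatory clause has to match
        }
        int requiredShould = minimumNumberShouldMatch;
        if (mustTotal == 0 && requiredShould == 0) {
            requiredShould = 1; // assumption: at least one positive clause must match
        }
        return shouldMatched >= requiredShould;
    }

    public static void main(String[] args) {
        // "bank" (must) matched, "merger" (should) missing, "sumitomo" (mustnot) absent
        System.out.println(matches(1, 1, 0, 0, 0));
        // Same document, but at least one optional clause is now required
        System.out.println(matches(1, 1, 0, 0, 1));
    }
}
```

With the default of 0, optional clauses only influence scoring once a mandatory clause is present; raising the value turns some of them into a soft requirement.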

+
+<Clause> +Child of BooleanQuery +
+

NOTE: "Clause" tag has 2 modes of use - inside <BooleanQuery> in which case only "query" types can be +child elements - while in a <BooleanFilter> clause only "filter" types can be contained.

+
+<Clause>'s children
+  Name               | Cardinality
+  BooleanQuery       | One or none
+  CachedFilter       | One or none
+  ConstantScoreQuery | One or none
+  FilteredQuery      | One or none
+  MatchAllDocsQuery  | One or none
+  RangeFilter        | One or none
+  SpanFirst          | One or none
+  SpanNear           | One or none
+  SpanNot            | One or none
+  SpanOr             | One or none
+  SpanOrTerms        | One or none
+  SpanTerm           | One or none
+  TermQuery          | One or none
+  TermsQuery         | One or none
+  UserQuery          | One or none
+
+<Clause>'s attributes
+  Name   | Values                | Default
+  occurs | should, must, mustnot | should
+Element's model:

(BooleanQuery | UserQuery | FilteredQuery | TermQuery | TermsQuery | MatchAllDocsQuery | ConstantScoreQuery | SpanOr | SpanNear | SpanOrTerms | SpanFirst | SpanNot | SpanTerm | RangeFilter | CachedFilter)

+
+@occurs +Attribute of Clause +
+

Controls if the clause is optional (should), mandatory (must) or unacceptable (mustnot)

Possible values: should, must, mustnot - Default value: should
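
A client that reads such a query with a non-validating XML parser has to apply the "should" default itself, because the DTD default is not filled in unless the DTD is loaded. The following is a minimal sketch using only the JDK's DOM API; the class and method names are hypothetical and this is not how CoreParser is implemented.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Hypothetical helper that lists each Clause with its effective occurs value. */
final class ClauseLister {

    static List<String> describeClauses(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            // Note: this also returns Clause elements of nested BooleanQuery elements.
            NodeList clauses = doc.getElementsByTagName("Clause");
            List<String> result = new ArrayList<>();
            for (int i = 0; i < clauses.getLength(); i++) {
                Element clause = (Element) clauses.item(i);
                String occurs = clause.getAttribute("occurs"); // "" when absent
                if (occurs.isEmpty()) {
                    occurs = "should"; // default value documented for @occurs
                }
                result.add(occurs + ":" + clause.getTextContent().trim());
            }
            return result;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse query XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<BooleanQuery fieldName=\"contents\">"
                + "<Clause><TermQuery>merger</TermQuery></Clause>"
                + "<Clause occurs=\"mustnot\"><TermQuery>sumitomo</TermQuery></Clause>"
                + "<Clause occurs=\"must\"><TermQuery>bank</TermQuery></Clause>"
                + "</BooleanQuery>";
        describeClauses(xml).forEach(System.out::println);
    }
}
```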

+
+<CachedFilter> +Child of ConstantScoreQuery, Clause, Filter +
+

Caches any nested query or filter in an LRU (Least recently used) Cache. Cached queries, like filters, are turned into +Bitsets at a cost of 1 bit per document in the index. The memory cost of a cached query/filter is therefore numberOfDocsinIndex/8 bytes. +Queries that are cached as filters obviously retain none of the scoring information associated with results - they retain just +a Boolean yes/no record of which documents matched.
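
The memory figure above is simple arithmetic and can be checked with a short sketch. The index size used here is a made-up number and the class name is hypothetical; java.util.BitSet stands in for whatever bit set the cache actually uses.

```java
import java.util.BitSet;

/** Hypothetical worked example of the one-bit-per-document cost. */
final class CachedFilterCost {

    /** Bytes needed for one bit per document, rounded up to whole bytes. */
    static long bytesFor(long numberOfDocsInIndex) {
        return (numberOfDocsInIndex + 7) / 8;
    }

    public static void main(String[] args) {
        int numberOfDocsInIndex = 10_000_000; // assumed index size, for illustration only

        // A cached filter records only yes/no per document, never a score.
        BitSet matches = new BitSet(numberOfDocsInIndex);
        matches.set(42); // document 42 passed the filter

        System.out.println(bytesFor(numberOfDocsInIndex) + " bytes per cached filter");
    }
}
```

For ten million documents this comes to 1,250,000 bytes per cached entry, so the size of the LRU cache bounds the total memory used.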

Example: Search for documents about banks from the last 10 years - caching the commonly-used "last 10 year" filter as a BitSet in +RAM to eliminate the cost of building this filter from disk for every query +

	          
+            <FilteredQuery>
+               <Query>
+                  <UserQuery>bank</UserQuery>
+               </Query>	
+               <Filter>
+                  <CachedFilter>
+                     <RangeFilter fieldName="date" lowerTerm="19970101" upperTerm="20070101"/>
+                  </CachedFilter>
+               </Filter>	
+            </FilteredQuery>
+	         

+
+<CachedFilter>'s children
+  Name               | Cardinality
+  BooleanQuery       | One or none
+  CachedFilter       | One or none
+  ConstantScoreQuery | One or none
+  FilteredQuery      | One or none
+  MatchAllDocsQuery  | One or none
+  RangeFilter        | One or none
+  SpanFirst          | One or none
+  SpanNear           | One or none
+  SpanNot            | One or none
+  SpanOr             | One or none
+  SpanOrTerms        | One or none
+  SpanTerm           | One or none
+  TermQuery          | One or none
+  TermsQuery         | One or none
+  UserQuery          | One or none
+Element's model:

(BooleanQuery | UserQuery | FilteredQuery | TermQuery | TermsQuery | MatchAllDocsQuery | ConstantScoreQuery | SpanOr | SpanNear | SpanOrTerms | SpanFirst | SpanNot | SpanTerm | RangeFilter | CachedFilter)

+
+<UserQuery> +Child of Clause, CachedFilter, Query +
+

Passes content directly through to the standard Lucene QueryParser (see "Lucene Query Syntax").

Example: Search for documents about John Smith or John Doe using standard Lucene query syntax +

	          
+               <UserQuery>"John Smith" OR "John Doe"</UserQuery>
+	         

+
+<UserQuery>'s attributes
+  Name      | Values | Default
+  boost     |        | 1.0
+  fieldName |        |
+ +
+@boost +Attribute of UserQuery +
+

Optional boost for matches on this query. Values > 1

Default value: 1.0

+
+@fieldName +Attribute of UserQuery +
+

fieldName can optionally be defined here to change the default field used in the QueryParser
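A sketch combining both UserQuery attributes. The "title" field name and the boost value of 2.0 are hypothetical, chosen only for illustration:

```xml
<!-- "title" is a hypothetical field name; boost="2.0" is an arbitrary illustrative value. -->
<UserQuery fieldName="title" boost="2.0">"John Smith" OR "John Doe"</UserQuery>
```

Under these assumptions, both phrases would be searched in the "title" field rather than in the parser's usual default field, and matches would count for more than those of an unboosted sibling query.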

+
+<MatchAllDocsQuery/> +Child of Clause, CachedFilter, Query +
+

A query that matches all documents. It has two uses: +

    +
  1. as a Clause in a BooleanQuery whose only other clause +is a "mustNot" match (Lucene requires at least one positive clause), and
  2. in a FilteredQuery where a Filter tag is effectively being +used to select content rather than its usual role of filtering the results of a query.

Example: Effectively use a Filter as a query +

	          
+               <FilteredQuery>
+                 <Query>
+                    <MatchAllDocsQuery/>
+                 </Query>
+                 <Filter>
+                     <RangeFilter fieldName="date" lowerTerm="19870409" upperTerm="19870412"/>
+                 </Filter>	
+               </FilteredQuery>	         
+	       

This element is always empty.
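The example above covers only the second use. The first use listed above might be sketched as follows. The Clause element and its "occurs" attribute are described in the BooleanQuery section of this document; the attribute values as spelled here ("must", "mustnot") and the "status" field are assumptions that should be checked against that section:

```xml
<!-- Sketch: select every document except those marked as deleted.
     MatchAllDocsQuery supplies the positive clause that Lucene requires.
     The "status" field, the "deleted" value and the exact spelling of the
     occurs values are assumptions. -->
<BooleanQuery>
   <Clause occurs="must">
      <MatchAllDocsQuery/>
   </Clause>
   <Clause occurs="mustnot">
      <TermQuery fieldName="status">deleted</TermQuery>
   </Clause>
</BooleanQuery>
```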

+
+<TermQuery> +Child of Clause, CachedFilter, Query +
+

A single-term query. The child text is not analyzed.

Example: Match on a primary key +

	          
+               <TermQuery fieldName="primaryKey">13424</TermQuery>
+	       

+
+ + + + + + +
<TermQuery>'s attributes
Name        Values   Default
boost                1.0
fieldName
+ +
+@boost +Attribute of TermQuery +
+

Optional boost for matches on this query. Values > 1 increase the relative importance of matches.

Default value: 1.0

+
+@fieldName +Attribute of TermQuery +
+

fieldName must be defined here or is taken from the most immediate parent XML element that defines a "fieldName" attribute
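The inheritance rule can be sketched as follows, assuming that BooleanQuery accepts a "fieldName" attribute (see its own section of this document) and using hypothetical "contents" and "title" fields:

```xml
<!-- Sketch of fieldName inheritance. Field names are hypothetical. -->
<BooleanQuery fieldName="contents">
   <Clause occurs="should">
      <!-- No fieldName here: taken from the nearest parent that defines one ("contents"). -->
      <TermQuery>merger</TermQuery>
   </Clause>
   <Clause occurs="should">
      <!-- A local fieldName takes precedence over the inherited one. -->
      <TermQuery fieldName="title">merger</TermQuery>
   </Clause>
</BooleanQuery>
```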

+
+<TermsQuery> +Child of Clause, CachedFilter, Query +
+

The equivalent of a BooleanQuery with multiple optional TermQuery clauses. +Child text is analyzed using a field-specific choice of Analyzer to produce a set of terms that are ORed together in Boolean logic. +Unlike the UserQuery element, this does not parse any special characters to control fuzzy/phrase/boolean logic and as such cannot +produce a query parse error, whatever the user input.

Example: Match on text from a database description (which may contain characters that +are illegal in the standard Lucene query syntax used in the UserQuery tag) +

	          
+               <TermsQuery fieldName="description">Smith &amp; Sons (Ltd) : incorporated 1982</TermsQuery>
+	       

+
+ + + + + + +
<TermsQuery>'s attributes
Name                       Values        Default
boost                                    1.0
disableCoord               true, false   false
fieldName
minimumNumberShouldMatch                 0
+ +
+@boost +Attribute of TermsQuery +
+

Optional boost for matches on this query. Values > 1 increase the relative importance of matches.

Default value: 1.0

+
+@fieldName +Attribute of TermsQuery +
+

fieldName must be defined here or is taken from the most immediate parent XML element that defines a "fieldName" attribute

+
+@disableCoord +Attribute of TermsQuery +
+

The "Coordination factor" rewards documents that contain more of the terms in this list. This flag can be used to turn off this factor.

Possible values: true, false - Default value: false

+
+@minimumNumberShouldMatch +Attribute of TermsQuery +
+

The minimum number of terms that should be present in any one document before it is considered to be a match.

Default value: 0
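A sketch showing disableCoord and minimumNumberShouldMatch together, reusing the description text from the example above. The threshold of 2 is an arbitrary illustrative choice, and how many terms the text yields depends on the Analyzer configured for the field:

```xml
<!-- Sketch: require at least two of the analyzed terms to be present,
     and turn off the coordination factor so that documents containing
     more of the terms are not rewarded for it.
     The ampersand is escaped as &amp; so that the XML is well-formed. -->
<TermsQuery fieldName="description" minimumNumberShouldMatch="2" disableCoord="true">Smith &amp; Sons (Ltd) : incorporated 1982</TermsQuery>
```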

+
+<FilteredQuery> +Child of Clause, CachedFilter, Query +
+

Runs a Query and filters results to only those query matches that also match the Filter element.

Example: Find all documents about Lucene that have a status of "published" +

	          
+               <FilteredQuery>
+                 <Query>
+                    <UserQuery>Lucene</UserQuery>
+                 </Query>
+                 <Filter>
+                     <TermsFilter fieldName="status">published</TermsFilter>
+                 </Filter>	
+               </FilteredQuery>	         
+	       

+
+ + + + + + + + +
<FilteredQuery>'s children
Name     Cardinality
Filter   Only one
Query    Only one
+ + + + + + +
<FilteredQuery>'s attributes
Name    Values   Default
boost            1.0
+Element's model:

(Query, Filter)

+
+@boost +Attribute of FilteredQuery +
+

Optional boost for matches on this query. Values > 1 increase the relative importance of matches.

Default value: 1.0

+
+<Query> +Child of FilteredQuery +
+

Used to identify a nested Query element inside another container element. This is NOT a top-level query tag.

+
+ + + + + + + + + + + + + + + + + + + +
<Query>'s children
Name                 Cardinality
BooleanQuery         One or none
ConstantScoreQuery   One or none
FilteredQuery        One or none
MatchAllDocsQuery    One or none
SpanFirst            One or none
SpanNear             One or none
SpanNot              One or none
SpanOr               One or none
SpanOrTerms          One or none
SpanTerm             One or none
TermQuery            One or none
TermsQuery           One or none
UserQuery            One or none
+Element's model:

(BooleanQuery | UserQuery | FilteredQuery | TermQuery | TermsQuery | MatchAllDocsQuery | ConstantScoreQuery | SpanOr | SpanNear | SpanOrTerms | SpanFirst | SpanNot | SpanTerm)

+
+<Filter> +Child of FilteredQuery +
+

The choice of Filter that MUST also be matched

+
+ + + + + + + + +
<Filter>'s children
Name           Cardinality
CachedFilter   One or none
RangeFilter    One or none
+Element's model:

(RangeFilter | CachedFilter)

+
+<RangeFilter/> +Child of ConstantScoreQuery, Clause, CachedFilter, Filter +
+

Filter used to limit query results to documents matching a range of field values

Example: Search for documents about banks from the last 10 years +

	          
+            <FilteredQuery>
+               <Query>
+                  <UserQuery>bank</UserQuery>
+               </Query>	
+               <Filter>
+                     <RangeFilter fieldName="date" lowerTerm="19970101" upperTerm="20070101"/>
+               </Filter>	
+            </FilteredQuery>
+	         

+ + + + + + + +
<RangeFilter>'s attributes
Name           Values        Default
fieldName
includeLower   true, false   true
includeUpper   true, false   true
lowerTerm
upperTerm
+

This element is always empty.

+
+@fieldName +Attribute of RangeFilter +
+

fieldName must be defined here or is taken from the most immediate parent XML element that defines a "fieldName" attribute

+
+@lowerTerm +Attribute of RangeFilter +
+

The lower-most term value for this field (must be <= upperTerm)

Required

+
+@upperTerm +Attribute of RangeFilter +
+

The upper-most term value for this field (must be >= lowerTerm)

Required

+
+@includeLower +Attribute of RangeFilter +
+

Controls if the lowerTerm in the range is part of the allowed set of values

Possible values: true, false - Default value: true
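A sketch of a half-open range, reusing the "date" field and term values from the RangeFilter example above. It relies on includeUpper, listed in the attributes table, behaving for the upper bound as includeLower does for the lower bound:

```xml
<!-- Sketch: match dates from 19970101 up to, but not including, 20070101.
     includeLower="true" is the default and is spelled out only for clarity. -->
<RangeFilter fieldName="date" lowerTerm="19970101" upperTerm="20070101"
             includeLower="true" includeUpper="false"/>
```

With both flags left at their default of true, a document dated exactly 20070101 would also match.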

+
+@includeUpper +Attribute of RangeFilter +
[... 290 lines stripped ...]