lucene-java-commits mailing list archives

From mharw...@apache.org
Subject svn commit: r583307 [1/6] - in /lucene/java/trunk/contrib/xml-query-parser: ./ docs/ docs/img/
Date Tue, 09 Oct 2007 21:45:28 GMT
Author: mharwood
Date: Tue Oct  9 14:45:27 2007
New Revision: 583307

URL: http://svn.apache.org/viewvc?rev=583307&view=rev
Log:
Provided DTDs for core and contrib XML query syntax. The "docs" directory contains detailed
documentation generated by DTDdoc from the DTDs. The ant script used to generate these docs
is also included but not hooked up to the main build process due to license issues with DTDdoc.

Added:
    lucene/java/trunk/contrib/xml-query-parser/LuceneContribQuery.dtd
    lucene/java/trunk/contrib/xml-query-parser/LuceneCoreQuery.dtd
    lucene/java/trunk/contrib/xml-query-parser/docs/
    lucene/java/trunk/contrib/xml-query-parser/docs/DTDDocStyle.css
    lucene/java/trunk/contrib/xml-query-parser/docs/LuceneContribQuery.dtd.entities.html
    lucene/java/trunk/contrib/xml-query-parser/docs/LuceneContribQuery.dtd.html
    lucene/java/trunk/contrib/xml-query-parser/docs/LuceneContribQuery.dtd.org.html
    lucene/java/trunk/contrib/xml-query-parser/docs/LuceneCoreQuery.dtd.entities.html
    lucene/java/trunk/contrib/xml-query-parser/docs/LuceneCoreQuery.dtd.html
    lucene/java/trunk/contrib/xml-query-parser/docs/LuceneCoreQuery.dtd.org.html
    lucene/java/trunk/contrib/xml-query-parser/docs/cctree.js
    lucene/java/trunk/contrib/xml-query-parser/docs/dtreeStyle.css
    lucene/java/trunk/contrib/xml-query-parser/docs/elementsIndex.html
    lucene/java/trunk/contrib/xml-query-parser/docs/img/
    lucene/java/trunk/contrib/xml-query-parser/docs/img/empty.gif   (with props)
    lucene/java/trunk/contrib/xml-query-parser/docs/img/join.gif   (with props)
    lucene/java/trunk/contrib/xml-query-parser/docs/img/joinbottom.gif   (with props)
    lucene/java/trunk/contrib/xml-query-parser/docs/img/line.gif   (with props)
    lucene/java/trunk/contrib/xml-query-parser/docs/img/minus.gif   (with props)
    lucene/java/trunk/contrib/xml-query-parser/docs/img/minusbottom.gif   (with props)
    lucene/java/trunk/contrib/xml-query-parser/docs/img/plus.gif   (with props)
    lucene/java/trunk/contrib/xml-query-parser/docs/img/plusbottom.gif   (with props)
    lucene/java/trunk/contrib/xml-query-parser/docs/index.html
    lucene/java/trunk/contrib/xml-query-parser/docs/intro.html
    lucene/java/trunk/contrib/xml-query-parser/docs/toc.html
    lucene/java/trunk/contrib/xml-query-parser/dtddocbuild.xml
Modified:
    lucene/java/trunk/contrib/xml-query-parser/readme.htm

Added: lucene/java/trunk/contrib/xml-query-parser/LuceneContribQuery.dtd
URL: http://svn.apache.org/viewvc/lucene/java/trunk/contrib/xml-query-parser/LuceneContribQuery.dtd?rev=583307&view=auto
==============================================================================
--- lucene/java/trunk/contrib/xml-query-parser/LuceneContribQuery.dtd (added)
+++ lucene/java/trunk/contrib/xml-query-parser/LuceneContribQuery.dtd Tue Oct  9 14:45:27
2007
@@ -0,0 +1,229 @@
+<!--	
+	This DTD builds on the <a href="LuceneCoreQuery.dtd.html">core Lucene XML syntax</a>
and adds support for features found in the "contrib" section of the Lucene project.
+	
+	CorePlusExtensionsParser.java is the Java class that encapsulates this parser behaviour.
+
+	
+	The features added are:
+	<ul>
+	<li><a href="#LikeThisQuery">LikeThisQuery</a></li>
+	   Support for querying using large amounts of example text indicative of the users' general
area of interest
+	<li><a href="#FuzzyLikeThisQuery">FuzzyLikeThisQuery</a></li>
+	   A style of fuzzy query which automatically looks for fuzzy variations on only the "interesting"
terms 
+	<li><a href="#BooleanFilter">BooleanFilter</a></li>
+	   Is to Filters what core Lucene's BooleanQuery is to Queries - allows mixing of clauses
using Boolean logic
+	<li><a href="#TermsFilter">TermsFilter</a></li>
+	   Constructs a filter from an arbitrary set of terms (unlike <a href="#RangeFilter">RangeFilter</a>
which requires a contiguous range of terms)
+	<li><a href="#DuplicateFilter">DuplicateFilter</a></li>
+	   Removes duplicated documents from results where "duplicate" means documents share a value
for a particular field (e.g. a primary key)
+	<li><a href="#BoostingQuery">BoostingQuery</a></li>
+	   Influence score of a query's matches in a subtle way which can't be achieved using BooleanQuery
+	</ul>
+	@title Contrib Lucene
+-->
+<!-- @hidden include the core DTD -->
+<!ENTITY % coreParserDTD SYSTEM "LuceneCoreQuery.dtd" >
+
+
+<!-- @hidden Allow for extensions -->
+<!ENTITY % extendedSpanQueries2 " " >
+<!ENTITY % extendedQueries2 " " >
+<!ENTITY % extendedFilters2 " " >
+
+
+<!ENTITY % extendedQueries1 "|LikeThisQuery|BoostingQuery|FuzzyLikeThisQuery%extendedQueries2;%extendedSpanQueries2;"
>
+<!ENTITY % extendedFilters1 "|TermsFilter|BooleanFilter|DuplicateFilter%extendedFilters2;"
>
+
+
+%coreParserDTD;
+
+<!--
+Performs fuzzy matching on "significant" terms in fields. Improves on "LikeThisQuery" by
allowing for fuzzy variations of supplied fields.
+Improves on FuzzyQuery by rewarding all fuzzy variants of a term with the same IDF rather
than default fuzzy behaviour which ranks rarer
+	variants (typically misspellings) more highly. This can be a useful default search mode
for processing user input where the end user
+	is not expected to know about the standard query operators for fuzzy, boolean or phrase
logic found in UserQuery
+	@example 
+	        <em>Search for information about the Sumitomo bank, where the end user has
mis-spelt the name</em>
+	        %	          
+            <FuzzyLikeThisQuery>
+                <Field fieldName="contents">
+		             Sumitimo bank
+	            </Field>
+            </FuzzyLikeThisQuery>
+	         %	
+-->
+<!ELEMENT FuzzyLikeThisQuery (Field)*>
+<!-- Optional boost for matches on this query. Values > 1 -->
+<!ATTLIST FuzzyLikeThisQuery boost CDATA "1.0">
+<!-- Limits the total number of terms selected from the provided text plus the selected
"fuzzy" variants -->
+<!ATTLIST FuzzyLikeThisQuery maxNumTerms CDATA "50">
+<!-- Ignore "Term Frequency" - a boost factor which rewards multiple occurrences of the
same term in a document -->
+<!ATTLIST FuzzyLikeThisQuery ignoreTF (true|false) "false">
+<!-- A field used in a FuzzyLikeThisQuery -->
+<!ELEMENT Field (#PCDATA)>
+<!-- Controls the level of similarity required for fuzzy variants where 1 is identical
and 0.5 is that the variant contains 
+	half of the original's characters in the same order. Lower values produce more results but
may take longer to execute due to
+	additional IO required to read matching document ids-->
+<!ATTLIST Field minSimilarity CDATA "0.5">
+<!-- Controls the minimum number of characters at the start of fuzzy variant words that
must exactly match the original.
+	A value of zero will require no minimum and the search software will effectively scan ALL
terms from a to z looking for variations.
+	This can incur high CPU overhead and a prefix length of just "1" will reduce this overhead
to 1/26th of the original cost (assuming
+	an even distribution of letters used from the alphabet).
+ -->
+<!ATTLIST Field prefixLength CDATA "1">
+<!-- fieldName must be defined here or is taken from the most immediate parent XML element
that defines a "fieldName" attribute -->	
+<!ATTLIST Field fieldName CDATA #IMPLIED>
+
+
+
+<!--
+	Cherry-picks "significant" terms from the example child text and queries using these words.
By only using significant (read: rare) terms the
+	performance cost of the query is substantially reduced and large bodies of text can be used
as example content.
+	@example 
+	        <em>Use a block of text as an example of the type of content to be found,
ignoring the "Reuters" word which
+	       appears commonly in the index.</em>
+	        %
+            <LikeThisQuery percentTermsToMatch="5" stopWords="Reuters">
+                IRAQI TROOPS REPORTED PUSHING BACK IRANIANS Iraq said today its troops were
pushing Iranian forces out of 
+                positions they had initially occupied when they launched a new offensive
near the southern port of 
+                Basra early yesterday.     A High Command communique said Iraqi troops had
won a significant victory 
+                and were continuing to advance.     Iraq said it had foiled a three-pronged
thrust some 10 km 
+                (six miles) from Basra, but admitted the Iranians had occupied ground held
by the Mohammed al-Qassem 
+                unit, one of three divisions attacked.     The communique said Iranian Revolutionary
Guards were under 
+                assault from warplanes, helicopter gunships, heavy artillery and tanks. 
   "Our forces are continuing 
+                their advance until they purge the last foothold" occupied by the Iranians,
it said.     
+                (Iran said its troops had killed or wounded more than 4,000 Iraqis and were
stabilising their new positions.)     
+                The Baghdad communique said Iraqi planes also destroyed oil installations
at Iran's southwestern Ahvaz field 
+                during a raid today. It denied an Iranian report that an Iraqi jet was shot
down.     
+                Iraq also reported a naval battle at the northern tip of the Gulf. Iraqi
naval units and forces defending an 
+                offshore terminal sank six Iranian out of 28 Iranian boats attempting to
attack an offshore terminal, 
+                the communique said.      Reuters 3;
+            </LikeThisQuery>	         
+	        %	
+	-->
+<!ELEMENT LikeThisQuery (#PCDATA)>
+<!-- Optional boost for matches on this query. Values > 1 -->
+<!ATTLIST LikeThisQuery boost CDATA "1.0">
+<!-- Comma delimited list of field names -->
+<!ATTLIST LikeThisQuery fieldNames CDATA #IMPLIED>
+<!-- a list of stop words - analyzed to produce stop terms -->
+<!ATTLIST LikeThisQuery stopWords CDATA #IMPLIED>
+<!-- controls the maximum number of words shortlisted for the query. The higher the number
the slower the response due to more disk reads required -->
+<!ATTLIST LikeThisQuery maxQueryTerms CDATA "20">
+<!-- Controls how many times a term must appear in the example text before it is shortlisted
for use in the query -->
+<!ATTLIST LikeThisQuery minTermFrequency CDATA "1">
+<!-- A quality control that can be used to limit the number of results to those documents
matching a certain percentage of the shortlisted query terms.
+	Values must be between 1 and 100-->
+<!ATTLIST LikeThisQuery percentTermsToMatch CDATA "30">
+
+<!--
+	Requires matches on the "Query" element and optionally boosts by any matches on the "BoostQuery".
+	Unlike a regular BooleanQuery the boost can be less than 1 to produce a subtractive rather
than additive result
+	on the match score. 
+	@example <em>Find documents about banks, preferably related to mergers, and preferably
not about "World bank"</em>
+    %
+	<BoostingQuery>
+      <Query>
+         <BooleanQuery fieldName="contents">
+           <Clause occurs="should">
+              <TermQuery>merger</TermQuery>
+           </Clause>
+           <Clause occurs="must">
+              <TermQuery>bank</TermQuery>
+           </Clause>
+         </BooleanQuery>	
+      </Query>
+      <BoostQuery boost="0.01">
+         <UserQuery>"world bank"</UserQuery>
+      </BoostQuery>
+    </BoostingQuery>
+	%
+	
+-->	
+<!ELEMENT BoostingQuery (Query,BoostQuery)>
+<!-- Optional boost for matches on this query. Values > 1 -->
+<!ATTLIST BoostingQuery boost CDATA "1.0">
+
+<!--
+	Child element of BoostingQuery used to contain the choice of Query which is used for boosting
purposes
+-->	
+<!ELEMENT BoostQuery (%queries;)>
+<!-- Optional boost for matches on this query. A boost of >0 but <1 
+	effectively demotes results from Query that match this BoostQuery.  	
+	-->
+<!ATTLIST BoostQuery boost CDATA "1.0">
+
+
+
+<!-- Removes duplicated documents from results where "duplicate" means documents share
a value for a particular field such as a primary key
+	@example <em>Find the latest version of each web page that mentions "Lucene"</em>
+	%
+    <FilteredQuery>
+      <Query>
+         <TermQuery fieldName="text">lucene</TermQuery>
+      </Query>
+	  <Filter>
+		<DuplicateFilter fieldName="url" keepMode="last"/>
+	  </Filter>	
+    </FilteredQuery>	
+	%	
+	-->
+<!ELEMENT DuplicateFilter EMPTY>
+<!-- fieldName must be defined here or is taken from the most immediate parent XML element
that defines a "fieldName" attribute -->	
+<!ATTLIST DuplicateFilter fieldName CDATA #IMPLIED>
+<!-- Determines if the first or last document occurrence is the one to return when presented
with duplicated field values -->	
+<!ATTLIST DuplicateFilter keepMode (first | last) "first">
+<!-- Controls the choice of process used to produce the filter - "full" mode identifies
only non-duplicate documents with the chosen field 
+	while "fast" mode may perform faster but will also mark documents <em>without</em>
the field as valid. The former approach starts by 
+	assuming every document is a duplicate then finds the "master" documents to keep while the
latter approach assumes all documents are 
+	unique and 	unmarks those documents that are a copy. 
+	-->	
+<!ATTLIST DuplicateFilter processingMode (full | fast) "full">
+
+
+
+
+<!-- Processes child text using a field-specific choice of Analyzer to produce a set of
terms that are then used as a filter.
+	@example <em>Find documents talking about Lucene written on a Monday or a Friday</em>
+	%
+    <FilteredQuery>
+      <Query>
+         <TermQuery fieldName="text">lucene</TermQuery>
+      </Query>
+	<Filter>
+		<TermsFilter fieldName="dayOfWeek">monday friday</TermsFilter> 
+	</Filter>	
+    </FilteredQuery>	
+	%
+	
+	-->
+<!ELEMENT TermsFilter (#PCDATA)>
+<!-- fieldName must be defined here or is taken from the most immediate parent XML element
that defines a "fieldName" attribute -->	
+<!ATTLIST TermsFilter fieldName CDATA #IMPLIED>
+<!--
+	A Filter equivalent to BooleanQuery that applies Boolean logic to Clauses containing Filters.
+	Unlike BooleanQuery a BooleanFilter can contain a single "mustNot" clause.
+	@example <em>Find documents from the first quarter of this year or last year that
are not in "draft" status</em>
+	%
+     <FilteredQuery>
+       <Query>
+           <MatchAllDocsQuery/>
+       </Query>
+       <Filter>
+        <BooleanFilter>
+          <Clause occurs="should">
+             <RangeFilter fieldName="date" lowerTerm="20070101" upperTerm="20070401"/>
+          </Clause>
+          <Clause occurs="should">
+             <RangeFilter fieldName="date" lowerTerm="20060101" upperTerm="20060401"/>
+          </Clause>
+          <Clause occurs="mustnot">
+             <TermsFilter fieldName="status">draft</TermsFilter> 
+          </Clause>
+        </BooleanFilter>
+       </Filter>
+    </FilteredQuery>
+	%
+	-->
+<!ELEMENT BooleanFilter (Clause)+>
+

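The Clause element is shared between BooleanQuery and BooleanFilter, and the DTD by itself cannot say that a Clause inside a BooleanFilter may hold only filters while a Clause inside a BooleanQuery may hold only queries. A client that builds query XML outside Java could check this rule before sending a request. The following is a minimal sketch in Python under stated assumptions: the function name misplaced_clause_children is hypothetical, and the two name sets are copied by hand from the entity declarations in the two DTDs rather than read from them.

```python
import xml.etree.ElementTree as ET

# Element names copied by hand from the coreQueries, coreSpanQueries and
# extendedQueries1 entities (an assumption: keep in sync with the DTDs).
QUERIES = {
    "BooleanQuery", "UserQuery", "FilteredQuery", "TermQuery", "TermsQuery",
    "MatchAllDocsQuery", "ConstantScoreQuery",
    "SpanOr", "SpanNear", "SpanOrTerms", "SpanFirst", "SpanNot", "SpanTerm",
    "LikeThisQuery", "BoostingQuery", "FuzzyLikeThisQuery",
}

# Element names copied by hand from the coreFilters and extendedFilters1 entities.
FILTERS = {
    "RangeFilter", "CachedFilter",
    "TermsFilter", "BooleanFilter", "DuplicateFilter",
}


def misplaced_clause_children(xml_text):
    """Return (container, child) tag pairs that break the Clause usage rule.

    A Clause inside a BooleanQuery may only wrap query elements, and a
    Clause inside a BooleanFilter may only wrap filter elements.
    """
    root = ET.fromstring(xml_text)
    problems = []
    # iter() visits the root and every descendant, so nested containers
    # (for example a BooleanFilter inside a FilteredQuery) are checked too.
    for container in root.iter():
        if container.tag == "BooleanQuery":
            allowed = QUERIES
        elif container.tag == "BooleanFilter":
            allowed = FILTERS
        else:
            continue
        for clause in container.findall("Clause"):
            for child in clause:
                if child.tag not in allowed:
                    problems.append((container.tag, child.tag))
    return problems
```

An empty list means every Clause holds a child of the right family. This check only looks at element names; it does not replace validation against the DTDs.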
Added: lucene/java/trunk/contrib/xml-query-parser/LuceneCoreQuery.dtd
URL: http://svn.apache.org/viewvc/lucene/java/trunk/contrib/xml-query-parser/LuceneCoreQuery.dtd?rev=583307&view=auto
==============================================================================
--- lucene/java/trunk/contrib/xml-query-parser/LuceneCoreQuery.dtd (added)
+++ lucene/java/trunk/contrib/xml-query-parser/LuceneCoreQuery.dtd Tue Oct  9 14:45:27 2007
@@ -0,0 +1,397 @@
+<!--
+	<h3>Background</h3>
+	This DTD describes the XML syntax used to perform advanced searches using the core Lucene
search engine. The motivation behind the XML query syntax is:
+	<ol>
+	<li>To open up Lucene functionality to clients other than Java</li>
+	<li>To offer a form of expressing queries that can easily be
+	    <ul>
+	        <li>Persisted for logging/auditing purposes</li>
+	        <li>Changed by editing text query templates (XSLT) without requiring a recompile/redeploy
of applications</li>
+	        <li>Serialized across networks (without requiring Java bytecode for Query
logic deployed on clients)</li>
+	    </ul>
+	</li>
+	<li>To provide a shorthand way of expressing query logic which echoes the logical tree
structure of query objects more closely than reading procedural Java query construction code</li>
+	<li>To bridge the growing gap between Lucene query/filtering functionality and the
set of functionality accessible through the standard Lucene QueryParser syntax</li>
+	<li>To provide a simply extensible syntax that does not require complex parser skills
such as knowledge of JavaCC syntax</li>
+	</ol>
+	
+	
+	<h3>Syntax overview</h3>
+	Search syntax consists of two types of elements:
+	<ul>
+	<li><i>Queries</i></li>
+	<li><i>Filters</i></li>
+	</ul>
+
+	<h4>Queries</h4>
+	The root of any XML search must be a <i>Query</i> type element used to select
content.
+	Queries typically score matches on documents using a number of different factors in order
to provide relevant results first. 
+	One common example of a query tag is the <a href="#UserQuery">UserQuery</a>
element which uses the standard 
+	Lucene QueryParser to parse Google-style search syntax provided by end users.
+	
+	<h4>Filters</h4>
+	Unlike Queries, <i>Filters</i> are not used to select or score content - they
are simply used to filter <i>Query</i> output (see <a href="#FilteredQuery">FilteredQuery</a>
for an example use of query filtering).
+	Because Filters simply offer a yes/no decision for each document in the index their output
can be efficiently cached in memory as a <a href="http://java.sun.com/j2se/1.4.2/docs/api/java/util/BitSet.html">Bitset</a>
for
+	subsequent reuse (see <a href="#CachedFilter">CachedFilter</a> tag).
+
+	<h4>Nesting elements</h4>
+	Many of the elements can nest other elements to produce queries/filters of an arbitrary
depth and complexity. 
+	The <a href="#BooleanQuery">BooleanQuery</a> element is one such example which
provides a means for combining other queries (including other BooleanQueries) using Boolean

+	logic to determine mandatory or optional elements. 
+
+	
+	<h3>Advanced topics</h3>	
+	<h4>Advanced positional testing - span queries</h4>
+	The <i>SpanQuery</i> class of queries allows for complex positional tests which
look not only for certain combinations of words but also for their particular 
+	positions in relation to each other and the documents containing them.
+	
+	
+	CoreParser.java is the Java class that encapsulates this parser behaviour.
+	
+	
+	@title Core Lucene	
+-->
+
+<!-- @hidden Define core types of XML elements -->
+<!ENTITY % coreSpanQueries "SpanOr|SpanNear|SpanOrTerms|SpanFirst|SpanNot|SpanTerm" >
+<!ENTITY % coreQueries "BooleanQuery|UserQuery|FilteredQuery|TermQuery|TermsQuery|MatchAllDocsQuery|ConstantScoreQuery"
>
+<!ENTITY % coreFilters "RangeFilter|CachedFilter" >
+
+<!-- @hidden Allow for extensions -->
+<!ENTITY % extendedSpanQueries1 " " >
+<!ENTITY % extendedQueries1 " " >
+<!ENTITY % extendedFilters1 " " >
+
+<!ENTITY % spanQueries "%coreSpanQueries;%extendedSpanQueries1;" >
+<!ENTITY % queries "%coreQueries;|%spanQueries;%extendedQueries1;" >
+
+
+<!ENTITY % filters "%coreFilters;%extendedFilters1;" >
+
+<!--
+	BooleanQueries implement Boolean logic which controls how multiple Clauses should be interpreted.
+	Some clauses may represent optional Query criteria while others represent mandatory criteria.

+	@example 
+	        <em>Find articles about banks, preferably talking about mergers but nothing
to do with "sumitomo"</em>
+	        %	          
+            <BooleanQuery fieldName="contents">
+	             <Clause occurs="should">
+		              <TermQuery>merger</TermQuery>
+	             </Clause>
+	             <Clause occurs="mustnot">
+		              <TermQuery>sumitomo</TermQuery>
+	             </Clause>
+	             <Clause occurs="must">
+		              <TermQuery>bank</TermQuery>
+	             </Clause>
+            </BooleanQuery>
+
+	         %
+-->	
+<!ELEMENT BooleanQuery (Clause)+>
+<!-- Optional boost for matches on this query. Values > 1 -->
+<!ATTLIST BooleanQuery boost CDATA "1.0">
+<!-- fieldName can optionally be defined here as a default attribute used by all child
elements -->	
+<!ATTLIST BooleanQuery fieldName CDATA #IMPLIED>
+<!-- The "Coordination factor" rewards documents that contain more of the optional clauses
in this list. This flag can be used to turn off this factor. -->
+<!ATTLIST BooleanQuery disableCoord (true | false) "false">
+<!-- The minimum number of optional clauses that should be present in any one document
before it is considered to be a match. -->
+<!ATTLIST BooleanQuery minimumNumberShouldMatch CDATA "0">
+
+<!-- NOTE: "Clause" tag has 2 modes of use - inside <BooleanQuery> in which case
only "query" types can be
+	child elements - while in a <BooleanFilter> clause only "filter" types can be contained.
+	@hidden TODO: Change BooleanFilterBuilder and BooleanQueryBuilder to auto-wrap choice of
query or filters. This type of
+	      code already exists in CachedFilter so could be reused.
+-->	
+<!ELEMENT Clause (%queries;|%filters;)>
+<!-- Controls if the clause is optional (should), mandatory (must) or unacceptable (mustnot)
-->
+<!ATTLIST Clause occurs (should | must | mustnot) "should">
+
+
+<!-- Caches any nested query or filter in an LRU (Least recently used) Cache. Cached queries,
like filters, are turned into
+	Bitsets at a cost of 1 bit per document in the index. The memory cost of a cached query/filter
is therefore numberOfDocsinIndex/8 bytes.
+	Queries that are cached as filters obviously retain none of the scoring information associated
with results - they retain just
+	a Boolean yes/no record of which documents matched. 
+	@example 
+	        <em>Search for documents about banks from the last 10 years - caching the
commonly-used "last 10 year" filter as a BitSet in 
+	RAM to eliminate the cost of building this filter from disk for every query</em>
+	        %	          
+            <FilteredQuery>
+               <Query>
+                  <UserQuery>bank</UserQuery>
+               </Query>	
+               <Filter>
+                  <CachedFilter>
+                     <RangeFilter fieldName="date" lowerTerm="19970101" upperTerm="20070101"/>
+                  </CachedFilter>
+               </Filter>	
+            </FilteredQuery>
+	         %
+	
+	-->
+<!ELEMENT CachedFilter (%queries;|%filters;)>
+
+
+
+<!--
+Passes content directly through to the standard Lucene QueryParser (see "Lucene Query Syntax")
+	@example 
+	        <em>Search for documents about John Smith or John Doe using standard LuceneQuerySyntax</em>
+	        %	          
+               <UserQuery>"John Smith" OR "John Doe"</UserQuery>
+	         %
+		
+-->
+<!ELEMENT UserQuery (#PCDATA)>
+<!-- Optional boost for matches on this query. Values > 1 -->
+<!ATTLIST UserQuery boost CDATA "1.0">
+
+<!-- A query which is used to match all documents. This has a couple of uses: 
+	<ol>
+	<li> as a Clause in a BooleanQuery whose only other clause
+	is a "mustNot" match (Lucene requires at least one positive clause) and</li>
+	<li> in a FilteredQuery where a Filter tag is effectively being 
+	used to select content rather than its usual role of filtering the results of a query.</li>
+	</ol>
+	
+	@example 
+	        <em>Effectively use a Filter as a query </em>
+	        %	          
+               <FilteredQuery>
+                 <Query>
+                    <MatchAllDocsQuery/>
+                 </Query>
+                 <Filter>
+                     <RangeFilter fieldName="date" lowerTerm="19870409" upperTerm="19870412"/>
+                 </Filter>	
+               </FilteredQuery>	         
+	       %
+	
+-->
+<!ELEMENT MatchAllDocsQuery EMPTY>
+
+<!-- a single term query - no analysis is done of the child text
+	@example 
+	        <em>Match on a primary key</em>
+	        %	          
+               <TermQuery fieldName="primaryKey">13424</TermQuery>
+	       %	
+-->	
+<!ELEMENT TermQuery (#PCDATA)>
+<!-- Optional boost for matches on this query. Values > 1 -->
+<!ATTLIST TermQuery boost CDATA "1.0">
+<!-- fieldName must be defined here or is taken from the most immediate parent XML element
that defines a "fieldName" attribute -->	
+<!ATTLIST TermQuery fieldName CDATA #IMPLIED>
+
+
+
+<!-- 
+	The equivalent of a BooleanQuery with multiple optional TermQuery clauses.
+	Child text is analyzed using a field-specific choice of Analyzer to produce a set of terms
that are ORed together in Boolean logic.
+	Unlike UserQuery element, this does not parse any special characters to control fuzzy/phrase/boolean
logic and as such is incapable
+	of producing a Query parse error given any user input
+	@example 
+	        <em>Match on text from a database description (which may contain characters
that 
+	are illegal in the standard Lucene Query syntax used in the UserQuery tag)</em>
+	        %	          
+               <TermsQuery fieldName="description">Smith & Sons (Ltd) : incorporated
1982</TermsQuery>
+	       %	
+-->	
+<!ELEMENT TermsQuery (#PCDATA)>
+<!-- Optional boost for matches on this query. Values > 1 -->
+<!ATTLIST TermsQuery boost CDATA "1.0">
+<!-- fieldName must be defined here or is taken from the most immediate parent XML element
that defines a "fieldName" attribute -->	
+<!ATTLIST TermsQuery fieldName CDATA #IMPLIED>
+<!-- The "Coordination factor" rewards documents that contain more of the terms in this
list. This flag can be used to turn off this factor. -->
+<!ATTLIST TermsQuery disableCoord (true | false) "false">
+<!-- The minimum number of terms that should be present in any one document before it
is considered to be a match. -->
+<!ATTLIST TermsQuery minimumNumberShouldMatch CDATA "0">
+
+
+<!-- 
+	Runs a Query and filters results to only those query matches that also match the Filter
element.	
+	@example 
+	        <em>Find all documents about Lucene that have a status of "published"</em>
+	        %	          
+               <FilteredQuery>
+                 <Query>
+                    <UserQuery>Lucene</UserQuery>
+                 </Query>
+                 <Filter>
+                     <TermsFilter fieldName="status">published</TermsFilter>
+                 </Filter>	
+               </FilteredQuery>	         
+	       %	
+-->	
+<!ELEMENT FilteredQuery (Query,Filter)>
+<!-- Optional boost for matches on this query. Values > 1 -->
+<!ATTLIST FilteredQuery boost CDATA "1.0">
+<!-- Used to identify a nested Query element inside another container element. NOT a top-level
query tag  -->
+<!ELEMENT Query (%queries;)>
+<!-- The choice of Filter that MUST also be matched  -->
+<!ELEMENT Filter (%filters;)>
+
+<!--
+	Filter used to limit query results to documents matching a range of field values
+	@example 
+	        <em>Search for documents about banks from the last 10 years</em>
+	        %	          
+            <FilteredQuery>
+               <Query>
+                  <UserQuery>bank</UserQuery>
+               </Query>	
+               <Filter>
+                     <RangeFilter fieldName="date" lowerTerm="19970101" upperTerm="20070101"/>
+               </Filter>	
+            </FilteredQuery>
+	         %
+	-->
+<!ELEMENT RangeFilter EMPTY>
+<!-- fieldName must be defined here or is taken from the most immediate parent XML element
that defines a "fieldName" attribute -->	
+<!ATTLIST RangeFilter fieldName CDATA #IMPLIED>
+<!-- The lower-most term value for this field (must be <= upperTerm) -->
+<!ATTLIST RangeFilter lowerTerm CDATA #REQUIRED>
+<!-- The upper-most term value for this field (must be >= lowerTerm) -->
+<!ATTLIST RangeFilter upperTerm CDATA #REQUIRED>
+<!-- Controls if the lowerTerm in the range is part of the allowed set of values -->
+<!ATTLIST RangeFilter includeLower (true | false) "true">
+<!-- Controls if the upperTerm in the range is part of the allowed set of values -->
+<!ATTLIST RangeFilter includeUpper (true | false) "true">
+
+
+
+<!-- A single term used in a SpanQuery. These clauses are the building blocks for more
complex "span" queries which test word proximity
+	@example <em>Find documents using terms close to each other about mining and accidents</em>
+	      %
+	      <SpanNear slop="8" inOrder="false" fieldName="text">		
+			<SpanOr>
+				<SpanTerm>killed</SpanTerm>
+				<SpanTerm>died</SpanTerm>
+				<SpanTerm>dead</SpanTerm>
+			</SpanOr>
+			<SpanOr>
+				<SpanTerm>miner</SpanTerm>
+				<SpanTerm>mining</SpanTerm>
+				<SpanTerm>miners</SpanTerm>
+			</SpanOr>
+	      </SpanNear>
+	      % 	
+	-->
+<!ELEMENT SpanTerm (#PCDATA)>
+<!-- fieldName must be defined here or is taken from the most immediate parent XML element
that defines a "fieldName" attribute -->	
+<!ATTLIST SpanTerm fieldName CDATA #IMPLIED>
+
+<!-- A field-specific analyzer is used here to parse the child text provided in this tag.
The SpanTerms produced are ORed in terms of Boolean logic 
+	@example <em>Use SpanOrTerms as a more convenient/succinct way of expressing multiple
choices of SpanTerms. This example looks for reports 
+	using words describing a fatality near to references to miners</em>
+	      %
+	      <SpanNear slop="8" inOrder="false" fieldName="text">		
+			<SpanOrTerms>killed died death dead deaths</SpanOrTerms>
+			<SpanOrTerms>miner mining miners</SpanOrTerms>
+	      </SpanNear>
+	      % 	
+	-->
+<!ELEMENT SpanOrTerms (#PCDATA)>
+<!-- fieldName must be defined here or is taken from the most immediate parent XML element
that defines a "fieldName" attribute -->	
+<!ATTLIST SpanOrTerms fieldName CDATA #IMPLIED>
+
+<!-- Takes any number of child queries from the Span family 
+	@example <em>Find documents using terms close to each other about mining and accidents</em>
+	      %
+	      <SpanNear slop="8" inOrder="false" fieldName="text">		
+			<SpanOr>
+				<SpanTerm>killed</SpanTerm>
+				<SpanTerm>died</SpanTerm>
+				<SpanTerm>dead</SpanTerm>
+			</SpanOr>
+			<SpanOr>
+				<SpanTerm>miner</SpanTerm>
+				<SpanTerm>mining</SpanTerm>
+				<SpanTerm>miners</SpanTerm>
+			</SpanOr>
+	      </SpanNear>
+	      %	
+	
+	-->
+<!ELEMENT SpanOr (%spanQueries;)* >
+
+<!-- Takes any number of child queries from the Span family and tests for proximity
+	@hidden TODO SpanNear missing "boost" attr (could add to SpanBuilderBase)
+	-->
+<!ELEMENT SpanNear (%spanQueries;)* >
+<!-- defines the maximum distance between Span elements where distance is expressed as
word number, not byte offset 
+	@example <em>Find documents using terms within 8 words of each other talking about
mining and accidents</em>
+	      %
+	      <SpanNear slop="8" inOrder="false" fieldName="text">		
+			<SpanOr>
+				<SpanTerm>killed</SpanTerm>
+				<SpanTerm>died</SpanTerm>
+				<SpanTerm>dead</SpanTerm>
+			</SpanOr>
+			<SpanOr>
+				<SpanTerm>miner</SpanTerm>
+				<SpanTerm>mining</SpanTerm>
+				<SpanTerm>miners</SpanTerm>
+			</SpanOr>
+	      </SpanNear>
+	      %	
+	-->
+<!ATTLIST SpanNear slop CDATA #REQUIRED>
+<!-- Controls if matching terms  have to appear in the order listed or can be reversed
-->
+<!ATTLIST SpanNear inOrder (true | false) "true">
+
+<!-- Looks for a SpanQuery match occurring near the beginning of a document
+	
+	@example 
+	        <em>Find letters where the first 50 words talk about a resignation:</em>
+	        %	          
+	         <SpanFirst end="50">
+	               <SpanOrTerms fieldName="text">resigning resign leave</SpanOrTerms>
+	         </SpanFirst>
+	         %
+	
+	 --> 
+<!ELEMENT SpanFirst (%spanQueries;) >
+<!-- Controls the end of the region considered in a document's field (expressed in word
number, not byte offset) --> 
+<!ATTLIST SpanFirst end CDATA #REQUIRED>
+<!-- Optional boost for matches on this query. Values > 1 -->
+<!ATTLIST SpanFirst boost CDATA "1.0">
+
+<!-- Finds documents matching a SpanQuery but not if matching another SpanQuery 
+	@example <em>Find documents talking about social services but not containing the word
"public"</em>
+	      %
+          <SpanNot fieldName="text">
+             <Include>
+                <SpanNear slop="2" inOrder="true">		
+                     <SpanTerm>social</SpanTerm>
+                     <SpanTerm>services</SpanTerm>
+                </SpanNear>				
+             </Include>
+             <Exclude>
+                <SpanTerm>public</SpanTerm>
+             </Exclude>
+          </SpanNot>
+	      %	
+	
+	-->
+<!ELEMENT SpanNot (Include,Exclude) >
+<!-- The SpanQuery to find -->
+<!ELEMENT Include (%spanQueries;) >
+<!-- The SpanQuery to be avoided -->
+<!ELEMENT Exclude (%spanQueries;) >
+
+
+<!-- a utility tag to wrap any filter as a query 
+	@example <em> Find all documents from the last 10 years </em>
+	%
+     <ConstantScoreQuery>
+           <RangeFilter fieldName="date" lowerTerm="19970101" upperTerm="20070101"/>
+     </ConstantScoreQuery>	
+	%
+	-->
+<!ELEMENT ConstantScoreQuery (%filters;)* >
+<!-- Optional boost for matches on this query. Values > 1 -->
+<!ATTLIST ConstantScoreQuery boost CDATA "1.0">
+
+
+
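
The two SpanNear ATTLIST declarations above make slop mandatory and give inOrder a default of "true". The following is a minimal sketch, not part of the commit, showing how a standard JAXP parser applies that default when inOrder is omitted. It uses only the JDK; the class name SpanNearDefaults, the helper parseRoot, and the simplified content models for SpanNear and SpanTerm in the inline DOCTYPE are assumptions made for illustration (the real DTD uses the %spanQueries; entity).

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

public class SpanNearDefaults {

    // Inline DTD subset: the two ATTLIST lines are copied from the diff;
    // the ELEMENT declarations are simplified stand-ins (assumption).
    static final String XML =
          "<!DOCTYPE SpanNear [\n"
        + "  <!ELEMENT SpanNear (SpanTerm)*>\n"
        + "  <!ELEMENT SpanTerm (#PCDATA)>\n"
        + "  <!ATTLIST SpanNear slop CDATA #REQUIRED>\n"
        + "  <!ATTLIST SpanNear inOrder (true | false) \"true\">\n"
        + "]>\n"
        + "<SpanNear slop=\"8\">"
        + "<SpanTerm>social</SpanTerm><SpanTerm>services</SpanTerm>"
        + "</SpanNear>";

    // Parses the XML string and returns its root element.
    static Element parseRoot(String xml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return builder.parse(new InputSource(new StringReader(xml)))
                          .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse query XML", e);
        }
    }

    public static void main(String[] args) {
        Element spanNear = parseRoot(XML);
        // slop is given explicitly in the document.
        System.out.println("slop    = " + spanNear.getAttribute("slop"));
        // inOrder is absent from the document; the parser supplies the
        // default declared in the ATTLIST.
        System.out.println("inOrder = " + spanNear.getAttribute("inOrder"));
    }
}
```

Because the default comes from the DTD, a query document that omits inOrder only picks it up when the parser actually reads the declarations; whether the Lucene XML query parser itself relies on the DTD or on its own defaults is not stated in this commit.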

Added: lucene/java/trunk/contrib/xml-query-parser/docs/DTDDocStyle.css
URL: http://svn.apache.org/viewvc/lucene/java/trunk/contrib/xml-query-parser/docs/DTDDocStyle.css?rev=583307&view=auto
==============================================================================
--- lucene/java/trunk/contrib/xml-query-parser/docs/DTDDocStyle.css (added)
+++ lucene/java/trunk/contrib/xml-query-parser/docs/DTDDocStyle.css Tue Oct  9 14:45:27 2007
@@ -0,0 +1,57 @@
+body { background-color:white; font-family: Verdana, Geneva, Arial, Helvetica, sans-serif; font-size: small; }
+table { border: none}
+th.title { font-weight:bold; text-align:center; padding-left:1em; padding-right:1em}
+th.subtitle { font-weight:normal; text-align:left; padding-left:1em; padding-right:1em}
+td { text-align:left; vertical-align:baseline; padding-left:1em; padding-right:1em}
+td.construct { vertical-align:top; padding-left:1em; padding-right:1em}
+th.ruler { background-color:black; color:black}
+table.elementTitle { width:100%; background-color:#E0F0FF}
+td.leftElementTitle { font-weight:bold; font-size: medium;}
+td.rightElementTitle { text-align:right}
+table.attributeTitle { width:100%; background-color:#DDDDDD}
+td.leftAttributeTitle { font-weight:bold; font-size: medium;}
+td.rightAttributeTitle { text-align:right}
+p { text-align:justify;}
+p.model { padding-left:3em; text-align:left; font-style:italic}
+p.DTDSource { text-align:right; width:100%; background-color:#DDDDDD}
+p.emptyTagNote { font-style:italic}
+h1 { font-family:Arial; font-size: x-large;}
+h2 { font-size: medium;}
+h2.TOCTitle { font-family:Arial; font-size: medium;}
+.inTextTitle { font-weight:bold }
+pre, code { font-family: "Courier New", Courier, monospace; font-size: small; }
+
+pre#dtd_source { border: dotted 1px Gray; padding: 6pt 6pt 6pt 6pt;}
+.xml_plain {
+color: #000;
+}
+.dtd_tag_symbols {
+color: #003bff;
+}
+.dtd_comment {
+color: #555; background-color: #f8f8f8;
+}
+.dtd_attribute_name {
+color: #000;
+}
+.dtd_tag_name {
+color: #3f3fbf;
+}
+.dtd_char_data {
+color: #000;
+}
+.dtd_processing_instruction {
+color: #000; font-weight: bold; font-style: italic;
+}
+.dtd_attribute_value {
+color: #c10000;
+}
+.dtd_dtddoc_tag {
+color: #939393;
+background-color: #f7f7f7;
+font-style: italic;
+font-weight: bold;
+}
+.dtd_keyword {
+color: #800000;
+}

Added: lucene/java/trunk/contrib/xml-query-parser/docs/LuceneContribQuery.dtd.entities.html
URL: http://svn.apache.org/viewvc/lucene/java/trunk/contrib/xml-query-parser/docs/LuceneContribQuery.dtd.entities.html?rev=583307&view=auto
==============================================================================
--- lucene/java/trunk/contrib/xml-query-parser/docs/LuceneContribQuery.dtd.entities.html (added)
+++ lucene/java/trunk/contrib/xml-query-parser/docs/LuceneContribQuery.dtd.entities.html Tue Oct  9 14:45:27 2007
@@ -0,0 +1,91 @@
+<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
+<html><head><title>LuceneContribQuery.dtd's entities</title>
+<meta http-equiv='CONTENT-TYPE' content='text/html; charset=UTF-8' />
+<link rel='StyleSheet' href='DTDDocStyle.css' type='text/css' media='screen' />
+</head><body>
+<p class='DTDSource'><b><code>LuceneContribQuery.dtd</code></b>: <a href='LuceneContribQuery.dtd.html'>Elements</a> - <a href='LuceneContribQuery.dtd.entities.html'>Entities</a> - <a href='LuceneContribQuery.dtd.org.html'>Source</a> | <a href='intro.html'>Intro</a> - <a href='elementsIndex.html'>Index</a><br /><a href='index.html' target='_top'>FRAMES</a>&nbsp;/&nbsp;<a href='LuceneContribQuery.dtd.entities.html' target='_top'>NO FRAMES</a></p><h1>Entities for Contrib Lucene</h1>
+<table summary='Entities'>
+<thead><tr><th>Name</th><th>Value</th></tr></thead>
+<tbody>
+<tr><th colspan='2' height='1' class='ruler'></th></tr>
+<tr>
+<td>coreParserDTD</td>
+<td>
+LuceneCoreQuery.dtd <i>(system)</i>
+</td>
+</tr>
+<tr>
+<td>filters</td>
+<td>
+%coreFilters;%extendedFilters1;
+</td>
+</tr>
+<tr>
+<td>spanQueries</td>
+<td>
+%coreSpanQueries;%extendedSpanQueries1;
+</td>
+</tr>
+<tr>
+<td>extendedSpanQueries2</td>
+<td>
+ 
+</td>
+</tr>
+<tr>
+<td>extendedSpanQueries1</td>
+<td>
+ 
+</td>
+</tr>
+<tr>
+<td>queries</td>
+<td>
+%coreQueries;|%spanQueries;%extendedQueries1;
+</td>
+</tr>
+<tr>
+<td>extendedQueries2</td>
+<td>
+ 
+</td>
+</tr>
+<tr>
+<td>extendedQueries1</td>
+<td>
+|LikeThisQuery|BoostingQuery|FuzzyLikeThisQuery%extendedQueries2;%extendedSpanQueries2;
+</td>
+</tr>
+<tr>
+<td>coreSpanQueries</td>
+<td>
+SpanOr|SpanNear|SpanOrTerms|SpanFirst|SpanNot|SpanTerm
+</td>
+</tr>
+<tr>
+<td>coreFilters</td>
+<td>
+RangeFilter|CachedFilter
+</td>
+</tr>
+<tr>
+<td>extendedFilters2</td>
+<td>
+ 
+</td>
+</tr>
+<tr>
+<td>extendedFilters1</td>
+<td>
+|TermsFilter|BooleanFilter|DuplicateFilter%extendedFilters2;
+</td>
+</tr>
+<tr>
+<td>coreQueries</td>
+<td>
+BooleanQuery|UserQuery|FilteredQuery|TermQuery|TermsQuery|MatchAllDocsQuery|ConstantScoreQuery
+</td>
+</tr>
+</tbody>
+</table>
+</body></html>
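
The entity table above builds the contrib content models by chaining parameter entities: each extended*1 entity starts with "|" and ends with a reference to an empty extended*2 hook. As a rough sketch, not part of the commit, the chain can be flattened by textual substitution to see which elements end up in each model. The class name EntityExpansion and the method expand are hypothetical; the entity values are copied from the table, with the blank entries treated as empty strings. A real XML parser also pads parameter-entity replacement text with spaces, which this sketch ignores.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EntityExpansion {

    // Entity names and values as listed in the generated entities page.
    static final Map<String, String> ENTITIES = new LinkedHashMap<>();
    static {
        ENTITIES.put("coreFilters", "RangeFilter|CachedFilter");
        ENTITIES.put("extendedFilters2", "");
        ENTITIES.put("extendedFilters1",
            "|TermsFilter|BooleanFilter|DuplicateFilter%extendedFilters2;");
        ENTITIES.put("filters", "%coreFilters;%extendedFilters1;");
        ENTITIES.put("coreSpanQueries",
            "SpanOr|SpanNear|SpanOrTerms|SpanFirst|SpanNot|SpanTerm");
        ENTITIES.put("extendedSpanQueries1", "");
        ENTITIES.put("extendedSpanQueries2", "");
        ENTITIES.put("spanQueries", "%coreSpanQueries;%extendedSpanQueries1;");
        ENTITIES.put("coreQueries",
            "BooleanQuery|UserQuery|FilteredQuery|TermQuery|TermsQuery"
            + "|MatchAllDocsQuery|ConstantScoreQuery");
        ENTITIES.put("extendedQueries2", "");
        ENTITIES.put("extendedQueries1",
            "|LikeThisQuery|BoostingQuery|FuzzyLikeThisQuery"
            + "%extendedQueries2;%extendedSpanQueries2;");
        ENTITIES.put("queries", "%coreQueries;|%spanQueries;%extendedQueries1;");
    }

    private static final Pattern REF = Pattern.compile("%([A-Za-z0-9]+);");

    // Recursively replaces every %name; reference with that entity's value.
    static String expand(String name) {
        String value = ENTITIES.get(name);
        if (value == null) {
            throw new IllegalArgumentException("unknown entity: " + name);
        }
        StringBuilder out = new StringBuilder();
        Matcher m = REF.matcher(value);
        int last = 0;
        while (m.find()) {
            out.append(value, last, m.start());
            out.append(expand(m.group(1)));
            last = m.end();
        }
        out.append(value.substring(last));
        return out.toString();
    }

    public static void main(String[] args) {
        for (String name : new String[] {"filters", "spanQueries", "queries"}) {
            System.out.println(name + " = " + expand(name));
        }
    }
}
```

Under these assumptions, filters flattens to the two core filters followed by TermsFilter, BooleanFilter and DuplicateFilter, and the empty extended*2 entities remain as extension points for further DTDs.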


