lucene-dev mailing list archives

From cutt...@lucene.com
Subject RE: PLEASE REVIEW: QueryParser syntax documentation
Date Thu, 09 May 2002 15:30:42 GMT
Thanks, this is great to have!

A few things you don't mention:

 - Grouping: parentheses may be used to group clauses, for example:
     apple AND (fruit OR tree)

 - Plus and minus: these can be used to require or prohibit clauses:
     +apple -"computer company" -model:(macintosh OR lisa)
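A minimal sketch of the required/prohibited semantics, assuming a document can be modeled as a plain set of terms. The class `PlusMinusSketch` is hypothetical and is not Lucene's `BooleanQuery`. It handles only single-term clauses (no fields, phrases or parenthesized groups), and it assumes that when a query has no required clause, at least one optional clause must match.

```java
import java.util.List;
import java.util.Set;

// Hypothetical, simplified model of "+" (required) and "-" (prohibited)
// clauses over single terms; not Lucene's BooleanQuery implementation.
class PlusMinusSketch {

    // Returns true if a document, modeled as a set of terms, satisfies
    // clauses such as "+apple", "-macintosh" or "fruit".
    static boolean matches(List<String> clauses, Set<String> docTerms) {
        boolean hasRequired = false;
        boolean anyOptionalHit = false;
        for (String clause : clauses) {
            if (clause.startsWith("+")) {
                hasRequired = true;
                if (!docTerms.contains(clause.substring(1))) {
                    return false; // a required term is missing
                }
            } else if (clause.startsWith("-")) {
                if (docTerms.contains(clause.substring(1))) {
                    return false; // a prohibited term is present
                }
            } else if (docTerms.contains(clause)) {
                anyOptionalHit = true;
            }
        }
        // With no required clause, at least one optional clause must match,
        // so a query made only of prohibited clauses matches nothing.
        return hasRequired || anyOptionalHit;
    }

    public static void main(String[] args) {
        List<String> query = List.of("+apple", "-macintosh", "fruit");
        System.out.println(matches(query, Set.of("apple", "pie")));       // true
        System.out.println(matches(query, Set.of("apple", "macintosh"))); // false
    }
}
```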

Doug

> -----Original Message-----
> From: Peter Carlson
> [mailto:carlson.at.bookandhammer.com@cutting.at.lucene.com]
> Sent: Wednesday, May 08, 2002 9:55 PM
> To: dcutting@grandcentral.com
> Subject: PLEASE REVIEW: QueryParser syntax documentation
> 
> 
> Hi,
> 
> I was trying to validate how the unit test should work for wildcard
> searches, and I couldn't find a central reference for the query
> language. Here is a general reference that I thought might be useful
> for people trying to understand the full QueryParser language (it's
> based on some instructions I wrote for a project, so I hope it makes
> sense).
> 
> Please provide comments, then I'll post it.
> 
> Thanks
> 
> --Peter
> 
> Overview
> Although Lucene provides the ability to create your own queries
> through its API, it also provides a rich query language through the
> QueryParser.
> 
> Terms
> A query is broken up into terms and operators. There are two types of
> terms: Single Terms and Phrases.
> A Single Term is a single word such as "test" or "oracle".
> A Phrase is a group of words surrounded by double quotes such as
> "test oracle".
> Multiple terms can be combined with Boolean operators to form a more
> complex query (see below).
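To make the distinction between Single Terms and Phrases concrete, here is a rough tokenizer sketch. `TermSplitterSketch` is a hypothetical helper, not QueryParser: it treats anything inside double quotes as one phrase and every other whitespace-separated run as a single term, and it does not recognize operators, fields or escapes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Lucene: splits a query string into
// single terms and quoted phrases.
class TermSplitterSketch {

    // Group 1 captures the text inside double quotes (a phrase);
    // group 2 captures any other run of non-whitespace (a single term).
    private static final Pattern TOKEN = Pattern.compile("\"([^\"]*)\"|(\\S+)");

    static List<String> split(String query) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(query);
        while (m.find()) {
            // Keep phrases without their surrounding quotes.
            tokens.add(m.group(1) != null ? m.group(1) : m.group(2));
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(split("\"test oracle\" test")); // [test oracle, test]
    }
}
```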
> 
> 
> Fields
> Lucene supports fielded data. When performing a search you can either
> specify a field or use the default field. The available fields and
> the default field are implementation specific.
> 
> You can search any of these fields by typing the field name followed
> by a colon ":" and then the term you are looking for. For example,
> assume a Lucene index contains two fields, title and text, and text
> is the default field. If you want to find the document entitled "The
> Right Way" which contains the text "right", you can enter:
> 
> title:"The Right Way" AND text:right
> or
> title:"The Right Way" AND right
> Since text is the default field, the field indicator is not required.
> 
> Note: The field is only valid for the term that it directly precedes,
> so the query
> title:Do it right
> will only find "Do" in the title field. It will find "it" and "right"
> in the default field (in this case the text field).
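The scoping rule in the note can be sketched as follows, assuming whitespace-separated terms with no phrases or operators. `FieldScopeSketch.qualify` is a hypothetical illustration, not the QueryParser implementation: each term keeps its own `field:` prefix if it has one and otherwise gets the default field.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: attaches a field to each whitespace-separated term,
// using an explicit "field:" prefix when present and the default field otherwise.
// Phrases, operators and escaping are ignored for brevity.
class FieldScopeSketch {

    static List<String> qualify(String query, String defaultField) {
        List<String> result = new ArrayList<>();
        for (String token : query.trim().split("\\s+")) {
            if (token.indexOf(':') > 0) {
                // An explicit prefix applies only to this one term.
                result.add(token);
            } else {
                result.add(defaultField + ":" + token);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(qualify("title:Do it right", "text"));
        // [title:Do, text:it, text:right]
    }
}
```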
> 
> Wildcard Searches
> Lucene supports single and multiple character wildcard searches.
> To perform a single character wildcard search use the "?" symbol.
> To perform a multiple character wildcard search use the "*" symbol.
> The single character wildcard search looks for terms that match the
> pattern with any single character in place of the "?". For example,
> to search for "text" or "test" you can use the search:
> 
> te?t
> Note: searching for "test?" will not find "test", but will find
> "tests".
> 
> Multiple character wildcard searches look for 0 or more characters.
> For example, to search for test, tests or tester, you can use the
> search:
> 
> test*
> You can also use wildcard searches in the middle of a term.
> 
> te*t
> Note: You cannot use a * or ? symbol as the first character of a
> search.
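A sketch of the two wildcard symbols, translated into a Java regular expression for illustration. `WildcardSketch` is hypothetical; it only shows which terms a pattern would match, not how Lucene enumerates matching terms in the index, and it does not enforce the leading-wildcard restriction described above.

```java
import java.util.regex.Pattern;

// Hypothetical sketch of wildcard semantics, not Lucene's implementation:
// "?" matches exactly one character and "*" matches zero or more.
class WildcardSketch {

    static boolean matches(String pattern, String term) {
        StringBuilder regex = new StringBuilder();
        for (char c : pattern.toCharArray()) {
            if (c == '?') {
                regex.append('.');
            } else if (c == '*') {
                regex.append(".*");
            } else {
                // Quote every literal character so regex metacharacters are inert.
                regex.append(Pattern.quote(String.valueOf(c)));
            }
        }
        // Pattern.matches requires the whole term to match the pattern.
        return Pattern.matches(regex.toString(), term);
    }

    public static void main(String[] args) {
        System.out.println(matches("te?t", "text"));    // true
        System.out.println(matches("test?", "test"));   // false: ? needs one character
        System.out.println(matches("test*", "tester")); // true
        System.out.println(matches("te*t", "tempest")); // true
    }
}
```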
> 
> Fuzzy Searches
> Lucene supports fuzzy searches based on the Levenshtein Distance, or
> Edit Distance, algorithm. To do a fuzzy search, use the tilde, "~",
> symbol at the end of a term. For example, to search for a term similar
> in spelling to "roam", use the fuzzy search:
> 
> roam~
> This search will find terms like foam and roams.
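For reference, the Levenshtein distance itself can be computed with the standard dynamic-programming recurrence below. The sketch only computes the distance; how Lucene turns it into a match threshold for "~" queries is not shown here and should be checked against the Lucene source. Both example terms above are one edit away from "roam": one substitution for foam, one insertion for roams.

```java
// Standard dynamic-programming Levenshtein (edit) distance, shown only to
// illustrate the measure; EditDistanceSketch is a hypothetical name.
class EditDistanceSketch {

    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from "" to the first j chars of b
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // distance from the first i chars of a to ""
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, // insertion
                                            prev[j] + 1),    // deletion
                                   prev[j - 1] + cost);      // substitution
            }
            int[] tmp = prev; prev = curr; curr = tmp; // last row is now in prev
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        System.out.println(levenshtein("roam", "foam"));  // 1
        System.out.println(levenshtein("roam", "roams")); // 1
    }
}
```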
> 
> Boosting a Term
> Lucene computes the relevance level of matching documents based on
> the terms found. To boost a term, use the caret, "^", symbol with a
> boost factor (a number) at the end of the term you are searching. The
> higher the boost factor, the more relevant the term will be.
> Boosting allows you to control the relevance of a document by
> boosting its term. For example, if you are searching for
> 
> IBM Microsoft
> and you want the term "IBM" to be more relevant, boost it using the ^
> symbol along with the boost factor next to the term. You would type:
> 
> IBM^4 Microsoft
> This will make documents with the term IBM appear more relevant. You
> can also boost Phrase Terms, as in the example:
> 
> "Microsoft Word"^4 "Microsoft Excel"
> By default, the boost factor is 1.
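The "^" syntax can be illustrated with a small parser sketch that splits a term or phrase into its text and boost factor, defaulting to 1. `BoostSketch` is hypothetical and ignores escaping; a malformed boost such as `IBM^` makes `Float.parseFloat` throw a `NumberFormatException`.

```java
// Hypothetical sketch of the "^" suffix, not Lucene's QueryParser:
// splits a term or quoted phrase into its text and boost factor.
class BoostSketch {

    // Holds the term (or quoted phrase) and its boost factor.
    static final class Boosted {
        final String text;
        final float boost;

        Boosted(String text, float boost) {
            this.text = text;
            this.boost = boost;
        }

        @Override
        public String toString() {
            return text + " (boost " + boost + ")";
        }
    }

    static Boosted parse(String clause) {
        int caret = clause.lastIndexOf('^');
        // A caret counts as a boost only if it follows any closing quote.
        if (caret < 0 || caret < clause.lastIndexOf('"')) {
            return new Boosted(clause, 1.0f); // default boost factor is 1
        }
        String text = clause.substring(0, caret);
        float boost = Float.parseFloat(clause.substring(caret + 1));
        return new Boosted(text, boost);
    }

    public static void main(String[] args) {
        System.out.println(parse("IBM^4"));                // IBM (boost 4.0)
        System.out.println(parse("\"Microsoft Word\"^4")); // "Microsoft Word" (boost 4.0)
        System.out.println(parse("Microsoft"));            // Microsoft (boost 1.0)
    }
}
```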
> 
> Boolean operators
> Lucene supports AND, OR and NOT as Boolean operators. (Note: Boolean
> operators must be ALL CAPS.)
> 
> OR
> The OR operator is the default operator. This means that if there is
> no Boolean operator between two terms, the OR operator is used. The
> OR operator links two terms and finds a matching document if either
> of the terms exists in a document. For example, to search for
> documents that contain either "Microsoft Word" or just "Microsoft":
> 
> "Microsoft Word" Microsoft
> 
> or 
> 
> "Microsoft Word" OR Microsoft
> 
> 
> AND
> The AND operator matches documents where both terms exist anywhere in
> the text of a single document. For example, to search for documents
> that contain "Microsoft Word" and "Microsoft Excel":
> 
> "Microsoft Word" AND "Microsoft Excel"
> 
> NOT
> The NOT operator excludes documents that contain the term after NOT.
> For example, to search for documents that contain "Microsoft Word"
> but not "Microsoft Excel":
> 
> "Microsoft Word" NOT "Microsoft Excel"
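Ignoring scoring, the three operators behave like set operations on the documents that contain each term. The sketch below uses made-up document IDs, and `BooleanOpsSketch` is a hypothetical illustration rather than Lucene code; it only shows which documents match, not how they are ranked.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: each term's matches are a set of document IDs and
// the Boolean operators are set operations. Scoring is not modeled.
class BooleanOpsSketch {

    // OR (also used when no operator is given): documents containing either term.
    static Set<Integer> or(Set<Integer> a, Set<Integer> b) {
        Set<Integer> result = new TreeSet<>(a);
        result.addAll(b);
        return result;
    }

    // AND: documents containing both terms.
    static Set<Integer> and(Set<Integer> a, Set<Integer> b) {
        Set<Integer> result = new TreeSet<>(a);
        result.retainAll(b);
        return result;
    }

    // NOT: documents containing the first term but not the second.
    static Set<Integer> not(Set<Integer> a, Set<Integer> b) {
        Set<Integer> result = new TreeSet<>(a);
        result.removeAll(b);
        return result;
    }

    public static void main(String[] args) {
        Set<Integer> word = Set.of(1, 2, 3); // made-up docs with "Microsoft Word"
        Set<Integer> excel = Set.of(2, 4);   // made-up docs with "Microsoft Excel"
        System.out.println(or(word, excel));  // [1, 2, 3, 4]
        System.out.println(and(word, excel)); // [2]
        System.out.println(not(word, excel)); // [1, 3]
    }
}
```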
> 
> 
> --
> To unsubscribe, e-mail:   
> <mailto:lucene-dev-unsubscribe@jakarta.apache.org>
> For additional commands, e-mail: 
> <mailto:lucene-dev-help@jakarta.apache.org>
> 


