lucene-dev mailing list archives

Subject RE: PLEASE REVIEW: QueryParser syntax documentation
Date Thu, 09 May 2002 15:30:42 GMT
Thanks, this is great to have!

A few things you don't mention:

 - Grouping: parentheses may be used to group clauses, for example:
     apple AND (fruit OR tree)

 - Plus and minus: these can be used to require or prohibit clauses:
     +apple -"computer company" -model:(macintosh OR lisa)
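
A rough way to picture the two rules above, as a minimal sketch rather than QueryParser's actual implementation. The `matches` and `grouped` helpers are hypothetical, a document is modelled as a plain set of tokens, and the phrase "computer company" is treated as one opaque token for brevity. The rule that a query with no required clause needs at least one optional match is an assumption of this sketch:

```python
def matches(doc_terms, required=(), prohibited=(), optional=()):
    """Return True if a document's token set satisfies the clauses."""
    terms = set(doc_terms)
    # A single prohibited token ("-") rules the document out.
    if any(t in terms for t in prohibited):
        return False
    # Every required token ("+") has to be present.
    if any(t not in terms for t in required):
        return False
    # Assumption: with no required clause, one optional clause must match.
    if not required:
        return any(t in terms for t in optional)
    return True


def grouped(doc_terms):
    """apple AND (fruit OR tree): the parentheses bind the OR first."""
    terms = set(doc_terms)
    return "apple" in terms and ("fruit" in terms or "tree" in terms)
```

For instance, `matches({"apple", "pie"}, required=["apple"], prohibited=["computer company"])` holds, while adding the token "computer company" to the document makes it fail.
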


> -----Original Message-----
> From: Peter Carlson
> Sent: Wednesday, May 08, 2002 9:55 PM
> Subject: PLEASE REVIEW: QueryParser syntax documentation
> Hi,
> I was trying to validate how the unit test should work for 
> wildcard searches
> and I couldn't find a central reference for the query 
> language. Here is a
> general reference that I thought might be useful for people trying to
> understand the QueryParser language (it's based on some instructions
> I wrote for a project, so I hope it makes sense).
> Please provide comments, then I'll post it.
> Thanks
> --Peter
> Overview
> Although Lucene provides the ability to create your own queries
> through its API, it also provides a rich query language through the
> QueryParser.
> Terms
> A query is broken up into terms and operators. There are two 
> types of terms:
> Single Terms and Phrases.
> A Single Term is a single word such as "test" or "oracle".
> A Phrase is a group of words surrounded by double quotes such as "test
> oracle".
> Each of these terms can be combined together with Boolean 
> operators to form
> a more complex query (see below).
> Fields
> Lucene supports fielded data. When performing a search you can either
> specify a field, or use the default field. The fields and the
> default field are implementation specific.
> You can search any field by typing the field name followed by a
> colon ":" and then the term you are looking for. For example, suppose
> a Lucene index contains two fields, title and text, and text is the
> default field. To find the document entitled "The Right Way" which
> contains the text "right", you can enter:
> title:"The Right Way" AND text:right
> or, since text is the default field:
> title:"The Right Way" AND right
> Note: The field is only valid for the term that it directly
> precedes, so the query
> title:Do it right
> will only find "Do" in the title field. It will find "it" and
> "right" in the default field (in this case the text field).
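
The note about field scope can be illustrated with a small tokenizer. This is only a sketch of the rule as described, not the QueryParser grammar: `assign_fields` and its regular expression are hypothetical, and grouping, "+"/"-" and wildcards are ignored.

```python
import re

# Optional "field:" prefix, then either a quoted phrase or a bare word.
TOKEN = re.compile(r'(?:(\w+):)?("[^"]*"|\S+)')


def assign_fields(query, default_field="text"):
    """Pair each term with the field it would be searched in."""
    pairs = []
    for field, term in TOKEN.findall(query):
        if term in ("AND", "OR", "NOT"):
            continue  # operators are not terms
        # A field prefix applies only to the term directly after it.
        pairs.append((field or default_field, term.strip('"')))
    return pairs
```

With text as the default field, `assign_fields("title:Do it right")` pairs only "Do" with title and leaves "it" and "right" in text.
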
> Wildcard Searches
> Lucene supports single and multiple character wildcard searches.
> To perform a single character wildcard search use the "?" symbol.
> To perform a multiple character wildcard search use the "*" symbol.
> The single character wildcard search matches terms in which the "?"
> is replaced by any one character. For example, to search for "text"
> or "test" you can use the search:
> te?t
> Note: searching for "test?" will not find "test", but will 
> find "tests".
> Multiple character wildcard searches look for 0 or more characters.
> For example, to search for test, tests or tester, you can use the
> search:
> test*
> You can also use wildcards in the middle of a term:
> te*t
> Note: You cannot use a * or ? symbol as the first character 
> of a search.
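
One way to check expectations for the wildcard unit test is to translate a pattern into a regular expression. The sketch below does that. It is not how Lucene evaluates wildcard queries internally, and `wildcard_to_regex` and `wildcard_match` are hypothetical helper names.

```python
import re


def wildcard_to_regex(pattern):
    """Translate a "?"/"*" wildcard pattern into a compiled regex."""
    if pattern[:1] in ("*", "?"):
        # Mirrors the note above: no wildcard as the first character.
        raise ValueError("a wildcard cannot be the first character")
    parts = []
    for ch in pattern:
        if ch == "?":
            parts.append(".")    # exactly one character
        elif ch == "*":
            parts.append(".*")   # zero or more characters
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts))


def wildcard_match(pattern, term):
    """Return True if the whole term matches the wildcard pattern."""
    return wildcard_to_regex(pattern).fullmatch(term) is not None
```

The whole term has to match, which is why "test?" accepts "tests" but rejects "test".
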
> Fuzzy Searches
> Lucene supports fuzzy searches based on the Levenshtein Distance, or
> Edit Distance, algorithm. To do a fuzzy search use the tilde, "~",
> symbol at the end of a term. For example, to search for a term
> similar in spelling to "roam" use the fuzzy search:
> roam~
> This search will find terms like foam and roams.
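
For reference, the edit distance itself can be computed with the textbook dynamic-programming recurrence. This sketch shows only the distance. How close a term must be to count as a fuzzy match is not stated here, so no threshold is assumed.

```python
def levenshtein(a, b):
    """Count the single-character inserts, deletes and substitutions
    needed to turn string a into string b."""
    previous = list(range(len(b) + 1))  # distances from "" to b[:j]
    for i, ca in enumerate(a, start=1):
        current = [i]                   # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute or keep
            ))
        previous = current
    return previous[-1]
```

Both "foam" and "roams" are one edit away from "roam".
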
> Boosting a Term
> Lucene provides the relevance level of matching documents 
> based on the terms
> found. To boost a term use the caret, "^", symbol with a 
> boost factor (a
> number) at the end of the term you are searching. The higher the boost
> factor, the more relevant the term will be.
> Boosting allows you to control the relevance of a document by
> boosting its terms. For example, if you are searching for
> IBM Microsoft
> and you want the term "IBM" to be more relevant, boost it using the
> ^ symbol along with the boost factor next to the term. You would
> type:
> IBM^4 Microsoft
> This will make documents with the term IBM appear more 
> relevant. You can
> also boost Phrase Terms as in the example:
> "Microsoft Word"^4 "Microsoft Excel"
> By default, the boost factor is 1.
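
The boost syntax can be read off a query with a short regular expression. This is a sketch of the notation only: it says nothing about how the boost enters the score, and `parse_boosts` is a hypothetical helper, not part of Lucene.

```python
import re

# A quoted phrase or a bare word, optionally followed by ^number.
CLAUSE = re.compile(r'("[^"]*"|[^\s^]+)(?:\^(\d+(?:\.\d+)?))?')


def parse_boosts(query):
    """Return (term, boost) pairs; a missing boost defaults to 1."""
    return [
        (term.strip('"'), float(boost) if boost else 1.0)
        for term, boost in CLAUSE.findall(query)
    ]
```

`parse_boosts("IBM^4 Microsoft")` gives IBM a boost of 4.0 and leaves Microsoft at the default 1.0.
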
> Boolean operators
> Lucene supports AND, OR and NOT as Boolean operators. (Note: Boolean
> operators must be ALL CAPS.)
> OR
> The OR operator is the default conjunction operator. This means that
> if there is no Boolean operator between two terms, the OR operator is
> used. The OR operator links two terms and finds a matching document
> if either of the terms exists in a document. For example, to search
> for documents that contain either "Microsoft Word" or just
> "Microsoft":
> "Microsoft Word" Microsoft
> or 
> "Microsoft Word" OR Microsoft
> AND
> The AND operator matches documents where both terms exist anywhere in
> the text of a single document. For example, to search for documents
> that contain "Microsoft Word" and "Microsoft Excel":
> "Microsoft Word" AND "Microsoft Excel"
> NOT
> The NOT operator excludes documents that contain the term after NOT.
> For example, to search for documents that contain "Microsoft Word"
> but not "Microsoft Excel":
> "Microsoft Word" NOT "Microsoft Excel"
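
The three operators line up with set operations on the documents that contain each term. The sketch below uses made-up document ids and a hypothetical `postings` mapping. It illustrates the semantics described above, not Lucene's internal data structures.

```python
# Hypothetical postings: the ids of the documents containing each phrase.
postings = {
    "Microsoft Word": {1, 2, 3},
    "Microsoft Excel": {2, 3, 4},
}

word = postings["Microsoft Word"]
excel = postings["Microsoft Excel"]

either = word | excel     # "Microsoft Word" OR "Microsoft Excel"
both = word & excel       # "Microsoft Word" AND "Microsoft Excel"
word_only = word - excel  # "Microsoft Word" NOT "Microsoft Excel"
```
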
> --
> To unsubscribe, e-mail:   
> <>
> For additional commands, e-mail: 
> <>
