lucene-dev mailing list archives

From Otis Gospodnetic <>
Subject Re: PLEASE REVIEW: QueryParser syntax documentation
Date Thu, 09 May 2002 15:03:41 GMT
Looks good to me.
Two minor things.  This sentence doesn't make sense to me:

For example, if a Lucene index contains two fields, title and text and
text is the default field.

Also, maybe you can mention that one can use &&, ||, etc. in place of
AND, OR, etc.

How about grouping?  Does Lucene's query parser support that?
For instance: (("red snapper" AND Sancere) OR (burger AND Pepsi))

--- Peter Carlson <> wrote:
> Hi,
> I was trying to validate how the unit test should work for wildcard
> searches and I couldn't find a central reference for the query language.
> Here is a general reference that I thought might be useful for people
> trying to understand the whole QueryParser language (it's based on some
> instructions I wrote for a project, so I hope it makes sense).
> Please provide comments, then I'll post it.
> Thanks
> --Peter
> Overview
> Although Lucene provides the ability to create your own queries through
> its API, it also provides a rich query language through the QueryParser.
> Terms
> A query is broken up into terms and operators. There are two types of
> terms: Single Terms and Phrases.
> A Single Term is a single word such as "test" or "oracle".
> A Phrase is a group of words surrounded by double quotes such as
> "test oracle".
> These terms can be combined with Boolean operators to form a more
> complex query (see below).
> Fields
> Lucene supports fielded data. When performing a search you can either
> specify a field, or use the default field. The field names and the
> default field are implementation specific.
> You can search any of these fields by typing the field name followed
> by a
> colon ":" and then the term you are looking for. For example, if a
> Lucene
> index contains two fields, title and text and text is the default
> field. If
> you want to find the document entitled "The Right Way" which contains the
> text "right", you can enter:
> title:"The Right Way" AND text:right
> or
> title:"The Right Way" AND right
> since text is the default field.
> Note: The field is only valid for the term that it directly precedes, so
> the query
> title:Do it right
> will only find "Do" in the title field. It will find "it" and "right" in
> the default field (in this case the text field).
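The note above is easy to misread, so here is a minimal sketch of the rule in Python (not Lucene's own code). The helper name assign_fields and the regular expression are hypothetical, and the sketch ignores everything except field prefixes, phrases and the three Boolean keywords.

```python
import re

# Hypothetical tokenizer: an optional "field:" prefix followed by
# either a quoted phrase or a run of non-space characters.
TOKEN = re.compile(r'(?:(\w+):)?("[^"]*"|\S+)')
OPERATORS = {"AND", "OR", "NOT"}

def assign_fields(query, default_field="text"):
    """Return (field, term) pairs; a prefix covers only the term right after it."""
    pairs = []
    for field, term in TOKEN.findall(query):
        if term in OPERATORS:
            continue  # operators are not terms, so skip them in this sketch
        pairs.append((field or default_field, term.strip('"')))
    return pairs

print(assign_fields("title:Do it right"))
print(assign_fields('title:"The Right Way" AND text:right'))
```

In the first call only "Do" is paired with title, while "it" and "right" fall back to the default field, which is the behaviour the note describes.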
> Wildcard Searches
> Lucene supports single and multiple character wildcard searches.
> To perform a single character wildcard search use the "?" symbol.
> To perform a multiple character wildcard search use the "*" symbol.
> The single character wildcard search looks for terms that match the
> pattern with the "?" replaced by any single character. For example, to
> search for "text" or "test" you can use the search:
> te?t
> Note: searching for "test?" will not find "test", but will find "tests".
> Multiple character wildcard searches look for 0 or more characters. For
> example, to search for test, tests or tester, you can use the search:
> test*
> You can also use the wildcard searches in the middle of a term.
> te*t
> Note: You cannot use a * or ? symbol as the first character of a search.
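One way to check the wildcard rules above against a unit test is to model them with regular expressions. This is only a sketch under that assumption: wildcard_to_regex and matching_terms are made-up helper names, the term list is invented, and nothing here claims to be how Lucene evaluates wildcards internally.

```python
import re

def wildcard_to_regex(pattern):
    """Translate a wildcard term into a compiled regular expression."""
    if pattern[:1] in ("*", "?"):
        raise ValueError("a wildcard cannot be the first character of a search")
    parts = []
    for ch in pattern:
        if ch == "?":
            parts.append(".")    # exactly one character
        elif ch == "*":
            parts.append(".*")   # zero or more characters
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts))

def matching_terms(pattern, terms):
    """Return the terms that match the whole wildcard pattern, in input order."""
    regex = wildcard_to_regex(pattern)
    return [t for t in terms if regex.fullmatch(t)]

terms = ["test", "tests", "tester", "text"]
print(matching_terms("te?t", terms))
print(matching_terms("test?", terms))
print(matching_terms("test*", terms))
```

Because "?" maps to exactly one character, "test?" matches "tests" but not "test", while "test*" matches both.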
> Fuzzy Searches
> Lucene supports fuzzy searches based on the Levenshtein Distance, or Edit
> Distance, algorithm. To do a fuzzy search use the tilde, "~", symbol at the
> end of a term. For example, to search for a term similar in spelling to
> "roam" use the fuzzy search:
> roam~
> This search will find terms like foam and roams.
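For readers who have not met edit distance, a minimal Python sketch of the textbook Levenshtein computation follows. It only computes the distance between two strings; how Lucene turns that distance into a match decision is not stated in the text above, so the sketch does not guess at it.

```python
def levenshtein(a, b):
    """Number of single-character insertions, deletions or substitutions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute ca with cb
        prev = curr
    return prev[-1]

print(levenshtein("roam", "foam"))
print(levenshtein("roam", "roams"))
```

Both "foam" and "roams" are one edit away from "roam", which is why they appear as examples of fuzzy matches.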
> Boosting a Term
> Lucene provides the relevance level of matching documents based on the
> terms found. To boost a term use the caret, "^", symbol with a boost factor
> (a number) at the end of the term you are searching. The higher the boost
> factor, the more relevant the term will be.
> Boosting allows you to control the relevance of a document by boosting its
> terms. For example, if you are searching for
> IBM Microsoft
> and you want the term "IBM" to be more relevant, boost it using the ^
> symbol along with the boost factor next to the term. You would type:
> IBM^4 Microsoft
> This will make documents with the term IBM appear more relevant. You can
> also boost Phrase Terms as in the example:
> "Microsoft Word"^4 "Microsoft Excel"
> By default, the boost factor is 1.
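To make the boost syntax concrete, here is a small Python sketch that reads the boost factor off each term and falls back to 1 when none is given. The name parse_boosts and the regular expression are hypothetical, and the sketch says nothing about how Lucene uses the factor when scoring.

```python
import re

# Hypothetical tokenizer: a quoted phrase or a bare word,
# optionally followed by "^" and a number.
TOKEN = re.compile(r'("[^"]*"|[^\s^]+)(?:\^(\d+(?:\.\d+)?))?')

def parse_boosts(query):
    """Return (term, boost) pairs; terms without an explicit boost get 1.0."""
    return [(term.strip('"'), float(boost) if boost else 1.0)
            for term, boost in TOKEN.findall(query)]

print(parse_boosts("IBM^4 Microsoft"))
print(parse_boosts('"Microsoft Word"^4 "Microsoft Excel"'))
```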
> Boolean operators
> Lucene supports AND, OR and NOT as Boolean operators. (Note: Boolean
> operators must be ALL CAPS.)
> OR
> The OR operator is the default conjunction operator. This means that if
> there is no Boolean operator between two terms, the OR operator is used. The
> OR operator links two terms and finds a matching document if either of the
> terms exists in a document. For example, to search for documents that
> contain either "Microsoft Word" or just "Microsoft":
> "Microsoft Word" Microsoft
> or 
> "Microsoft Word" OR Microsoft
> AND
> The AND operator matches documents where both terms exist anywhere in the
> text of a single document. For example, to search for documents that
> contain "Microsoft Word" and "Microsoft Excel":
> "Microsoft Word" AND "Microsoft Excel"
> NOT
> The NOT operator excludes documents that contain the term after NOT. For
> example, to search for documents that contain "Microsoft Word" but not
> "Microsoft Excel":
> "Microsoft Word" NOT "Microsoft Excel"
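The three operators above behave like set union, intersection and difference over the documents that match each term. The Python sketch below illustrates this on an invented four-document collection; the docs dictionary and the containing helper are assumptions for illustration, and plain substring matching stands in for real term and phrase matching.

```python
# Invented toy collection: document id -> text.
docs = {
    1: "Microsoft Word tips",
    2: "Microsoft Excel macros",
    3: "Microsoft Word and Microsoft Excel together",
    4: "OpenOffice tips",
}

def containing(phrase):
    """IDs of documents whose text contains the phrase (case-sensitive substring)."""
    return {doc_id for doc_id, text in docs.items() if phrase in text}

word = containing("Microsoft Word")
excel = containing("Microsoft Excel")
microsoft = containing("Microsoft")

print(sorted(word | microsoft))  # "Microsoft Word" OR Microsoft
print(sorted(word & excel))      # "Microsoft Word" AND "Microsoft Excel"
print(sorted(word - excel))      # "Microsoft Word" NOT "Microsoft Excel"
```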
> --
> To unsubscribe, e-mail:  
> <>
> For additional commands, e-mail:
> <>
