Date: Thu, 9 May 2002 08:03:41 -0700 (PDT)
From: Otis Gospodnetic
Subject: Re: PLEASE REVIEW: QueryParser syntax documentation
To: Lucene Developers List

Looks good to me. Two minor things.

This sentence doesn't make sense to me:

For example, if a Lucene index contains two fields, title and text and text is the default field.

Also, maybe you can mention that one can use &, |, etc. in place of AND, OR, etc.

Thanks,
Otis

P.S. How about grouping? Does Lucene's query parser support that? For instance:

(("red snapper" AND Sancere) OR (burger AND Pepsi))

--- Peter Carlson wrote:
> Hi,
>
> I was trying to validate how the unit test should work for wildcard
> searches and I couldn't find a central reference for the query language.
> Here is a general reference that I thought might be useful for people
> trying to understand the whole QueryParser language (it's based on some
> instructions I wrote for a project, so I hope it makes sense).
>
> Please provide comments, then I'll post it.
>
> Thanks
>
> --Peter
>
> Overview
> Although Lucene provides the ability to create your own queries through
> its API, it also provides a rich query language through the QueryParser.
>
> Terms
> A query is broken up into terms and operators. There are two types of
> terms: Single Terms and Phrases.
> A Single Term is a single word such as "test" or "oracle".
> A Phrase is a group of words surrounded by double quotes such as
> "test oracle".
> Each of these terms can be combined with Boolean operators to form a
> more complex query (see below).
>
> Fields
> Lucene supports fielded data. When performing a search you can either
> specify a field or use the default field. The field names and the
> default field are implementation specific.
>
> You can search any of these fields by typing the field name followed by
> a colon ":" and then the term you are looking for. For example, if a
> Lucene index contains two fields, title and text and text is the default
> field. If you want to find the document entitled "The Right Way" which
> contains the text "right", you can enter:
>
> title:"The Right Way" AND text:right
>
> or
>
> title:"The Right Way" AND right
>
> if text is the default field.
>
> Note: The field is only valid for the term that it directly precedes,
> so the query
>
> title:Do it right
>
> will only find "Do" in the title field. It will find "it" and "right"
> in the default field (in this case the text field).
>
> Wildcard Searches
> Lucene supports single and multiple character wildcard searches.
> To perform a single character wildcard search, use the "?" symbol.
> To perform a multiple character wildcard search, use the "*" symbol.
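The "?" and "*" rules can be tried out with a minimal Python sketch that checks a pattern against one whole term by translating it into a regular expression. This only models the matching rule described in this section; it is not Lucene's own wildcard code, and wildcard_to_regex / wildcard_match are hypothetical names. The leading-wildcard check mirrors the note in this section that a search cannot begin with * or ?.

```python
import re


def wildcard_to_regex(pattern):
    """Translate a wildcard term ("?" = one char, "*" = zero or more) to a regex."""
    if pattern[:1] in ("?", "*"):
        # Mirrors the proposal's note: a search may not start with * or ?.
        raise ValueError("wildcard cannot be the first character")
    parts = []
    for ch in pattern:
        if ch == "?":
            parts.append(".")   # exactly one character
        elif ch == "*":
            parts.append(".*")  # zero or more characters
        else:
            parts.append(re.escape(ch))  # everything else is literal
    return re.compile("".join(parts))


def wildcard_match(pattern, term):
    # fullmatch: the pattern must cover the whole term, not just a substring.
    return wildcard_to_regex(pattern).fullmatch(term) is not None
```

Checked against the examples in this section, te?t matches "text" and "test", test? rejects "test" but accepts "tests", test* accepts "test", "tests" and "tester", and te*t accepts "test".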
> The single character wildcard search looks for terms that match the
> pattern with any single character in place of the "?". For example, to
> search for "text" or "test" you can use the search:
>
> te?t
>
> Note: searching for "test?" will not find "test", but will find "tests".
>
> Multiple character wildcard searches look for zero or more characters.
> For example, to search for test, tests or tester, you can use the
> search:
>
> test*
>
> You can also use wildcard searches in the middle of a term:
>
> te*t
>
> Note: You cannot use a * or ? symbol as the first character of a
> search.
>
> Fuzzy Searches
> Lucene supports fuzzy searches based on the Levenshtein Distance, or
> Edit Distance, algorithm. To do a fuzzy search, use the tilde, "~",
> symbol at the end of a term. For example, to search for a term similar
> in spelling to "roam", use the fuzzy search:
>
> roam~
>
> This search will find terms like foam and roams.
>
> Boosting a Term
> Lucene calculates the relevance level of matching documents based on the
> terms found. To boost a term, use the caret, "^", symbol with a boost
> factor (a number) at the end of the term you are searching. The higher
> the boost factor, the more relevant the term will be.
> Boosting allows you to control the relevance of a document by boosting
> its term. For example, if you are searching for
>
> IBM Microsoft
>
> and you want the term "IBM" to be more relevant, boost it using the ^
> symbol along with the boost factor next to the term. You would type:
>
> IBM^4 Microsoft
>
> This will make documents with the term IBM appear more relevant. You
> can also boost Phrase Terms as in the example:
>
> "Microsoft Word"^4 "Microsoft Excel"
>
> By default, the boost factor is 1.
>
> Boolean operators
> Lucene supports AND, OR and NOT as Boolean operators. (Note: Boolean
> operators must be ALL CAPS.)
>
> OR
> The OR operator is the default conjunction operator.
> This means that if there is no Boolean operator between two terms, the
> OR operator is used. The OR operator links two terms and finds a
> matching document if either of the terms exists in a document. For
> example, to search for documents that contain either "Microsoft Word"
> or just "Microsoft":
>
> "Microsoft Word" Microsoft
>
> or
>
> "Microsoft Word" OR Microsoft
>
> AND
> The AND operator matches documents where both terms exist anywhere in
> the text of a single document. For example, to search for documents
> that contain "Microsoft Word" and "Microsoft Excel":
>
> "Microsoft Word" AND "Microsoft Excel"
>
> NOT
> The NOT operator excludes documents that contain the term after NOT.
> For example, to search for documents that contain "Microsoft Word" but
> not "Microsoft Excel":
>
> "Microsoft Word" NOT "Microsoft Excel"
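The three Boolean operators can be modelled as simple set logic over a toy collection. The Python sketch below assumes each document is lowercased and split on whitespace (a stand-in for a real analyzer), treats a quoted phrase as consecutive words, and ignores scoring and boosts entirely. The documents d1 to d3 and the function names are hypothetical; the point is only to show which documents each example query above would select, not how Lucene evaluates queries.

```python
# Hypothetical toy collection: document id -> document text.
DOCS = {
    "d1": "Microsoft Word and Microsoft Excel tips",
    "d2": "Microsoft Word keyboard shortcuts",
    "d3": "Microsoft support news",
}


def tokens(text):
    # Assumed analysis step: lowercase, then split on whitespace.
    return text.lower().split()


def matches(text, term):
    """True if a single term or a phrase (consecutive words) occurs in text."""
    doc, words = tokens(text), tokens(term)
    n = len(words)
    # Slide a window of n tokens over the document looking for the phrase.
    return any(doc[i:i + n] == words for i in range(len(doc) - n + 1))


def search(op, left, right):
    """Return sorted ids of documents selected by `left op right`."""
    hits = []
    for doc_id, text in DOCS.items():
        a, b = matches(text, left), matches(text, right)
        if op == "OR":        # either clause may match
            selected = a or b
        elif op == "AND":     # both clauses must match
            selected = a and b
        elif op == "NOT":     # left must match, right must not
            selected = a and not b
        else:
            raise ValueError(f"unsupported operator: {op}")
        if selected:
            hits.append(doc_id)
    return sorted(hits)
```

On this collection, search("OR", "Microsoft Word", "Microsoft") selects d1, d2 and d3; the AND example selects only d1; and the NOT example selects only d2, since d1 also contains "Microsoft Excel".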