From: "Sonu Sudhakar"
To: java-user@lucene.apache.org
Date: Tue, 3 Jun 2008 11:00:41 +0530
Subject: Re: Boolean Query Issue

Sorry, in my first mail I forgot to include the line that sets the default
QueryParser operator. I have included it now.

> This is the code I am using for search.
>
> public void doSearch(String userQuery, int stemFlag, int sortFlag,
>         String[] sortFileds) throws Exception {
>
>     PerFieldAnalyzerWrapper analyzer;
>     if (stemFlag == 1) {
>         analyzer = new PerFieldAnalyzerWrapper(new StemmingAnalyzer());
>     } else {
>         analyzer = new PerFieldAnalyzerWrapper(new StandardAnalyzer());
>     }
>
>     // Stemming is not applied to the country and state fields
>     analyzer.addAnalyzer("IS", new StandardAnalyzer());
>     analyzer.addAnalyzer("AS", new StandardAnalyzer());
>     analyzer.addAnalyzer("IC", new StandardAnalyzer());
>     analyzer.addAnalyzer("AC", new StandardAnalyzer());
>
>     this.userQuery = userQuery;
>
>     // Look up the remote indexes
>     RemoteLookup rLookup = new RemoteLookup();
>
>     // Build the array of Searchable objects
>     Searchable[] srbl = rLookup.getSearchables();
>
>     pmsearcher = new ParallelMultiSearcher(srbl);
>
>     QueryParser qp = new QueryParser("SPEC", analyzer);
>     qp.setDefaultOperator(QueryParser.AND_OPERATOR);
>
>     query = qp.parse(userQuery);
>     //System.out.println(query.toString());
>
>     // Create the sort fields
>     sort = new SDFSort().getSort(sortFlag, sortFileds);
>
>     // Search over multiple indexes
>     hits = pmsearcher.search(query, sort);
> }
>
> This is not exactly my code, but it includes almost all of the
> search-related parts.
>
> Thanks,
> Sonu
>
>
> On Mon, Jun 2, 2008 at 6:19 PM, Erick Erickson wrote:
>
>> You need to include your code, I think. This makes no sense on a quick
>> look, so unless we see some code it'll be hard to know whether
>> we're looking at anything relevant.
>>
>> Best
>> Erick
>>
>> On Mon, Jun 2, 2008 at 1:19 AM, Sonu Sudhakar wrote:
>>
>> > Hi,
>> >
>> > I have done some more analysis on this issue. I think it is related to
>> > Lucene's default operator.
>> > I get exact results when I set the default operator to 'OR', but
>> > face the problem when I set the default operator to 'AND'.
>> >
>> > The following are the Lucene QueryParser outputs for both cases.
>> >
>> > Query: TTL:store AND TTL:data OR TTL:variable
>> >
>> > 1. When the Lucene default operator is 'OR'
>> >
>> > QueryParser output using the toString method:
>> > +TTL:store +TTL:data TTL:variable
>> >
>> >
>> > 2. When the Lucene default operator is 'AND'
>> >
>> > QueryParser output using the toString method:
>> > +TTL:store TTL:data TTL:variable
>> >
>> > The output of the second case is confusing me.
>> >
>> > Could anybody please give me an explanation for this behavior?
>> >
>> > Thanks,
>> > Sonu
>> >
>> > On Thu, May 29, 2008 at 3:49 PM, Sonu Sudhakar wrote:
>> >
>> > > Erick,
>> > >
>> > > Thanks for your reply.
>> > >
>> > > I am working with approximately 1 million documents. They are indexed
>> > > on 4 servers. Each document has multiple fields. I am using
>> > > ParallelMultiSearcher for searching.
>> > >
>> > > I tried a few queries on the title (TTL) field.
>> > >
>> > > I started with a simple query without boolean operators.
>> > >
>> > > 1. TTL:data => 3,733 results (all matches had "data" in the title)
>> > >
>> > > Then I tried a second one with the AND operator.
>> > >
>> > > 2. TTL:data AND TTL:store => 19 results
>> > >
>> > > I analyzed the results. They had both "data" and "store" in the
>> > > title.
>> > >
>> > > I then tried the OR operator.
>> > >
>> > > 3. TTL:data AND TTL:store OR TTL:variable
>> > >
>> > > I got 3,733 results, the same as for the query TTL:data.
>> > >
>> > > I even tried giving a meaningless query:
>> > >
>> > > TTL:data AND TTL:storettttt OR TTL:variablettttt => 3,733 results
>> > > (the results were the same as those of TTL:data)
>> > >
>> > > TTL:data AND TTL:computer OR TTL:device => 3,733 results
>> > > (this also showed the same results as above)
>> > >
>> > > The same thing repeats for other cases too. The queries below also
>> > > behaved the same way, i.e.:
>> > >
>> > > 1. TTL:store AND TTL:data OR TTL:variable => 76 results
>> > > 2. TTL:store AND TTL:datatttt OR TTL:variabletttt => 76 results
>> > > 3. TTL:store AND TTL:computer OR TTL:device => 76 results
>> > >
>> > >
>> > > 1. TTL:variable AND TTL:data OR TTL:store => 1,496 results
>> > > 2. TTL:variable AND TTL:datatttt OR TTL:storetttt => 1,496 results
>> > > 3. TTL:variable AND TTL:computer OR TTL:device => 1,496 results
>> > >
>> > > I hope you have a clearer picture of my issue now.
>> > >
>> > > Thanks,
>> > > Sonu
>> > >
>> > >
>> > > On Wed, May 28, 2008 at 7:09 PM, Erick Erickson
>> > > <erickerickson@gmail.com> wrote:
>> > >
>> > >> It's unclear what you *should* expect. How is your data
>> > >> distributed?
>> > >>
>> > >> In other words, how many documents do you have? In this example,
>> > >> for instance,
>> > >> 1. TTL:data AND TTL:store OR TTL:variable => 3,733 results
>> > >> it considered the TTL:data part only.
>> > >>
>> > >> It's perfectly reasonable if every document that had "variable" in
>> > >> the field *also* has "data" and "store" in the field. So your
>> > >> numbers don't give us much to work with.....
>> > >>
>> > >> Remember, though, that Lucene syntax isn't a pure boolean syntax. See
>> > >>
>> > >> http://wiki.apache.org/lucene-java/BooleanQuerySyntax
>> > >>
>> > >> And when in doubt parenthesize ...
>> > >>
>> > >> Best
>> > >> Erick
>> > >>
>> > >> On Wed, May 28, 2008 at 7:44 AM, Sonu Sudhakar wrote:
>> > >>
>> > >> > Hi,
>> > >> >
>> > >> > I have an issue with boolean queries.
>> > >> >
>> > >> > I am using Lucene-core-2.3.1.
>> > >> >
>> > >> > I tested a boolean query with 3 terms (data, store, variable) in
>> > >> > my TTL field. The TTL field is indexed and searched using
>> > >> > StandardAnalyzer.
>> > >> >
>> > >> > The three terms, when searched individually, gave the following
>> > >> > results:
>> > >> >
>> > >> > 1. TTL:data => 3,733 results
>> > >> > 2. TTL:store => 76 results
>> > >> > 3. TTL:variable => 1,496 results
>> > >> >
>> > >> > But I found an issue when combining these terms with boolean
>> > >> > operators, e.g.
>> > >> >
>> > >> > 1. TTL:data AND TTL:store OR TTL:variable => 3,733 results
>> > >> > it considered the TTL:data part only.
>> > >> >
>> > >> > 2. TTL:store AND TTL:data OR TTL:variable => 76 results
>> > >> > it considered the TTL:store part only.
>> > >> >
>> > >> > 3. TTL:variable AND TTL:data OR TTL:store => 1,496 results
>> > >> > it considered the TTL:variable part only.
>> > >> >
>> > >> > But I am getting the correct result when combining terms with the
>> > >> > 'AND' operator only. I think the issue is with the 'OR' operator.
>> > >> >
>> > >> >
>> > >> > Could anybody give an explanation for this behavior of Lucene?
>> > >> > Could you give suggestions to rectify this?
>> > >> >
>> > >> > Thanks,
>> > >> > Sonu
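
The two toString() outputs quoted in this thread can be reproduced with a small stand-alone model of how the query parser appears to assign required ("+") and optional flags to a flat query, working left to right with no operator precedence. This is a minimal sketch and not Lucene code: the class DefaultOperatorSketch, the method flags, and the defaultIsAnd parameter are hypothetical names, and the three rules in the loop are an assumption about the parser's behaviour, chosen because they match the outputs reported above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-alone model; it does not use or depend on Lucene.
class DefaultOperatorSketch {

    enum Conj { NONE, AND, OR }   // operator written in front of a term
    enum Occur { MUST, SHOULD }   // rendered as "+term" and "term"

    /**
     * Assigns flags to a flat query such as "a AND b OR c".
     * terms[i] is introduced by conjs[i]; conjs[0] is Conj.NONE.
     * defaultIsAnd selects the assumed default operator.
     */
    static String flags(String[] terms, Conj[] conjs, boolean defaultIsAnd) {
        List<Occur> occurs = new ArrayList<>();
        for (int i = 0; i < terms.length; i++) {
            Conj conj = conjs[i];
            // Assumed rule 1: an explicit AND makes the preceding term required.
            if (i > 0 && conj == Conj.AND) {
                occurs.set(i - 1, Occur.MUST);
            }
            // Assumed rule 2: with default AND, an explicit OR makes the
            // preceding term optional again.
            if (i > 0 && defaultIsAnd && conj == Conj.OR) {
                occurs.set(i - 1, Occur.SHOULD);
            }
            // Assumed rule 3: the current term is required unless introduced
            // by OR (default AND), or only if introduced by AND (default OR).
            boolean required = defaultIsAnd ? conj != Conj.OR : conj == Conj.AND;
            occurs.add(required ? Occur.MUST : Occur.SHOULD);
        }
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < terms.length; i++) {
            if (i > 0) sb.append(' ');
            if (occurs.get(i) == Occur.MUST) sb.append('+');
            sb.append(terms[i]);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String[] terms = {"TTL:store", "TTL:data", "TTL:variable"};
        Conj[] conjs = {Conj.NONE, Conj.AND, Conj.OR};
        // Default OR:  +TTL:store +TTL:data TTL:variable
        System.out.println(flags(terms, conjs, false));
        // Default AND: +TTL:store TTL:data TTL:variable
        System.out.println(flags(terms, conjs, true));
    }
}
```

Under this model, with default AND only the first term stays required and the other two merely influence scoring. That would be consistent with the counts reported in the thread, where each three-term query returned the same number of hits as its first term alone, even with nonsense second and third terms. Writing the grouping explicitly, for example (TTL:store AND TTL:data) OR TTL:variable, follows the "when in doubt parenthesize" advice given above.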