From: mark harwood
To: java-user@lucene.apache.org
Subject: Re: Boolean retrieval
Date: Tue, 7 Jul 2009 09:39:54 +0000 (GMT)

This seems a long-winded way of producing a BooleanFilter, but I guess you are trying to work with user input in the form of query strings.

The bug in your code is that clause.getQuery().toString() does not produce terms that are in your index: the first call to getTermsFilter passes the string "+f1:aaa +f1:bbb", which is not a term in the index.

Given that the requirement is to ignore scoring, I would recommend (as someone else suggested) looking at the IndexSearcher.search method that takes a HitCollector and simply accumulating all results, regardless of score.


----- Original Message ----
From: Lukas Michelbacher
To: java-user@lucene.apache.org
Sent: Tuesday, 7 July, 2009 9:53:24
Subject: Re: Boolean retrieval

To test my Boolean queries, I have a small test collection where each document
contains one of 1024 possible combinations of the strings "aaa", "bbb",
... "jjj". I tried wrapping a Boolean query like this (it's based on an
older post to this list [1])


private static TermsFilter getTermsFilter(String field, String text) {
  TermsFilter tf = new TermsFilter();
  tf.addTerm(new Term(field, text));
  return tf;
}

Query q = new QueryParser("f1", new StandardAnalyzer()).parse("(aaa AND bbb) OR ccc");
IndexSearcher searcher = new IndexSearcher(indexDir);
TopDocCollector collector = new TopDocCollector(1024);

BooleanQuery bc = (BooleanQuery) q;
BooleanFilter finalFilter = new BooleanFilter();
BooleanFilter boolFilt = new BooleanFilter();

// add each clause of the original query to the filter
for (BooleanClause clause : bc.getClauses()) {
  boolFilt.add(new FilterClause(getTermsFilter("f1", clause.getQuery().toString()), clause.getOccur()));
  System.out.println(clause.getQuery().toString());
}

finalFilter.add(new FilterClause(boolFilt, BooleanClause.Occur.MUST));

ConstantScoreQuery csq = new ConstantScoreQuery(finalFilter);
searcher.search(csq, finalFilter, collector);

ScoreDoc[] hits = collector.topDocs().scoreDocs;
System.out.println("Found " + collector.getTotalHits() + " hits");

The result is 0 hits (should be 640).

[1] tinyurl.com/ml52ye

2009/7/4 Mark Harwood:
>
> Check out BooleanFilter in contrib/queries. It can be wrapped in a ConstantScoreQuery.
>
> On 4 Jul 2009, at 17:37, Lukas Michelbacher wrote:
>
> This is about an experiment comparing plain Boolean retrieval with
> vector-space-based retrieval.
>
> I would like to disable all of Lucene's scoring mechanisms and just
> run a true Boolean query that returns exactly the documents that match a
> query specified in Boolean syntax (OR, AND, NOT). No scoring or sorting
> required.
>
> As far as I can see, this is not supported out of the box. Which classes
> would I have to modify?
>
> Would it be enough to create a subclass of Similarity and to ignore all
> terms but one (coord, say) and make this term return 1 if the query
> matches the document and 0 otherwise?
>
> Lukas
>
> --
> Lukas Michelbacher
> Institute for Natural Language Processing
> Universität Stuttgart
> email: michells@ims.uni-stuttgart.de

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
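
A note on the 640 figure quoted in this thread: it can be checked without Lucene. The sketch below is a stand-alone Java check, not Lucene code. It assumes that document d contains the i-th string exactly when bit i of d is set, which is one hypothetical way to enumerate the 1024 combinations; the class and method names are illustrative only. It evaluates (aaa AND bbb) OR ccc with plain java.util.BitSet operations, i.e. unscored set logic of the kind the thread is asking for.

```java
import java.util.BitSet;

// Hypothetical stand-alone check; nothing here is part of the Lucene API.
class BooleanRetrievalCheck {
    static final int NUM_TERMS = 10;             // "aaa" ... "jjj"
    static final int NUM_DOCS = 1 << NUM_TERMS;  // 1024 combinations

    // Assumption: document d contains term i exactly when bit i of d is set.
    static BitSet docsContaining(int termIndex) {
        BitSet docs = new BitSet(NUM_DOCS);
        for (int d = 0; d < NUM_DOCS; d++) {
            if ((d & (1 << termIndex)) != 0) {
                docs.set(d);
            }
        }
        return docs;
    }

    // Evaluates (aaa AND bbb) OR ccc with set operations only, no scoring.
    static BitSet evaluate() {
        BitSet result = docsContaining(0);  // aaa
        result.and(docsContaining(1));      // AND bbb
        result.or(docsContaining(2));       // OR ccc
        return result;
    }

    public static void main(String[] args) {
        System.out.println("Found " + evaluate().cardinality() + " hits");
    }
}
```

Running main prints "Found 640 hits": 256 documents contain both aaa and bbb, 512 contain ccc, and 128 contain all three, so 256 + 512 - 128 = 640.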