From: Peyman Faratin
To: java-user@lucene.apache.org
Subject: Problem with BooleanQuery
Date: Wed, 21 Sep 2011 12:38:09 -0400
Message-Id: <4BEEFF76-4FD3-499D-B5F4-632EBE89A6CD@robustlinks.com>

Hi,

The problem I would like to solve is determining the Lucene score of a word in one particular, given document. The two candidates I have been trying are

- QueryWrapperFilter
- BooleanQuery

Both restrict the search to a smaller search space. According to Doug Cutting, the QueryWrapperFilter option is less preferable than BooleanQuery. However, I am experiencing both a performance problem (the search is very slow) and a response problem (the query is not matched to any doc).

The setup is as follows. Given a user query "word":

    QueryParser parser = new QueryParser(Version.LUCENE_32, "content", new StandardAnalyzer(Version.LUCENE_32));
    Query query = parser.parse(word);
    Document d = WikiIndexSearcher.doc(match.doc);
    docTitle = d.get("title");
    TermQuery titleQuery = new TermQuery(new Term("title", docTitle));
    BooleanQuery bQuery = new BooleanQuery();
    bQuery.add(titleQuery, BooleanClause.Occur.MUST);
    bQuery.add(query, BooleanClause.Occur.MUST);
    TopDocs hits = WikiIndexSearcher.search(bQuery, 1);

In other words, find a Wikipedia doc with a particular title (in the example below it is "List of newspapers in New York", http://en.wikipedia.org/wiki/List_of_newspapers_in_New_York). We then create a BooleanQuery in which the title must match that doc's title and the content must match the user query ("american" in the example below).

Here is the output of a run on the user query "american" in a doc with the title "List of newspapers in New York":

... QUERY: content:american
... doc: List of newspapers in New York
... query: +title:List of newspapers in New York +content:american
... explanation 568744:
0.0 = (NON-MATCH) Failure to meet condition(s) of required/prohibited clause(s)
  0.0 = no match on required clause (title:List of newspapers in New York)
  0.011818626 = (MATCH) weight(content:american in 212081), product of:
    0.15625292 = queryWeight(content:american), product of:
      2.4204094 = idf(docFreq=392249, maxDocs=1623450)
      0.0645564 = queryNorm
    0.075637795 = (MATCH) fieldWeight(content:american in 212081), product of:
      1.0 = tf(termFreq(content:american)=1)
      2.4204094 = idf(docFreq=392249, maxDocs=1623450)
      0.03125 = fieldNorm(field=content, doc=212081)

As you can see, there is no match for the query (and hits.totalHits is 0). The search is very slow too.

Any help would be much appreciated.
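
For reference, the numbers in the content clause of the explanation can be reproduced by hand. The sketch below is only a check of the arithmetic, assuming the default similarity of Lucene 3.x (idf = 1 + ln(maxDocs / (docFreq + 1)), tf = sqrt(termFreq)). The class name ExplainArithmetic and its method names are made up for illustration, and queryNorm and fieldNorm are copied from the output above rather than computed.

```java
// Recomputes the content:american part of the explanation with plain arithmetic.
// Assumption: Lucene 3.x default similarity formulas for idf and tf.
public class ExplainArithmetic {

    // idf = 1 + ln(maxDocs / (docFreq + 1))
    static double idf(long docFreq, long maxDocs) {
        return Math.log(maxDocs / (double) (docFreq + 1)) + 1.0;
    }

    // tf = sqrt(termFreq)
    static double tf(double termFreq) {
        return Math.sqrt(termFreq);
    }

    // queryWeight = idf * queryNorm
    static double queryWeight(double idf, double queryNorm) {
        return idf * queryNorm;
    }

    // fieldWeight = tf * idf * fieldNorm
    static double fieldWeight(double tf, double idf, double fieldNorm) {
        return tf * idf * fieldNorm;
    }

    public static void main(String[] args) {
        double idf = idf(392249L, 1623450L);        // reported: 2.4204094
        double tf = tf(1.0);                        // reported: 1.0
        double qw = queryWeight(idf, 0.0645564);    // reported: 0.15625292
        double fw = fieldWeight(tf, idf, 0.03125);  // reported: 0.075637795
        double score = qw * fw;                     // reported: 0.011818626

        System.out.println("idf         = " + idf);
        System.out.println("queryWeight = " + qw);
        System.out.println("fieldWeight = " + fw);
        System.out.println("score       = " + score);
    }
}
```

The content clause on its own therefore scores about 0.0118 for this doc. The overall 0.0 comes from the required title clause, which the explanation reports as not matching.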