References: <4BEEFF76-4FD3-499D-B5F4-632EBE89A6CD@robustlinks.com>
From: Ian Lea
Date: Thu, 22 Sep 2011 09:59:29 +0100
Subject: Re: Problem with BooleanQuery
To: java-user@lucene.apache.org

> I am not analyzing the title
>
> Field titleField = new Field("title", article.getTitle(), Field.Store.YES, Field.Index.NOT_ANALYZED);

OK. But the output you quote says "no match on required clause
(title:List of newspapers in New York)" so something is out of synch
somewhere. What does Luke show? See
http://wiki.apache.org/lucene-java/LuceneFAQ#Why_am_I_getting_no_hits_.2BAC8_incorrect_hits.3F
for more things to check.

> Do you think booleanquery is the right approach for solving the problem (finding lucene score of a word or a phrase in _a_ particular document)?

Sounds OK to me. You could look at the contrib MemoryIndex as a
possible alternative.

--
Ian.

> On Sep 21, 2011, at 1:00 PM, Ian Lea wrote:
>
>> How is the "title" field indexed?  Seems likely it is analyzed in
>> which case a TermQuery won't match because "list of newspapers in New
>> York" would be analyzed into terms "list", "newspapers", "new", "york"
>> assuming things were lowercased, stop words removed etc.
>>
>> Maybe you need your "word" as TermQuery, assuming it is lowercased
>> etc., and pass the title through query parser.  In other words,
>> reverse what you've got for the two fields.
>>
>> As for performance, first narrow down where it is taking the time.  If
>> it is in lucene, read
>> http://wiki.apache.org/lucene-java/ImproveSearchingSpeed
>>
>>
>> --
>> Ian.
>>
>> On Wed, Sep 21, 2011 at 5:38 PM, Peyman Faratin wrote:
>>> Hi
>>>
>>> The problem I would like to solve is determining the lucene score of a word in _a particular_ given document.
>>> The 2 candidates i have been trying are
>>>
>>> - QueryWrapperFilter
>>> - BooleanQuery
>>>
>>> Both are to restrict search within a search space. But according to Doug Cutting QueryWrapperFilter option is less preferable than Boolean Query. However, I am experiencing both performance (very slow) and response problems (query is not matched to any doc).
>>>
>>> The setup is as follows. Given a user query "word":
>>>
>>> QueryParser parser = new QueryParser(Version.LUCENE_32, "content", new StandardAnalyzer(Version.LUCENE_32));
>>> Query query = parser.parse(word);
>>> Document d = WikiIndexSearcher.doc(match.doc);
>>> docTitle = d.get("title");
>>> TermQuery titleQuery = new TermQuery(new Term("title", docTitle));
>>> BooleanQuery bQuery = new BooleanQuery();
>>> bQuery.add(titleQuery, BooleanClause.Occur.MUST);
>>> bQuery.add(query, BooleanClause.Occur.MUST);
>>> TopDocs hits = WikiIndexSearcher.search(bQuery, 1);
>>>
>>> In other words, find a wikipedia doc with a particular title (in example below it is "list of newspapers in New York http://en.wikipedia.org/wiki/List_of_newspapers_in_New_York"). We then create a boolean term query with that must match on the title and content must match the user query ('american' in the example below).
>>>
>>> Here is the output of a run on user query "american" in a doc with title "list of newspapers in New York").
>>>
>>> ... QUERY: content:american
>>> ... doc: List of newspapers in New York
>>> ... query: +title:List of newspapers in New York +content:american
>>> ... explanation 568744: 0.0 = (NON-MATCH) Failure to meet condition(s) of required/prohibited clause(s)
>>>   0.0 = no match on required clause (title:List of newspapers in New York)
>>>   0.011818626 = (MATCH) weight(content:american in 212081), product of:
>>>     0.15625292 = queryWeight(content:american), product of:
>>>       2.4204094 = idf(docFreq=392249, maxDocs=1623450)
>>>       0.0645564 = queryNorm
>>>     0.075637795 = (MATCH) fieldWeight(content:american in 212081), product of:
>>>       1.0 = tf(termFreq(content:american)=1)
>>>       2.4204094 = idf(docFreq=392249, maxDocs=1623450)
>>>       0.03125 = fieldNorm(field=content, doc=212081)
>>>
>>> As you can see there is no match to the query (and hits.totalcounts is 0). The search is very slow too.
>>>
>>> Any help would be much appreciated
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
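
To make the analyzed vs. not-analyzed point in this thread concrete, here
is a rough sketch in plain Java. It does not use Lucene at all: analyze()
is a simplified stand-in for an analyzer (lowercase, split on
non-alphanumerics, drop a few stop words), and the class name, method
names and stop-word list are all made up for illustration. The real
StandardAnalyzer tokenizes differently and has its own stop set. The
sketch only shows why an exact, single-term lookup on the whole title can
match a field indexed as one term but not a field that was split into
terms.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class TitleTermSketch {
    // Hypothetical stop-word list, for illustration only.
    static final Set<String> STOP_WORDS =
            new HashSet<String>(Arrays.asList("of", "in", "the", "a"));

    // Simplified stand-in for an analyzed field:
    // lowercase, split on non-alphanumerics, drop stop words.
    static List<String> analyze(String text) {
        List<String> terms = new ArrayList<String>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!token.isEmpty() && !STOP_WORDS.contains(token)) {
                terms.add(token);
            }
        }
        return terms;
    }

    // Stand-in for a not-analyzed field: the whole value is one term.
    static List<String> notAnalyzed(String text) {
        return Collections.singletonList(text);
    }

    // Models an exact single-term lookup: the query term must equal
    // one of the indexed terms character for character.
    static boolean termMatches(List<String> indexedTerms, String queryTerm) {
        return indexedTerms.contains(queryTerm);
    }

    public static void main(String[] args) {
        String title = "List of newspapers in New York";
        System.out.println("analyzed terms:      " + analyze(title));
        System.out.println("whole title matches analyzed field:     "
                + termMatches(analyze(title), title));
        System.out.println("whole title matches not-analyzed field: "
                + termMatches(notAnalyzed(title), title));
    }
}
```

In this model the whole title only matches when the field holds it as a
single term, and the match is case-sensitive. That is why, with the title
field reported as NOT_ANALYZED, a "no match on required clause" result
suggests the index contents and the query term differ somewhere, which is
what a look at the index with Luke should show.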