Mailing-List: contact java-user-help@lucene.apache.org; run by ezmlm
Precedence: bulk
Reply-To: java-user@lucene.apache.org
Received-SPF: pass (nike.apache.org: domain of ian.lea@gmail.com designates
 209.85.210.176 as permitted sender)
MIME-Version: 1.0
In-Reply-To: <F7676F21-7D9E-4407-8D67-44FEF705AAD1@robustlinks.com>
References: <4BEEFF76-4FD3-499D-B5F4-632EBE89A6CD@robustlinks.com>
 <CAEY5pxX4k+CNBZmxVwsaCpiLiQjuEAzYoAR50gr6YQtvx93yBQ@mail.gmail.com>
 <F7676F21-7D9E-4407-8D67-44FEF705AAD1@robustlinks.com>
From: Ian Lea <ian.lea@gmail.com>
Date: Thu, 22 Sep 2011 09:59:29 +0100
Message-ID: 
 <CAEY5pxV_H+1x5ghA9mJjMHiEc3bz-SeN8E88nfg9d0SLwPFoJQ@mail.gmail.com>
Subject: Re: Problem with BooleanQuery
To: java-user@lucene.apache.org
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable

> I am not analyzing the title
>
> Field titleField = new Field("title", article.getTitle(), Field.Store.YES, Field.Index.NOT_ANALYZED);

OK.  But the output you quote says "no match on required clause
(title:List of newspapers in New York)", so something is out of sync
somewhere.

What does Luke show? See
http://wiki.apache.org/lucene-java/LuceneFAQ#Why_am_I_getting_no_hits_.2BAC8_incorrect_hits.3F
for more things to check.

> Do you think BooleanQuery is the right approach for solving the problem (finding the Lucene score of a word or a phrase in _a_ particular document)?

Sounds OK to me.  You could look at the contrib MemoryIndex as a
possible alternative.


--
Ian.


> On Sep 21, 2011, at 1:00 PM, Ian Lea wrote:
>
>> How is the "title" field indexed?  It seems likely that it is analyzed, in
>> which case a TermQuery won't match, because "list of newspapers in New
>> York" would be analyzed into the terms "list", "newspapers", "new", "york",
>> assuming things were lowercased, stop words removed, etc.
>>
>> Maybe you need your "word" as a TermQuery, assuming it is lowercased
>> etc., and to pass the title through the query parser.  In other words,
>> reverse what you've got for the two fields.
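To make the mismatch concrete, here is a toy model in plain Java (no Lucene involved). The class and method names are hypothetical, and `analyze` is only a rough stand-in for StandardAnalyzer under stated assumptions: it lowercases, splits on non-alphanumeric characters and drops a small, assumed stop-word list. A TermQuery is modelled as an exact lookup in the set of indexed terms; "passing the title through the parser" is modelled as analyzing the query text the same way and requiring every resulting term (positions are ignored, so this is only an approximation of a phrase or AND query).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

final class AnalyzedTitleDemo {
    // Hypothetical, tiny stop-word list; StandardAnalyzer's real list is larger.
    private static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("a", "an", "the", "of", "in"));

    // Toy analyzer: lowercase, split on non-alphanumerics, drop stop words.
    static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!token.isEmpty() && !STOP_WORDS.contains(token)) {
                terms.add(token);
            }
        }
        return terms;
    }

    // Models a TermQuery: the exact, unanalyzed string must be an indexed term.
    static boolean termQueryMatches(Set<String> indexedTerms, String term) {
        return indexedTerms.contains(term);
    }

    // Models analyzing the query text with the same analyzer and requiring every term.
    static boolean analyzedQueryMatches(Set<String> indexedTerms, String queryText) {
        List<String> queryTerms = analyze(queryText);
        return !queryTerms.isEmpty() && indexedTerms.containsAll(queryTerms);
    }

    public static void main(String[] args) {
        String title = "List of newspapers in New York";

        // Title indexed as ANALYZED: several small terms.
        Set<String> analyzedIndex = new HashSet<>(analyze(title));
        // Title indexed as NOT_ANALYZED: the whole string is one term.
        Set<String> notAnalyzedIndex = new HashSet<>(Arrays.asList(title));

        System.out.println("analyzed terms: " + analyze(title));
        System.out.println("TermQuery vs analyzed field:      " + termQueryMatches(analyzedIndex, title));
        System.out.println("TermQuery vs NOT_ANALYZED field:  " + termQueryMatches(notAnalyzedIndex, title));
        System.out.println("Analyzed query vs analyzed field: " + analyzedQueryMatches(analyzedIndex, title));
    }
}
```

In this model the exact-string TermQuery only matches when the field really holds the whole title as a single term, which is why a NOT_ANALYZED title should match and an analyzed one should not.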
>>
>> As for performance, first narrow down where the time is going.  If
>> it is in Lucene, read
>> http://wiki.apache.org/lucene-java/ImproveSearchingSpeed
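One low-tech way to narrow it down is to wrap each stage (fetching the stored document, building the query, running the search) in a timer and compare. This is a generic plain-Java sketch; `StageTimer` and `timed` are hypothetical names, and the stages in `main` are placeholders rather than Lucene calls.

```java
import java.util.function.Supplier;

final class StageTimer {
    // Runs one stage, prints its wall-clock time in whole milliseconds, and returns its result.
    static <T> T timed(String label, Supplier<T> stage) {
        long start = System.nanoTime();
        T result = stage.get();
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println(label + ": " + elapsedMs + " ms");
        return result;
    }

    public static void main(String[] args) {
        // Placeholder stages; in the real program these would wrap e.g. the doc lookup and the search.
        String title = timed("fetch title", () -> "List of newspapers in New York");
        long sum = timed("busy loop", () -> {
            long s = 0;
            for (int i = 0; i < 1_000_000; i++) {
                s += i;
            }
            return s;
        });
        System.out.println(title + " / " + sum);
    }
}
```

Timing each stage separately shows whether the slowness is in the search call itself or in the surrounding code before reaching for Lucene-specific tuning.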
>>
>>
>> --
>> Ian.
>>
>> On Wed, Sep 21, 2011 at 5:38 PM, Peyman Faratin <peyman@robustlinks.com> wrote:
>>> Hi
>>>
>>> The problem I would like to solve is determining the Lucene score of a word in _a particular_ given document. The two candidates I have been trying are
>>>
>>> - QueryWrapperFilter
>>> - BooleanQuery
>>>
>>> Both are meant to restrict the search to a given search space. But according to Doug Cutting, the QueryWrapperFilter option is less preferable than BooleanQuery. However, I am running into both performance problems (it is very slow) and matching problems (the query is not matched to any doc).
>>>
>>> The setup is as follows. Given a user query "word":
>>>
>>> QueryParser parser = new QueryParser(Version.LUCENE_32, "content", new StandardAnalyzer(Version.LUCENE_32));
>>> Query query = parser.parse(word);
>>> Document d = WikiIndexSearcher.doc(match.doc);
>>> docTitle = d.get("title");
>>> TermQuery titleQuery = new TermQuery(new Term("title", docTitle));
>>> BooleanQuery bQuery = new BooleanQuery();
>>> bQuery.add(titleQuery, BooleanClause.Occur.MUST);
>>> bQuery.add(query, BooleanClause.Occur.MUST);
>>> TopDocs hits = WikiIndexSearcher.search(bQuery, 1);
>>>
>>> In other words, find a Wikipedia doc with a particular title (in the example below it is "List of newspapers in New York", http://en.wikipedia.org/wiki/List_of_newspapers_in_New_York). We then create a BooleanQuery in which the title must match and the content must match the user query ('american' in the example below).
>>>
>>> Here is the output of a run with the user query "american" on the doc titled "List of newspapers in New York":
>>>
>>> ... QUERY: content:american
>>> ... doc: List of newspapers in New York
>>> ... query: +title:List of newspapers in New York +content:american
>>> ... explanation 568744: 0.0 = (NON-MATCH) Failure to meet condition(s) of required/prohibited clause(s)
>>>  0.0 = no match on required clause (title:List of newspapers in New York)
>>>  0.011818626 = (MATCH) weight(content:american in 212081), product of:
>>>    0.15625292 = queryWeight(content:american), product of:
>>>      2.4204094 = idf(docFreq=392249, maxDocs=1623450)
>>>      0.0645564 = queryNorm
>>>    0.075637795 = (MATCH) fieldWeight(content:american in 212081), product of:
>>>      1.0 = tf(termFreq(content:american)=1)
>>>      2.4204094 = idf(docFreq=392249, maxDocs=1623450)
>>>      0.03125 = fieldNorm(field=content, doc=212081)
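The numbers in this explanation can be reproduced by hand, assuming Lucene 3.x DefaultSimilarity's formulas idf = 1 + ln(maxDocs / (docFreq + 1)) and tf = sqrt(termFreq). The plain-Java sketch below (class name hypothetical) only redoes that arithmetic; queryNorm and fieldNorm are copied from the output rather than recomputed.

```java
final class ExplainArithmetic {
    // DefaultSimilarity-style idf: 1 + ln(maxDocs / (docFreq + 1)).
    static double idf(long docFreq, long maxDocs) {
        return 1.0 + Math.log(maxDocs / (double) (docFreq + 1));
    }

    // DefaultSimilarity-style tf: square root of the raw term frequency.
    static double tf(int termFreq) {
        return Math.sqrt(termFreq);
    }

    public static void main(String[] args) {
        double idf = idf(392249, 1623450);
        double queryNorm = 0.0645564;  // copied from the explanation, not recomputed
        double fieldNorm = 0.03125;    // copied from the explanation (length norm of "content")

        double queryWeight = idf * queryNorm;          // queryWeight = idf * queryNorm
        double fieldWeight = tf(1) * idf * fieldNorm;  // fieldWeight = tf * idf * fieldNorm
        double score = queryWeight * fieldWeight;      // clause weight = queryWeight * fieldWeight

        System.out.printf("idf=%.7f queryWeight=%.8f fieldWeight=%.9f score=%.9f%n",
                idf, queryWeight, fieldWeight, score);
    }
}
```

This only accounts for the content:american clause; the overall score is 0.0 because the required title clause does not match.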
>>>
>>> As you can see, the query matches no document (hits.totalHits is 0). The search is also very slow.
>>>
>>> Any help would be much appreciated
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>
>