lucene-java-user mailing list archives

From Ian Lea <ian....@gmail.com>
Subject Re: Problem with BooleanQuery
Date Wed, 21 Sep 2011 17:00:13 GMT
How is the "title" field indexed?  It seems likely that it is analyzed,
in which case a TermQuery won't match.  A TermQuery is not analyzed, so
it looks for the whole title as a single term, whereas "list of
newspapers in New York" would have been indexed as the terms "list",
"newspapers", "new" and "york", assuming things were lowercased, stop
words removed etc.

Maybe you need your "word" as a TermQuery, assuming it is lowercased
etc., and to pass the title through the query parser.  In other words,
reverse what you've got for the two fields.
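
To illustrate the mismatch without needing an index, here is a minimal
plain-Java sketch.  It is not Lucene code: the analyze() method and its
stop list are hypothetical stand-ins for whatever analyzer the "title"
field was really indexed with, and the "analyzed" match only checks that
all terms are present rather than running a true phrase query.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

final class AnalyzedTitleDemo {
    // Hypothetical stop list for illustration; a real analyzer's list may differ.
    static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("of", "in", "the", "a", "an"));

    // Toy stand-in for an analyzer: split on non-alphanumerics,
    // lowercase each token, and drop stop words.
    static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String token : text.split("[^\\p{Alnum}]+")) {
            String term = token.toLowerCase(Locale.ROOT);
            if (!term.isEmpty() && !STOP_WORDS.contains(term)) {
                terms.add(term);
            }
        }
        return terms;
    }

    // Models an unanalyzed term lookup: the raw string must equal an indexed term.
    static boolean exactTermMatches(List<String> indexedTerms, String rawTerm) {
        return indexedTerms.contains(rawTerm);
    }

    // Models analyzing the query text first: every resulting term must be indexed.
    static boolean analyzedQueryMatches(List<String> indexedTerms, String queryText) {
        List<String> queryTerms = analyze(queryText);
        return !queryTerms.isEmpty() && indexedTerms.containsAll(queryTerms);
    }

    public static void main(String[] args) {
        String title = "List of newspapers in New York";
        List<String> indexed = analyze(title);
        System.out.println(indexed);                              // [list, newspapers, new, york]
        System.out.println(exactTermMatches(indexed, title));     // false
        System.out.println(analyzedQueryMatches(indexed, title)); // true
    }
}
```

The raw title never equals any single indexed term, which is the same
situation as the "no match on required clause" line in the explanation
output quoted below.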

As for performance, first narrow down where the time is going.  If
it is in Lucene, read
http://wiki.apache.org/lucene-java/ImproveSearchingSpeed
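
One rough way to narrow it down is to time each step separately.  The
helper below is only a sketch in plain Java; the class name, the labels
and the placeholder steps in main() are made up for illustration, and in
the real program the lambdas would wrap the parse, search and document
fetch calls.

```java
import java.util.concurrent.Callable;

final class SearchTiming {
    // Runs one step, prints its wall-clock time, and returns the step's result.
    // Callable is used so that steps throwing checked exceptions can be wrapped.
    static <T> T timed(String label, Callable<T> step) throws Exception {
        long start = System.nanoTime();
        T result = step.call();
        long elapsedMicros = (System.nanoTime() - start) / 1_000;
        System.out.println(label + ": " + elapsedMicros + " us");
        return result;
    }

    public static void main(String[] args) throws Exception {
        // Placeholder steps standing in for the real parse and search calls.
        String parsed = timed("parse", () -> "content:american");
        Integer length = timed("search", () -> parsed.length());
        System.out.println(length);
    }
}
```

A single measurement is noisy, so it is worth repeating each step a few
times and ignoring the first run before drawing conclusions.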


--
Ian.

On Wed, Sep 21, 2011 at 5:38 PM, Peyman Faratin <peyman@robustlinks.com> wrote:
> Hi
>
> The problem I would like to solve is determining the Lucene score of a word in
> _a particular_ given document. The 2 candidates I have been trying are
>
> - QueryWrapperFilter
> - BooleanQuery
>
> Both are meant to restrict the search to a smaller search space. But according to
> Doug Cutting, the QueryWrapperFilter option is less preferable than BooleanQuery.
> However, I am experiencing both performance problems (very slow) and response
> problems (the query is not matched to any doc).
>
> The setup is as follows. Given a user query "word":
>
> QueryParser parser = new QueryParser(Version.LUCENE_32, "content",new StandardAnalyzer(Version.LUCENE_32));
> Query query = parser.parse(word);
> Document d = WikiIndexSearcher.doc(match.doc);
> docTitle = d.get("title");
> TermQuery titleQuery = new TermQuery(new Term("title", docTitle));
> BooleanQuery bQuery = new BooleanQuery();
> bQuery.add(titleQuery, BooleanClause.Occur.MUST);
> bQuery.add(query, BooleanClause.Occur.MUST);
> TopDocs hits = WikiIndexSearcher.search(bQuery, 1);
>
> In other words, find a Wikipedia doc with a particular title (in the example below
> it is "list of newspapers in New York",
> http://en.wikipedia.org/wiki/List_of_newspapers_in_New_York). We then create a
> boolean query in which the title must match and the content must match the user
> query ('american' in the example below).
>
> Here is the output of a run on the user query "american" in a doc with the title
> "list of newspapers in New York".
>
> ... QUERY: content:american
> ... doc: List of newspapers in New York
> ... query: +title:List of newspapers in New York +content:american
> ... explanation 568744: 0.0 = (NON-MATCH) Failure to meet condition(s) of required/prohibited clause(s)
>  0.0 = no match on required clause (title:List of newspapers in New York)
>  0.011818626 = (MATCH) weight(content:american in 212081), product of:
>    0.15625292 = queryWeight(content:american), product of:
>      2.4204094 = idf(docFreq=392249, maxDocs=1623450)
>      0.0645564 = queryNorm
>    0.075637795 = (MATCH) fieldWeight(content:american in 212081), product of:
>      1.0 = tf(termFreq(content:american)=1)
>      2.4204094 = idf(docFreq=392249, maxDocs=1623450)
>      0.03125 = fieldNorm(field=content, doc=212081)
>
> As you can see, there is no match to the query (and hits.totalHits is 0). The search
> is very slow too.
>
> Any help would be much appreciated


