lucene-java-user mailing list archives

From Peyman Faratin <pey...@robustlinks.com>
Subject Problem with BooleanQuery
Date Wed, 21 Sep 2011 16:38:09 GMT
Hi,

The problem I would like to solve is determining the Lucene score of a word in one particular,
given document. The two candidates I have been trying are:

- QueryWrapperFilter
- BooleanQuery

Both restrict a search to a subset of the search space. According to Doug Cutting, the
QueryWrapperFilter option is less preferable than BooleanQuery. However, I am experiencing both
performance problems (very slow) and response problems (the query is not matched to any doc).

The setup is as follows. Given a user query "word":

QueryParser parser = new QueryParser(Version.LUCENE_32, "content", new StandardAnalyzer(Version.LUCENE_32));
Query query = parser.parse(word);
Document d = WikiIndexSearcher.doc(match.doc);
docTitle = d.get("title");
TermQuery titleQuery = new TermQuery(new Term("title", docTitle));
BooleanQuery bQuery = new BooleanQuery();
bQuery.add(titleQuery, BooleanClause.Occur.MUST);
bQuery.add(query, BooleanClause.Occur.MUST);
TopDocs hits = WikiIndexSearcher.search(bQuery, 1);

In other words, find a Wikipedia doc with a particular title (in the example below it is "List
of newspapers in New York", http://en.wikipedia.org/wiki/List_of_newspapers_in_New_York). We
then create a boolean query in which the title must match that doc's title and the content
must match the user query ('american' in the example below).

Here is the output of a run on the user query "american" in the doc with the title "List of
newspapers in New York":

... QUERY: content:american
... doc: List of newspapers in New York
... query: +title:List of newspapers in New York +content:american
... explanation 568744: 0.0 = (NON-MATCH) Failure to meet condition(s) of required/prohibited clause(s)
  0.0 = no match on required clause (title:List of newspapers in New York)
  0.011818626 = (MATCH) weight(content:american in 212081), product of:
    0.15625292 = queryWeight(content:american), product of:
      2.4204094 = idf(docFreq=392249, maxDocs=1623450)
      0.0645564 = queryNorm
    0.075637795 = (MATCH) fieldWeight(content:american in 212081), product of:
      1.0 = tf(termFreq(content:american)=1)
      2.4204094 = idf(docFreq=392249, maxDocs=1623450)
      0.03125 = fieldNorm(field=content, doc=212081)

As you can see, there is no match for the query (and hits.totalHits is 0). The search is also
very slow.
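
For reference, the 0.011818626 reported for the content clause can be reproduced by hand from
the numbers in the explanation. This is only a sketch, assuming the scoring formulas of
Lucene 3.x DefaultSimilarity (tf = sqrt(termFreq), idf = 1 + ln(maxDocs / (docFreq + 1))).
The class and method names are illustrative and are not part of the Lucene API.

```java
// Hand check of the explain() arithmetic quoted above.
// Assumption: the formulas are those of Lucene 3.x DefaultSimilarity.
// ExplainArithmetic and its methods are illustrative names, not Lucene API.
final class ExplainArithmetic {

    // idf = 1 + ln(maxDocs / (docFreq + 1))
    static double idf(long docFreq, long maxDocs) {
        return 1.0 + Math.log((double) maxDocs / (double) (docFreq + 1));
    }

    // tf = sqrt(termFreq)
    static double tf(double termFreq) {
        return Math.sqrt(termFreq);
    }

    // score = queryWeight * fieldWeight
    //       = (idf * queryNorm) * (tf * idf * fieldNorm)
    static double score(double termFreq, long docFreq, long maxDocs,
                        double queryNorm, double fieldNorm) {
        double idf = idf(docFreq, maxDocs);
        double queryWeight = idf * queryNorm;
        double fieldWeight = tf(termFreq) * idf * fieldNorm;
        return queryWeight * fieldWeight;
    }

    public static void main(String[] args) {
        // Values copied from the explanation for content:american.
        double s = score(1.0, 392249L, 1623450L, 0.0645564, 0.03125);
        System.out.println("idf   = " + idf(392249L, 1623450L));
        System.out.println("score = " + s);
    }
}
```

The result agrees with the explanation to within float rounding, so the content clause on its
own does match. The overall 0.0 comes from the required title clause, which is reported as
"no match on required clause".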

Any help would be much appreciated.