lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "tcorbet" <tcor...@ix.netcom.com>
Subject BooleanQuery
Date Tue, 01 Nov 2005 06:43:32 GMT
I have an index over the titles to .mp3 songs.
It is not unreasonable for the user to want to
see the results from:  "Show me Everything".

I understand that title:* is not a valid wildcard query.
I understand that title:[a* TO z*] is a valid wildcard query.

What I cannot understand is this behavior which
throws no exceptions:

title:[a* TO z*] returns 0 hits.

title [a* TO m*] OR [n* TO z*] returns *almost* the
correct answer -- one title [of approximately 1200] is missing.

title:[a* TO m*] OR [m* TO z*] correctly returns
all the available titles.

I arrived at  this 'bifurcation' solution because trial
and error indicated that the problem was somehow
related to the number of terms involved.

In addition to doing the bifurcation, I also setMaxClauseCount
up to 4096 from 1024, but that did not change
the observed behavior; it is still necessary to
present the Search engine with two smaller Ranges
in lieu of a single large Range to get correct
results.  [Note, just for completeness, I have
tested and confirmed that almost any bifurcation
works.  I can go from a-k and from k-z, just
as well as the example shown.  If, however, I
try some very small range coupled with some
longer range, such as a-t and t-z, it will
revert to returning 0 hits.]  [Another note, you
can safely assume that the distribution of words in 
the titles of the music is not dramatically different
from the distribution of words you would find
in domains like email subjects, html page titles
or books from Amazon lists.]

There is obviously something *interesting* about the
behavior of the Search engine that I have failed to
grasp.  I would appreciate any instruction.

Thank you.

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message