lucene-java-user mailing list archives

From sdeck <>
Subject Speed of grouped queries
Date Tue, 02 Jan 2007 22:32:37 GMT

Thanks in advance for any insight on this one.

I have a fairly large query to run, and it takes roughly 20-40 seconds to
complete the way that I have it set up.
Here is the best example I can give.

I have a set of roughly 25K documents indexed.

I have queries that get documents matching a particular actor.

Then, I have a movie query that takes all of the documents found for each
actor query and combines them all together to say, here are all documents
that are relevant for this movie.

Then, and here is the time hog, I have a genre query that says, take all
movies and get their results and combine them together into this genre
result set.

The problem is that, at indexing time, I have no way to tell whether a
document belongs to a particular genre, actor, movie, etc.  If I try to run
the genre query by getting all documents and then filtering them with the
movie and actor queries, I run out of heap space.

The query for a specific actor takes around 200-300 milliseconds, and the
movie query, which actually runs each actor query, takes roughly 500-700
milliseconds. Yet for a genre, where you may have 50-100 movies, it takes
roughly 500 milliseconds times the number of movies.

Any ideas on how I could run these queries differently? Just to give some
insight, a given actor query has about 5-7 boolean query clauses.

I currently create just one HitSetCollector (I rolled my own bitsetcollector)
and run all of the searches with it.  The genre search is where I really get
hammered. I wish there was an easier way to aggregate all of the documents
from all of those searches.  After it is done, I cache the results, but the
initial hit is bad.
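
To make the aggregation concrete, here is a rough sketch of its shape, not
my actual code. It uses only java.util.BitSet and no Lucene classes. The
class name GroupedBitSets, its methods, and the actorSearch function (a
stand-in for running one actor query and collecting the matching document
ids) are all made up for illustration. It also assumes the per-actor bitsets
are cached, which is not what I do today; I only cache the final results.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical helper: one cached BitSet of doc ids per actor, OR-ed upward. */
final class GroupedBitSets {
    // Assumption: per-actor results are cached, so an actor shared by
    // several movies is only searched once.
    private final Map<String, BitSet> actorCache = new HashMap<>();

    // Stand-in for the real actor search (the 5-7 clause boolean query).
    private final Function<String, BitSet> actorSearch;

    // Counts how many actor searches were actually executed.
    int searchesRun = 0;

    GroupedBitSets(Function<String, BitSet> actorSearch) {
        this.actorSearch = actorSearch;
    }

    /** Doc ids for one actor, running the search only on a cache miss. */
    BitSet actor(String name) {
        BitSet cached = actorCache.get(name);
        if (cached == null) {
            searchesRun++;
            cached = actorSearch.apply(name);
            actorCache.put(name, cached);
        }
        return cached;
    }

    /** A movie is the union of the doc ids of its actors. */
    BitSet movie(List<String> actors) {
        BitSet result = new BitSet();
        for (String a : actors) {
            result.or(actor(a)); // or() modifies result, not the cached set
        }
        return result;
    }

    /** A genre is the union of its movies, each given as a list of actors. */
    BitSet genre(List<List<String>> movies) {
        BitSet result = new BitSet();
        for (List<String> cast : movies) {
            result.or(movie(cast));
        }
        return result;
    }
}
```

With something like this, the number of real searches for a genre would be
bounded by the number of distinct actors rather than by the number of movies
times the actors per movie. Whether that helps in my case depends on how much
the casts overlap.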

Any help would be much appreciated.
