lucene-java-user mailing list archives

From "Find Me" <>
Subject Re: Speed of grouped queries
Date Wed, 03 Jan 2007 21:09:06 GMT
On 1/2/07, sdeck <> wrote:
> Thanks in advance for any insight on this one.
> I have a fairly large query to run, and it takes roughly 20-40 seconds to
> complete the way that I have it.
> Here is the best example I can give.
> I have a set of roughly 25K documents indexed.
> I have queries that get documents matching a particular actor.
> Then, I have a movie query that takes all of the documents found for each
> actor query and combines them all together to say, here are all documents
> that are relevant for this movie.
> Then, and here is the time hog, I have a genre query that says, take all
> movies and get their results and combine them together into this genre
> result set.

Would it be possible to use Carrot clustering for genre? Could you
please give examples of the final complex query as well as the individual
simple queries? It would also help to state the aim of the query. Are you
trying to get a clustered list of movies (based on genre) for a particular
actor?

--Rajesh Munavalli

> The problem is, at indexing time, I do not have a way to say if a document
> is a particular genre, or a particular actor, or movie etc.  If I try and
> say for the genre query, get all documents and then filter for the queries
> for movies and actors, I get heap space memory issues.
> The query for collecting a specific actor is around 200-300 milliseconds,
> and the movie one, that actually queries each actor, takes roughly 500-700
> milliseconds. Yet, for a genre, where you may have 50-100 movies, it takes
> 500 milliseconds * the number of movies.
> Any ideas on how I could run these queries differently? For a given actor
> query, there are about 5-7 boolean query clauses. Just to give some
> insight.
> I currently just create 1 HitSetCollector (I rolled my own
> bitsetcollector)
> and just run searches with it.  I just get crapped on when it does that
> genre search. I wish there was an easier way to aggregate all of those
> documents together from all of those searches.  After it is done, I cache
> the results, but the initial hit is bad.
> Any help would be much appreciated.
> Sdeck
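
One way to picture the aggregation described above is the sketch below. It is
only an illustration under assumptions, not the poster's code and not a Lucene
API: the class GroupedBitSets, the actorSearch function, and the searchesRun
counter are hypothetical names, and actorSearch stands in for whatever runs one
actor query and returns its matching document ids as a java.util.BitSet. The
idea it shows is to run each actor query at most once, remember its bit set,
and OR the cached sets together for a movie and then for a genre, so a genre
does not cost one full round of searches per movie.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical helper: unions per-actor bit sets into movie and genre sets.
class GroupedBitSets {
    // Cache of actor name -> matching document ids, filled on first use.
    private final Map<String, BitSet> actorCache = new HashMap<>();
    // Assumption: this function runs the real actor search and returns its hits.
    private final Function<String, BitSet> actorSearch;
    // Counts how many actor searches were actually executed.
    int searchesRun = 0;

    GroupedBitSets(Function<String, BitSet> actorSearch) {
        this.actorSearch = actorSearch;
    }

    // Returns the cached bit set for an actor, searching only on a cache miss.
    BitSet actor(String name) {
        return actorCache.computeIfAbsent(name, n -> {
            searchesRun++;
            return actorSearch.apply(n);
        });
    }

    // A movie's documents are the union of its actors' documents.
    BitSet movie(List<String> actors) {
        BitSet result = new BitSet();
        for (String a : actors) {
            result.or(actor(a)); // OR into a fresh set; cached sets stay untouched
        }
        return result;
    }

    // A genre's documents are the union of its movies' documents.
    BitSet genre(List<List<String>> movies) {
        BitSet result = new BitSet();
        for (List<String> m : movies) {
            result.or(movie(m));
        }
        return result;
    }
}
```

With this shape, the number of searches grows with the number of distinct
actors in the genre rather than with the number of movies, which helps only to
the extent that actors are shared between movies.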