lucene-java-user mailing list archives

From Steven Rowe <>
Subject Re: Speed of grouped queries
Date Wed, 03 Jan 2007 16:50:10 GMT
Hi Scott,

sdeck wrote:
> I can't combine each of the movie queries together into one, because I get a
> memory error because of how many clauses there are (setting the clause
> higher did not help)

Have you tried increasing the memory available to the JVM?  Sun's JVM
takes an option "-Xmx" to change the maximum amount of heap space to use
(defaults to 64MB). For Java 1.5, see
<> for
Windows or
<> for
Solaris and Linux.
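To confirm that an -Xmx setting actually took effect, you can print the heap ceiling the JVM reports. Here's a minimal sketch using only the standard Runtime API. The class name HeapCheck is just for illustration, and maxMemory() only approximately matches the -Xmx value.

```java
class HeapCheck {
    /** Maximum heap the JVM will attempt to use, in megabytes. */
    static long maxHeapMb() {
        // Runtime.maxMemory() roughly reflects -Xmx (or the JVM's default if -Xmx was not given).
        return Runtime.getRuntime().maxMemory() / (1024 * 1024);
    }

    public static void main(String[] args) {
        System.out.println("Max heap: " + maxHeapMb() + " MB");
    }
}
```

Run it as e.g. "java -Xmx512m HeapCheck" and compare the output with and without the flag.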

You may have to increase the maximum number of allowed clauses too (it
sounds like you're already aware of this one).
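A quick count shows why a flattened genre query hits the clause limit. Lucene's BooleanQuery throws BooleanQuery.TooManyClauses when a query has more clauses than BooleanQuery.getMaxClauseCount() (1024 by default). You can raise that limit with the static BooleanQuery.setMaxClauseCount(int). The sketch below is plain Java with no Lucene dependency. It assumes one clause per actor in a single flat BooleanQuery, and the per-movie actor counts are hypothetical.

```java
import java.util.Collections;
import java.util.List;

class ClauseBudget {
    // Lucene's default limit, as returned by BooleanQuery.getMaxClauseCount() unless changed.
    static final int DEFAULT_MAX_CLAUSES = 1024;

    /** Clauses needed if every actor of every movie becomes one clause of a single, flat query. */
    static int flatClauseCount(List<Integer> actorsPerMovie) {
        int total = 0;
        for (int actors : actorsPerMovie) {
            total += actors;
        }
        return total;
    }

    /** True if the flat query stays within the given clause limit. */
    static boolean fitsLimit(List<Integer> actorsPerMovie, int maxClauses) {
        return flatClauseCount(actorsPerMovie) <= maxClauses;
    }

    public static void main(String[] args) {
        // Hypothetical genre: 80 movies with 20 actors each.
        List<Integer> genre = Collections.nCopies(80, 20);
        System.out.println(flatClauseCount(genre) + " clauses; fits default limit: "
                + fitsLimit(genre, DEFAULT_MAX_CLAUSES));
        // prints "1600 clauses; fits default limit: false"
    }
}
```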

If this doesn't help, you may want to look into QueryFilter. You might
try using a ChainedFilter (from the Lucene Sandbox; note that the latest
release of this class is not in lucene-core-2.0.0.jar, but rather in
lucene-misc-2.0.0.jar) to connect the movie QueryFilters for a genre.
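To make the filter idea concrete: in Lucene 2.0 a Filter hands back a java.util.BitSet with one bit per matching document, and a ChainedFilter in OR mode combines those sets. The sketch below models that with plain BitSets and does not call Lucene. movieBits() is a hypothetical stand-in for a movie QueryFilter's bits(reader), and the document ids are made up.

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class GenreFilterSketch {
    /** Hypothetical stand-in for QueryFilter.bits(reader): one set bit per matching doc id. */
    static BitSet movieBits(List<Integer> actorDocIds) {
        BitSet bits = new BitSet();
        for (int docId : actorDocIds) {
            bits.set(docId);
        }
        return bits;
    }

    /** OR the per-movie bit sets together, the way a ChainedFilter in OR mode combines filters. */
    static BitSet genreBits(Map<String, List<Integer>> moviesInGenre) {
        BitSet result = new BitSet();
        for (List<Integer> docIds : moviesInGenre.values()) {
            result.or(movieBits(docIds));
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, List<Integer>> genre = new LinkedHashMap<>();
        genre.put("movieA", Arrays.asList(1, 4, 7));
        genre.put("movieB", Arrays.asList(4, 9));
        System.out.println(genreBits(genre)); // prints "{1, 4, 7, 9}"
    }
}
```

The point is that each movie is evaluated once as a bit set and the genre becomes a cheap bitwise OR, rather than one huge BooleanQuery.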

To improve performance (beyond the first query execution), you could
wrap the individual QueryFilters in CachingWrapperFilters, so that later
queries against the same IndexReader reuse each filter's cached results.
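Here is a plain-Java sketch of the caching idea: the wrapped filter's BitSet is computed once per reader and reused after that. A weak map lets the entry disappear once the reader is no longer referenced. This is a model of the behaviour, not Lucene's CachingWrapperFilter source. The class name, the generic reader type R, and the computations counter are all illustrative assumptions.

```java
import java.util.BitSet;
import java.util.Map;
import java.util.WeakHashMap;
import java.util.function.Function;

class CachingFilterSketch<R> {
    private final Function<R, BitSet> filter;                 // the wrapped, expensive filter
    private final Map<R, BitSet> cache = new WeakHashMap<>(); // entry vanishes once its reader is collected
    int computations = 0;                                     // how often the wrapped filter actually ran

    CachingFilterSketch(Function<R, BitSet> filter) {
        this.filter = filter;
    }

    /** Returns the cached bits for this reader, computing them only on first use. Callers must not modify the result. */
    synchronized BitSet bits(R reader) {
        BitSet cached = cache.get(reader);
        if (cached == null) {
            computations++;
            cached = filter.apply(reader);
            cache.put(reader, cached);
        }
        return cached;
    }

    public static void main(String[] args) {
        Object reader = new Object(); // stands in for an open IndexReader
        CachingFilterSketch<Object> cached = new CachingFilterSketch<>(r -> {
            BitSet b = new BitSet();
            b.set(3);
            return b;
        });
        cached.bits(reader);
        cached.bits(reader);
        System.out.println("filter ran " + cached.computations + " time(s)"); // prints "filter ran 1 time(s)"
    }
}
```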

For something completely different, since you seem to be interested in
online query performance, you could run all possible queries offline,
and use the results to construct a derived index, in which documents
contain "actor", "movie" and "genre" fields.  This derived index would
be plenty fast, I expect.  And if running all possible genre queries is
too resource-intensive, then you could compromise and construct your
derived index with just an "actor" field, or both an "actor" and a
"movie" field.
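The derived-index idea can be sketched in plain Java. An offline step expands genre to movies to actors into flat (actor, movie, genre) rows, and each row would become one Lucene Document with those three fields. At query time a genre lookup is then a single lookup on one field instead of one query per movie. The input maps, the Row class, and the grouping helper are hypothetical. The grouping only approximates what a term lookup on the genre field would return.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class DerivedIndexSketch {
    /** One precomputed row; in Lucene this would be a Document with actor, movie and genre fields. */
    static final class Row {
        final String actor, movie, genre;
        Row(String actor, String movie, String genre) {
            this.actor = actor;
            this.movie = movie;
            this.genre = genre;
        }
    }

    /** Offline step: expand genre -> movies -> actors into flat rows. */
    static List<Row> build(Map<String, List<String>> moviesByGenre, Map<String, List<String>> actorsByMovie) {
        List<Row> rows = new ArrayList<>();
        for (Map.Entry<String, List<String>> g : moviesByGenre.entrySet()) {
            for (String movie : g.getValue()) {
                for (String actor : actorsByMovie.getOrDefault(movie, Collections.<String>emptyList())) {
                    rows.add(new Row(actor, movie, g.getKey()));
                }
            }
        }
        return rows;
    }

    /** Group rows by genre, roughly what a term lookup on the genre field would give you. */
    static Map<String, Set<String>> actorsByGenre(List<Row> rows) {
        Map<String, Set<String>> index = new LinkedHashMap<>();
        for (Row r : rows) {
            index.computeIfAbsent(r.genre, k -> new LinkedHashSet<>()).add(r.actor);
        }
        return index;
    }

    public static void main(String[] args) {
        Map<String, List<String>> moviesByGenre = new LinkedHashMap<>();
        moviesByGenre.put("comedy", Arrays.asList("movieA", "movieB"));
        Map<String, List<String>> actorsByMovie = new LinkedHashMap<>();
        actorsByMovie.put("movieA", Arrays.asList("ann", "bob"));
        actorsByMovie.put("movieB", Arrays.asList("bob", "cy"));
        List<Row> rows = build(moviesByGenre, actorsByMovie);
        System.out.println(rows.size() + " rows; comedy actors: " + actorsByGenre(rows).get("comedy"));
        // prints "4 rows; comedy actors: [ann, bob, cy]"
    }
}
```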

In any case, it sounds like the number of documents in your index is
fairly small; have you tried using RAMDirectory?

Hope it helps,

> Steven Rowe wrote:
>> Hi Sdeck,
>> sdeck wrote:
>>> The query for collecting a specific actor is around 200-300 milliseconds,
>>> and the movie one, that actually queries each actor, takes roughly
>>> 500-700 milliseconds. Yet, for a genre, where you may have 50-100 movies,
>>> it takes 500 milliseconds*# of movies
>> I'm having trouble visualizing both what your documents and your queries
>> look like.  Can you please provide more concrete information?
>> Sometimes, actual code helps.
>> For example, how do actors, movies and genres relate to your documents?
>>  Do you have some external source(s) of information (i.e. external to
>> your Lucene index) that relate actors to movies?  And movies to genres?
>> If actors, movies and genres are supposed to be a metaphor for what
>> you're "really" representing, then you'll have to extend your metaphor a
>> little bit to make sense (for "me" anyway) of what you're trying to "do".
>> Steve

