lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Steven Rowe <sar...@syr.edu>
Subject Re: Speed of grouped queries
Date Wed, 03 Jan 2007 16:50:10 GMT
Hi Scott,

sdeck wrote:
> I can't combine each of the movie queries together into one, because I get a
> memory error because of how many clauses there are (setting the clause
> higher did not help)

Have you tried increasing the memory available to the JVM?  Sun's JVM
takes an option "-Xmx" to change the maximum amount of heap space to use
(defaults to 64MB). For Java 1.5, see
<http://java.sun.com/j2se/1.5.0/docs/tooldocs/windows/java.html#Xms> for
Windows or
<http://java.sun.com/j2se/1.5.0/docs/tooldocs/solaris/java.html#Xms> for
Solaris and Linux.

You may have to increase the maximum # of allowed clauses too (sounds
like you're already aware of this one):
<http://lucene.apache.org/java/docs/api/org/apache/lucene/search/BooleanQuery.html#setMaxClauseCount(int)>

If this doesn't help, you may want to look into QueryFilter
<http://lucene.apache.org/java/2_0_0/api/org/apache/lucene/search/QueryFilter.html>.
 You might try using a ChainedFilter (from the Lucene Sandbox - note the
latest release of this class is not located in lucene-core-2.0.0.jar,
but rather in lucene-misc-2.0.0.jar)
<http://lucene.apache.org/java/2_0_0/api/org/apache/lucene/misc/ChainedFilter.html>
to connect movie QueryFilters for a genre.

To improve performance (beyond the first query execution), you could
wrap the individual QueryFilters in CachingWrapperFilters
<http://lucene.apache.org/java/2_0_0/api/org/apache/lucene/search/CachingWrapperFilter.html>.

For something completely different, since you seem to be interested in
online query performance, you could run all possible queries offline,
and use the results to construct a derived index, in which documents
contain "actor", "movie" and "genre" fields.  This derived index would
be plenty fast, I expect.  And if running all possible genre queries is
too resource-intensive, then you could compromise and construct your
derived index with just an "actor" field, or both an "actor" and a
"movie" field.

In any case, it sounds like the # of documents in your index is fairly
small -- have you tried using RAMDirectory
<http://lucene.apache.org/java/docs/api/org/apache/lucene/store/RAMDirectory.html>?


Hope it helps,
Steve

> Steven Rowe wrote:
>> Hi Sdeck,
>>
>> sdeck wrote:
>>> The query for collecting a specific actor is around 200-300 milliseconds,
>>> and the movie one, that actually queries each actor, takes roughly
>>> 500-700
>>> milliseconds. Yet, for a genre, where you may have 50-100 movies, it
>>> takes
>>> 500 milliseconds*# of movies
>> I'm having trouble visualizing both what your documents and your queries
>> look like.  Can you please provide more concrete information?
>> Sometimes, actual code helps.
>>
>> For example, how do actors, movies and genres relate to your documents?
>>  Do you have some external source(s) of information (i.e. external to
>> your Lucene index) that relate actors to movies?  And movies to genres?
>>
>> If actors, movies and genres are supposed to be a metaphor for what
>> you're "really" representing, then you'll have to extend your metaphor a
>> little bit to make sense (for "me" anyway) of what you're trying to "do".
>>
>> Steve


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message