lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From sdeck <scott.dec...@gmail.com>
Subject Re: Speed of grouped queries
Date Wed, 03 Jan 2007 16:59:02 GMT


Yes, indeed. I have tried each of those, hence my frustration.  
So, the max clause one did not seem to work, I ran out of heap memory for
some reason. I have my heap top set to -Xmx1024m
so, that should be enough.
Tried the query filter (that one also caused a heap memory error)

yeah, I have thought about pre-indexing some fields. Basically running the
queries I have now and inserting the actor, movie and genred ids, however,
what if my queries need to be tweaked, or they change completely? I now have
all of these stored documents with possibly incorrect matches.
That index would have to be created anytime I update the current index, plus
change my queries, which may be what I have to do eventually. I was just
trying to fix it from the query side first  and if that failed, then go the
secondary (almost inverted) index route.

I guess, any ideas why I would run out of heap memory by combining all of
those boolean queries together and then running the query? What is happening
in the background that would make that occur? Is it storing something in
memory, like all of the common terms or something, to cause that to occur?

Sdeck



Steven Rowe wrote:
> 
> Hi Scott,
> 
> sdeck wrote:
>> I can't combine each of the movie queries together into one, because I
>> get a
>> memory error because of how many clauses there are (setting the clause
>> higher did not help)
> 
> Have you tried increasing the memory available to the JVM?  Sun's JVM
> takes an option "-Xmx" to change the maximum amount of heap space to use
> (defaults to 64MB). For Java 1.5, see
> <http://java.sun.com/j2se/1.5.0/docs/tooldocs/windows/java.html#Xms> for
> Windows or
> <http://java.sun.com/j2se/1.5.0/docs/tooldocs/solaris/java.html#Xms> for
> Solaris and Linux.
> 
> You may have to increase the maximum # of allowed clauses too (sounds
> like you're already aware of this one):
> <http://lucene.apache.org/java/docs/api/org/apache/lucene/search/BooleanQuery.html#setMaxClauseCount(int)>
> 
> If this doesn't help, you may want to look into QueryFilter
> <http://lucene.apache.org/java/2_0_0/api/org/apache/lucene/search/QueryFilter.html>.
>  You might try using a ChainedFilter (from the Lucene Sandbox - note the
> latest release of this class is not located in lucene-core-2.0.0.jar,
> but rather in lucene-misc-2.0.0.jar)
> <http://lucene.apache.org/java/2_0_0/api/org/apache/lucene/misc/ChainedFilter.html>
> to connect movie QueryFilters for a genre.
> 
> To improve performance (beyond the first query execution), you could
> wrap the individual QueryFilters in CachingWrapperFilters
> <http://lucene.apache.org/java/2_0_0/api/org/apache/lucene/search/CachingWrapperFilter.html>.
> 
> For something completely different, since you seem to be interested in
> online query performance, you could run all possible queries offline,
> and use the results to construct a derived index, in which documents
> contain "actor", "movie" and "genre" fields.  This derived index would
> be plenty fast, I expect.  And if running all possible genre queries is
> too resource-intensive, then you could compromise and construct your
> derived index with just an "actor" field, or both an "actor" and a
> "movie" field.
> 
> In any case, it sounds like the # of documents in your index is fairly
> small -- have you tried using RAMDirectory
> <http://lucene.apache.org/java/docs/api/org/apache/lucene/store/RAMDirectory.html>?
> 
> 
> Hope it helps,
> Steve
> 
>> Steven Rowe wrote:
>>> Hi Sdeck,
>>>
>>> sdeck wrote:
>>>> The query for collecting a specific actor is around 200-300
>>>> milliseconds,
>>>> and the movie one, that actually queries each actor, takes roughly
>>>> 500-700
>>>> milliseconds. Yet, for a genre, where you may have 50-100 movies, it
>>>> takes
>>>> 500 milliseconds*# of movies
>>> I'm having trouble visualizing both what your documents and your queries
>>> look like.  Can you please provide more concrete information?
>>> Sometimes, actual code helps.
>>>
>>> For example, how do actors, movies and genres relate to your documents?
>>>  Do you have some external source(s) of information (i.e. external to
>>> your Lucene index) that relate actors to movies?  And movies to genres?
>>>
>>> If actors, movies and genres are supposed to be a metaphor for what
>>> you're "really" representing, then you'll have to extend your metaphor a
>>> little bit to make sense (for "me" anyway) of what you're trying to
>>> "do".
>>>
>>> Steve
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
> 
> 
> 

-- 
View this message in context: http://www.nabble.com/Speed-of-grouped-queries-tf2910499.html#a8144324
Sent from the Lucene - Java Users mailing list archive at Nabble.com.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message