lucene-java-user mailing list archives

From "Uwe Schindler" <>
Subject RE: Reusing Query instances
Date Sat, 30 Apr 2011 07:45:24 GMT
Hi Otis,

> Is there any reason why one would *not* want to reuse Query instances?

Definitely not!
> I'm using MemoryIndex with a fixed set of queries and I'm executing them
> on each new document that comes in.  Because each document needs to
> have many tens of thousands of queries executed against it, I thought I'd
> run all queries through QueryParser once at the beginning, and then just
> reuse Query instances on each incoming document.  What I've noticed is
> my fixed set of queries takes longer and longer to execute as time passes
> (more and more time is spent inside somewhere).
> The problem is not heap/memory - there is no crazy GCing and the heap is
> not full, but the CPU is 100% busy.

You should still take some thread dumps when it gets slow.
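If you want to capture those dumps from inside the application rather than with an external tool such as jstack, one option is a small watchdog that grabs the worker thread's stack once a batch of queries has run longer than a threshold. This is a minimal sketch in plain JDK code. SlowBatchWatchdog and the threshold are hypothetical names for illustration and are not part of Lucene:

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical helper (not Lucene): samples the worker's stack if a batch is slow. */
final class SlowBatchWatchdog implements AutoCloseable {
    private final ScheduledExecutorService timer =
        Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "slow-batch-watchdog");
            t.setDaemon(true); // never keep the JVM alive just for the watchdog
            return t;
        });
    private final long thresholdMillis;

    SlowBatchWatchdog(long thresholdMillis) {
        this.thresholdMillis = thresholdMillis;
    }

    /**
     * Runs the batch on the calling thread. If it is still running after
     * thresholdMillis, the timer thread captures the caller's stack at that
     * moment. Returns the captured stack, or null if the batch was fast.
     */
    StackTraceElement[] run(Runnable batch) {
        Thread worker = Thread.currentThread();
        AtomicReference<StackTraceElement[]> captured = new AtomicReference<>();
        ScheduledFuture<?> probe = timer.schedule(
            () -> captured.set(worker.getStackTrace()),
            thresholdMillis, TimeUnit.MILLISECONDS);
        try {
            batch.run();
        } finally {
            // If the probe fires right as the batch ends, the sample may show
            // this finally block instead of the slow code; that edge case is harmless.
            probe.cancel(false);
        }
        return captured.get();
    }

    @Override
    public void close() {
        timer.shutdownNow();
    }
}
```

Wrapping each incoming document's query batch in run(...) and logging any non-null result should give a few samples of where the CPU time goes while the slowdown is actually happening.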

In general, reusing queries is perfectly fine, as the queries themselves are
only a shell for the query parameters, plus factories for new rewritten
queries (if needed) and for Weights/Scorers. Of course, you should not
reuse rewritten queries, as they largely depend on the underlying index
(which changes on each request).
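To illustrate the split between a reusable query and its index-specific rewritten form, here is a toy model in plain Java. PrefixTemplate and ReuseDemo are hypothetical classes that only loosely mirror Lucene's Query.rewrite(IndexReader); they are not Lucene's API. The parsed template is built once and shared, while the expanded term set is recomputed for every document and never cached:

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/**
 * Hypothetical stand-in for a parsed query (NOT Lucene's API).
 * It only holds parameters, so one instance can be shared across documents.
 */
final class PrefixTemplate {
    private final String prefix;

    PrefixTemplate(String prefix) {
        this.prefix = prefix;
    }

    /**
     * Index-specific step, loosely modelled on Query.rewrite(IndexReader):
     * expands the prefix against ONE document's terms.
     */
    Set<String> rewrite(Set<String> docTerms) {
        Set<String> expanded = new TreeSet<>();
        for (String term : docTerms) {
            if (term.startsWith(prefix)) {
                expanded.add(term);
            }
        }
        return expanded;
    }
}

final class ReuseDemo {
    /** Runs the shared templates against one incoming document. */
    static int countMatches(List<PrefixTemplate> templates, Set<String> docTerms) {
        int hits = 0;
        for (PrefixTemplate template : templates) {
            // Rewrite fresh for every document; the expansion is only
            // valid for this document's terms, so it is not cached.
            if (!template.rewrite(docTerms).isEmpty()) {
                hits++;
            }
        }
        return hits;
    }
}
```

Caching the result of rewrite() across documents would be exactly the mistake warned about above: the expansion computed for one document's terms says nothing about the next one.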

> I should note that queries I'm dealing with are ugly and big, using lots
> of wildcards, but trailing and prefix ones (and this is Lucene 3.1, so no
> Wildcard impl).
> I should also emphasize that at this point I only *suspect* that maaaybe
> the gradual slowdown I'm seeing has something to do with the fact that I'm
> reusing Query instances.

Did this somehow change with 3.1, or was it the same in 3.0? In fact, for
each query execution a BitSet is allocated per segment, but as you use
MemoryIndex, the BitSet has only one slot *g* (so it's not an issue). For
MemoryIndex, it's more important that the term dictionary / positions are
optimized so PhraseQueries and wildcard queries can execute quickly on the
term index. As said before, the queries from QueryParser are only used as
templates to rewrite against the index, producing index-specific queries. The reuse pattern is ok and
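The point about the term dictionary can be sketched with a sorted set of terms. This is a hypothetical model (WildcardExpansion is an invented name), not MemoryIndex's actual implementation: a trailing wildcard such as ban* can seek to the first candidate and stop at the first non-match, while a leading wildcard such as *ana cannot use the sort order and has to inspect every term:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

/** Hypothetical model of wildcard expansion over a sorted term dictionary. */
final class WildcardExpansion {
    final List<String> matches = new ArrayList<>();
    int termsVisited = 0;

    /** "prefix*": seek to the first term >= prefix, stop at the first non-match. */
    static WildcardExpansion trailing(NavigableSet<String> dict, String prefix) {
        WildcardExpansion e = new WildcardExpansion();
        for (String term : dict.tailSet(prefix, true)) {
            e.termsVisited++;
            // Terms sharing a prefix are contiguous in sorted order,
            // so the first term without it ends the matching range.
            if (!term.startsWith(prefix)) {
                break;
            }
            e.matches.add(term);
        }
        return e;
    }

    /** "*suffix": the sort order does not help, so every term is inspected. */
    static WildcardExpansion leading(NavigableSet<String> dict, String suffix) {
        WildcardExpansion e = new WildcardExpansion();
        for (String term : dict) {
            e.termsVisited++;
            if (term.endsWith(suffix)) {
                e.matches.add(term);
            }
        }
        return e;
    }
}
```

With five terms the difference is a single visit, but on a large dictionary the leading-wildcard case grows with the dictionary size, while the trailing case grows roughly with the number of matches plus the cost of the seek.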

One other question: can you temporarily replace MemoryIndex with another simple
one-doc implementation (RAMDirectory), just to test whether it also slows down then? I
don't like MemoryIndex at all (I know, it was not the bad guy for your stack

