lucene-java-user mailing list archives

From Jake Mannix <jake.man...@gmail.com>
Subject Re: Caching analyzed query
Date Thu, 03 Dec 2009 07:19:28 GMT
What kind of queries are these? That is, how much work goes into step 4? Is
this a fairly standard combination of Boolean/Phrase/other stock Lucene
queries built up by tokenizing the text?

If so, it's going to be nowhere near the bottleneck in your runtime (we're
often talking way less than a millisecond per query), so you can leave
optimizing it until last. Caching the key/value store lookup (especially if
it's remote!) I can see, but producing the Lucene query is only slow if you're
doing some *really* crazy stuff. That does happen sometimes, to be sure, but
usually the crazier your query gets, the slower the search itself gets
(step 5), so even in that case the bottleneck is not in creating the Query
object. Query creation happens once per query. Step 5 has lots of steps that
happen once per *document* matching the query (the inner loop, versus the work
done before the loop starts).
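Following the suggestion to cache the key/value lookup rather than the Query, here is a minimal per-JVM sketch. It assumes a hypothetical `loader` function standing in for the remote key/value store fetch (steps 2 and 3); the class name `LookupCache` is illustrative, not part of Lucene. It is a bounded LRU built on `java.util.LinkedHashMap` in access order, and the caller still analyzes the cached text and builds a fresh Query on every request (step 4).

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

/** Bounded LRU cache for the key/value lookup only (steps 2 and 3).
 *  The Query is still built from the returned text on every request. */
class LookupCache<K, V> {
    private final Map<K, V> map;
    // Hypothetical stand-in for the (possibly remote) key/value store fetch.
    private final Function<K, V> loader;

    LookupCache(final int maxEntries, Function<K, V> loader) {
        this.loader = loader;
        // accessOrder = true: iteration order is least-recently-accessed first,
        // so the "eldest" entry is the least recently used one.
        this.map = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                // Evict the LRU entry once the cache grows past its bound.
                return size() > maxEntries;
            }
        };
    }

    /** Returns the cached value, calling the loader on a miss.
     *  Synchronized because get() on an access-ordered map reorders entries.
     *  A null result from the loader is not cached and will be re-fetched. */
    synchronized V get(K key) {
        V value = map.get(key);
        if (value == null) {
            value = loader.apply(key);
            if (value != null) {
                map.put(key, value);
            }
        }
        return value;
    }

    synchronized int size() {
        return map.size();
    }
}
```

In use this would be something like `String text = cache.get(inputA);` followed by the normal analysis and query construction, so only the store round-trip is saved, which is the part most likely to be slow.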

  -jake

On Wed, Dec 2, 2009 at 5:43 PM, Erdinc Yilmazel <erdinc@yilmazel.com> wrote:

> Hi,
>
> In my application, certain kinds of queries for certain kinds of inputs will
> be repeated against the Lucene index. The application flow is something like
> this:
>
>   1. Get input A
>   2. Look up key A in a key/value store
>   3. Load a text from the key/value store to be used as a query
>   4. Analyze the text and build a Query object
>   5. Perform a search
>
> What I want to do is implement a cache for steps 2, 3 and 4. I don't
> want to analyze the query text again and again. Think of this as a
> distributed application, running on several servers. What is the best way
> to cache the analyzed version of the input text? I can make a cache per JVM
> by holding a previously created Query object for a specific input, but in a
> distributed environment, if I store the serialized form of the Query object,
> the overhead of deserializing it may kill all the benefits of caching here...
>
> Thanks,
> Erdinc
>
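The per-JVM option described in the message, holding a previously created Query object for each input, could look roughly like the minimal sketch below. `Q` stands in for a query type such as Lucene's `Query`, and `pipeline` is a hypothetical function wrapping steps 2 through 4; neither name comes from Lucene. `ConcurrentHashMap.computeIfAbsent` applies the function at most once per key, so each server builds the query for a given input once and nothing is serialized between servers.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;

/** Per-JVM memoization of steps 2-4: input -> built query object.
 *  Q is a placeholder for a query type; pipeline is hypothetical. */
class QueryMemo<Q> {
    private final ConcurrentMap<String, Q> cache = new ConcurrentHashMap<>();
    // Hypothetical: looks up the text, analyzes it, and builds the query.
    private final Function<String, Q> pipeline;

    QueryMemo(Function<String, Q> pipeline) {
        this.pipeline = pipeline;
    }

    /** Runs the pipeline at most once per input on this JVM. Later calls
     *  return the same shared instance, so callers must not modify it. */
    Q get(String input) {
        return cache.computeIfAbsent(input, pipeline);
    }
}
```

The map is unbounded and grows with the number of distinct inputs, so this fits a limited set of frequently repeated inputs; for a large input space a bounded cache like an LRU would be the safer choice.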
