lucene-java-user mailing list archives

From "Michael Stoppelman" <>
Subject Re: hybrid query (lucene + db)
Date Thu, 01 May 2008 18:50:54 GMT

Could you describe how you set up the spatial area? Having a BooleanQuery
with 200 terms in it definitely slows things down (I'm not sure exactly why
yet -- it seems like it shouldn't be "that" slow). If you can describe your
spatial area in fewer terms, you can get much better performance. It just
depends on how you're describing your spatial areas and the number of
results in each zipcode. If you had a field like "city,state" in your index,
you would have far fewer terms in your query than if the query listed every
zipcode in that "city,state" combo, which would make the query much faster.
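
As a rough illustration of the point above, here is a minimal sketch of collapsing zipcode terms into "city,state" terms whenever the selected area covers every zipcode of a city. It is plain Java with no Lucene dependency; the class name, the field names "zip" and "cityState", and the sample data are all hypothetical, and it assumes the index really does carry such a coarser field. Each string it returns would become one clause of the BooleanQuery.

```java
import java.util.*;

class TermCollapser {
    /**
     * Returns query terms as "field:value" strings. A city whose zipcodes are
     * all inside the selected area contributes a single "cityState" term;
     * zipcodes of partially covered cities stay as individual "zip" terms.
     * Field names are hypothetical.
     */
    static List<String> collapse(Set<String> selectedZips,
                                 Map<String, Set<String>> zipsByCity) {
        List<String> terms = new ArrayList<>();
        Set<String> remaining = new TreeSet<>(selectedZips);
        for (Map.Entry<String, Set<String>> city : zipsByCity.entrySet()) {
            Set<String> cityZips = city.getValue();
            // Only collapse when the whole city lies inside the area.
            if (!cityZips.isEmpty() && selectedZips.containsAll(cityZips)) {
                terms.add("cityState:" + city.getKey());
                remaining.removeAll(cityZips);
            }
        }
        for (String zip : remaining) {
            terms.add("zip:" + zip);
        }
        return terms;
    }

    public static void main(String[] args) {
        // Made-up sample data: one fully covered city, one partially covered.
        Map<String, Set<String>> zipsByCity = new TreeMap<>();
        zipsByCity.put("Springfield,IL",
                new TreeSet<>(Arrays.asList("62701", "62702", "62703")));
        zipsByCity.put("Decatur,IL",
                new TreeSet<>(Arrays.asList("62521", "62522")));

        Set<String> selected =
                new TreeSet<>(Arrays.asList("62701", "62702", "62703", "62521"));

        // Four zipcode terms shrink to two terms.
        System.out.println(collapse(selected, zipsByCity));
    }
}
```

The saving grows with the number of zipcodes per city, so whether this helps depends on how the spatial areas line up with city boundaries.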


On Thu, May 1, 2008 at 2:15 AM, mark harwood <> wrote:

> The issue here is a general one of trying to perform an efficient join
> between an external resource (rdbms) and Lucene.
> This experiment may be of interest:
> embodies the core service which translates from lucene doc ids
> to DB primary keys or vice versa.
> There are a couple of implementations of KeyMap that are not optimal (they
> pre-date Lucene's FieldCache) but it may give you food for thought.
> Cheers
> Mark
> ----- Original Message ----
> From: Stephane Nicoll <>
> To:
> Sent: Thursday, 1 May, 2008 9:00:33 AM
> Subject: hybrid query (lucene + db)
> Hi there,
>
> We're using Lucene with Hibernate Search and we're very happy so far
> with the performance and the usability of Lucene. We have, however, a
> specific use case that prevents us from using only Lucene: spatial
> queries. I already sent a mail to this list a while back about the
> problem and we started investigating multiple solutions.
>
> When the user selects a geographic area and some keywords we do the
> following:
> * Perform a search on the Lucene index for the keywords with a
> projection that returns only the primary key of the elements, sorted by
> primary key
> * Perform a search on the database with the other criteria and a
> projection that returns only the primary key of the elements
> * Iterate over both lists to find N matching IDs, optionally with paging
> (from X to X + N, where X is the first result of the page)
> * Run a query on the database to return the actual objects (select a
> from MyClass a where IN (the list of matching IDs) ). We limit the
> page to 1000 results.
>
> We have looked for a way to optimize the queries and to avoid consuming
> too much memory, knowing that we must support paging.
>
> With a single user, a search by keywords takes 30msec to complete and a
> search by box takes 45msec. With both (keywords + spatial area) it
> takes 300msec.
> With 10 concurrent users, a search by keywords takes 150msec/user, but
> for both it takes 3 sec/user !!!
>
> I had the profiler running on this scenario and I found that *all*
> threads are waiting on org.apache.lucene.index.SegmentReader. I then
> configured Hibernate Search to use a separate index reader per thread.
> The deadlocks disappeared but it's still very slow (2.8sec).
>
> Some questions:
> * Does anyone know where the deadlocks on SegmentReader are coming from?
> * Is sorting on the primary keys a bad idea regarding performance
> and memory usage?
> * Does anyone have an idea for performing this kind of hybrid query in an
> efficient way?
>
> I am using Lucene 2.3.1 and Hibernate Search 3.0.1. I already asked for
> support on the Hibernate Search forum but have not received any answer so
> far.
> Thanks,
> Stéphane
> --
> "Large Systems Suck: This rule is 100% transitive. If you build one,
> you suck" -- S.Yegge
> ---------------------------------------------------------------------
> To unsubscribe, e-mail:
> For additional commands, e-mail:
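
The "iterate over both lists" step in the quoted message can be sketched as a merge-style intersection of two ascending ID lists, which needs a single pass and no extra lookup structure. This is only an illustration under stated assumptions: the class and method names are hypothetical, the IDs are taken to be unique long primary keys, and the database result is assumed to be sorted by primary key as well (for example via ORDER BY), which the original message states only for the Lucene side.

```java
import java.util.ArrayList;
import java.util.List;

class SortedIdIntersection {
    /**
     * Walks two ascending arrays of unique IDs in lockstep and returns up to
     * pageSize IDs that appear in both, skipping the first `offset` matches.
     * Stops as soon as the page is full, so later IDs are never examined.
     */
    static List<Long> page(long[] luceneIds, long[] dbIds, int offset, int pageSize) {
        List<Long> result = new ArrayList<>();
        int i = 0, j = 0, matched = 0;
        while (i < luceneIds.length && j < dbIds.length && result.size() < pageSize) {
            if (luceneIds[i] < dbIds[j]) {
                i++;                      // ID only in the Lucene result
            } else if (luceneIds[i] > dbIds[j]) {
                j++;                      // ID only in the database result
            } else {
                if (matched >= offset) {  // match belongs to the requested page
                    result.add(luceneIds[i]);
                }
                matched++;
                i++;
                j++;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up sample IDs; common IDs are 3, 5, 9 and 11.
        long[] fromLucene = {1, 3, 5, 7, 9, 11};
        long[] fromDb = {3, 4, 5, 9, 10, 11};
        System.out.println(page(fromLucene, fromDb, 0, 2)); // first page
        System.out.println(page(fromLucene, fromDb, 2, 2)); // second page
    }
}
```

Each page request still rescans from the start of both lists, so deep pages cost more than early ones; caching the last scan position per search would be one way around that, at the price of keeping state between requests.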
