lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Amin Mohammed-Coleman <>
Subject Re: Syncing lucene index with a database
Date Fri, 27 Mar 2009 07:52:43 GMT

I was going to suggest looking at hibernate search. It comes with  
event listeners that modify your indexes when the persistent entity  
changes. It use lucene under the hood so if you need to access lucene  
the you can.

Indexing can be done sync or async and the documentation shows how to  
set jms.

There other benefits of hibernate search which you find on the site  
and documentation.


On 27 Mar 2009, at 00:03, Tim Williams <> wrote:

> On Thu, Mar 26, 2009 at 6:28 PM, Matt Schraeder  
> <> wrote:
>> I'm new to Lucene and just beginning my project of adding it to our  
>> web
>> app.  We are indexing data from a MS SQL 2000 database and building
>> full-text search from it.
>> Everything I have read says that building the index is a resource  
>> heavy
>> operation so we should use it sparingly.  For the most part the  
>> database
>> table we are working from is updated once a day so as soon as the  
>> table
>> itself is updated we can rebuild our Lucene indexes.  However,  
>> there are
>> a few feilds that get updated with a cronjob every 15 minutes.  In  
>> terms
>> of speed and efficiency, what would be a better system for keeping  
>> our
>> data synced between the database and Lucene?
>> Of course one option would be to rebuild the Lucene index each time  
>> the
>> cronjob runs to keep the database and Lucene index synced.  We could
>> either return the entire database table, loop through the rows, get a
>> row's document in lucene remove/readd it, and do that for each row.
>> Alternatively after we update the main table we return just the rows
>> that were changed, loop through those and remove/readd them in  
>> lucene,
>> and do that for just the rows that have changed.
>> Alternatively I have thought of using Lucene purely for search to
>> return just the primary key of items from our database table, then  
>> query
>> the database for those items and get the most up to date data from  
>> the
>> database to actually display our search results.  This would let us  
>> use
>> Lucene's superior searching capabilities and searching speed, but  
>> would
>> still require us to pull the data to be displayed from the database.
>> Another option is that we could do the same, but only return the  
>> fields
>> that could change frequently.  This would use Lucene to store and  
>> index
>> the majority of what is displayed on a search results page, only  
>> using
>> the database to return the 2 or 3 fields that might change in a  
>> search
>> for each row that lucene returns.
>> I'm honestly not sure what the "proper" choice should be, or if it
>> really depends on our own test cases.  Is it perfectly okay to run an
>> index update every 15 minutes? How much difference would it make in
>> terms of search time to search with lucene AND pull from the  
>> database?
>> My main issue with searching with lucene but getting the actual data
>> from the database is that it seems like that would make our current
>> search system that is entirely database driven to run slower.
> Not sure what ORM framework, if any, you might be using, but some
> colleagues have had some success using Hibernate Search[1] for this
> sorta thing.  I've not used it, just a pointer in case you haven't
> come across it...  seems that it would keep you above some low-level
> details if it fits...
> --tim
> [1] -
> ---------------------------------------------------------------------
> To unsubscribe, e-mail:
> For additional commands, e-mail:

To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message