lucene-solr-user mailing list archives

From "Tim Gilbert" <TIM.GILB...@morningstar.com>
Subject RE: keeping data consistent between Database and Solr
Date Tue, 15 Mar 2011 13:25:30 GMT
I use Solr + MySQL with data coming from several DIH-style "loaders" that
I have written to move data from many different databases into my "BI"
solution.  I don't use DIH itself because I am not simply replicating the
data; I am moving/merging/processing the incoming data during the load.

For me, I have an Aspect (AspectJ) which wraps my Data Access Object;
every time "persist" is called (I am using Hibernate), I update Solr
with the same data an instant later using @Around advice.  This handles
nearly every event during the day.  I have a simple "retry" procedure
around my SolrJ add/commit on network errors, in the hope that the
update will eventually succeed.
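A minimal sketch of that retry idea in plain Java (the generic callable
stands in for the actual SolrJ add/commit, which isn't shown here; the
names and backoff numbers are my own, not from my real code):

```java
import java.util.concurrent.Callable;

// Hypothetical retry helper: re-attempt an operation (e.g. a SolrJ
// add/commit) a few times, pausing a little longer after each failure.
public class SolrRetry {
    public static <T> T withRetry(Callable<T> op, int maxAttempts, long backoffMs)
            throws Exception {
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return op.call();
            } catch (Exception e) {
                last = e;                          // remember the failure
                Thread.sleep(backoffMs * attempt); // simple linear backoff
            }
        }
        throw last; // all attempts failed; the nightly rebuild catches up
    }
}
```

If the retries are exhausted, the error is left for the nightly rebuild
to repair rather than blocking the day's traffic.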

To recover from any errors, I rebuild the Solr index from scratch each
night by recreating it from the data in MySQL.  That takes about 10
minutes.  This gives me "eventual consistency" for any issues that
cropped up during the day.
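As a rough sketch of such a rebuild, in plain Java (the row source and
the add/commit callbacks are stand-ins I invented for the real JDBC and
SolrJ calls):

```java
import java.util.List;
import java.util.function.Consumer;

// Hypothetical full-rebuild sketch: push every database row to the
// indexer in fixed-size batches, then commit once at the end.
public class NightlyRebuild {
    public static int rebuild(List<String> allRows, int batchSize,
                              Consumer<List<String>> addBatch, Runnable commit) {
        int batches = 0;
        for (int i = 0; i < allRows.size(); i += batchSize) {
            // subList is a view; the indexer should copy it if it keeps it
            addBatch.accept(allRows.subList(i,
                    Math.min(i + batchSize, allRows.size())));
            batches++;
        }
        commit.run(); // a single commit keeps the rebuild cheap
        return batches;
    }
}
```

Batching the adds and committing once at the end is what keeps a
few-million-row rebuild down in the minutes range.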

Obviously the size of my database (< 2 million records) makes this
approach manageable.  YMMV.

Tim

-----Original Message-----
From: Shawn Heisey [mailto:solr@elyograg.org] 
Sent: Tuesday, March 15, 2011 9:13 AM
To: solr-user@lucene.apache.org
Subject: Re: keeping data consistent between Database and Solr

On 3/14/2011 9:38 PM, onlinespending@gmail.com wrote:
> But my main question is, how do I guarantee that data between my
> Cassandra database and Solr index are consistent and up-to-date?

Our MySQL database has two unique indexes.  One is a document ID, 
implemented in MySQL as an autoincrement integer and in Solr as a long.

The other is what we call a tag id, implemented in MySQL as a varchar 
and in Solr as a single lowercased token; it serves as Solr's uniqueKey.
We have an update trigger on the database that updates the document ID 
whenever the database document is updated.

We have a homegrown build system for Solr.  In a nutshell, it keeps 
track of the newest document ID in the Solr index.  If the DIH 
delta-import fails, it doesn't update the stored ID, which means that on
the next run, it will try to index those documents again.  Changes to 
existing entries in the database are automatically picked up because the
document ID is newer, but the tag id doesn't change, so the document in 
Solr is overwritten.
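A rough sketch of that bookkeeping, assuming a success/failure callback
in place of the real DIH delta-import (plain Java; the names are
invented for illustration, not taken from the actual build system):

```java
import java.util.function.LongPredicate;

// Hypothetical high-water-mark tracker: the stored ID only advances
// when indexing the new range succeeds, so a failed run is retried.
public class DeltaTracker {
    private long lastIndexedId;

    public DeltaTracker(long startId) { this.lastIndexedId = startId; }

    // indexRange should index all docs with lastIndexedId < id <= newestIdInDb
    // and return true on success (a stand-in for the DIH delta-import).
    public long runDelta(long newestIdInDb, LongPredicate indexRange) {
        if (newestIdInDb > lastIndexedId && indexRange.test(newestIdInDb)) {
            lastIndexedId = newestIdInDb; // advance only on success
        }
        return lastIndexedId;
    }
}
```

Because the uniqueKey (the tag id) is stable, re-indexing the same range
after a failure just overwrites the same documents, so retries are safe.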

Things are actually more complex than I've written, because our index is
distributed.  Hopefully it can give you some ideas for yours.

Shawn

