lucene-solr-user mailing list archives

From Walter Underwood <wunderw...@netflix.com>
Subject Re: Best practice advice needed!
Date Thu, 25 Sep 2008 18:51:31 GMT
This will cause the result counts to be wrong and the "deleted" docs
will stay in the search index forever.

Some approaches for incremental update:

* full-sweep garbage collection: fetch every ID in the Solr index and
check whether each one still exists in the source DB, then delete the
ones that don't (see the sketch after this list).

* mark for deletion: change the DB to leave the record but flag it
as deleted in a boolean column, then delete from Solr everything
flagged as deleted in the source DB. The flagged rows can be purged
from the source DB at a later time (also sketched below).

* indexer scratchpad DB: a database used by the indexing code that
records all the IDs currently in the index, usually with a last-modified
time. This is similar to the full sweep, but can be much faster with
a dedicated DB. This can get arbitrarily fancy. Web spiders work like this.
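
A minimal sketch of the full-sweep approach, in Python against Solr's
stock HTTP API. The base URL, the single-core layout, and the "items"
table with an "id" column are assumptions standing in for your own
setup, not anything from this thread:

    import json
    import sqlite3
    import urllib.request

    SOLR = "http://localhost:8983/solr"  # assumed Solr base URL

    def solr_ids(batch=1000):
        # Page through every document ID currently in the Solr index.
        start = 0
        while True:
            url = (f"{SOLR}/select?q=*:*&fl=id&wt=json"
                   f"&start={start}&rows={batch}")
            with urllib.request.urlopen(url) as resp:
                docs = json.load(resp)["response"]["docs"]
            if not docs:
                return
            for doc in docs:
                yield str(doc["id"])
            start += batch

    def delete_from_solr(ids):
        # Post one <delete> update for the orphans, then commit.
        # IDs are assumed XML-safe; escape them if they may contain & or <.
        body = "<delete>" + "".join(f"<id>{i}</id>" for i in ids) + "</delete>"
        for payload in (body, "<commit/>"):
            req = urllib.request.Request(
                f"{SOLR}/update", data=payload.encode("utf-8"),
                headers={"Content-Type": "text/xml"})
            urllib.request.urlopen(req).read()

    db = sqlite3.connect("source.db")  # assumed source database
    live = {str(r[0]) for r in db.execute("SELECT id FROM items")}
    orphans = [i for i in solr_ids() if i not in live]
    if orphans:
        delete_from_solr(orphans)

The mark-for-deletion variant replaces the set difference with a flag
column, so the indexer never has to enumerate the whole index:

    def purge_marked(db):
        # Delete everything flagged in the source DB from Solr, then
        # purge the flagged rows ("at a later time" works just as well).
        marked = [str(r[0]) for r in
                  db.execute("SELECT id FROM items WHERE deleted = 1")]
        if marked:
            delete_from_solr(marked)  # helper from the sketch above
            db.execute("DELETE FROM items WHERE deleted = 1")
            db.commit()

The scratchpad-DB approach is the same loop with the SELECT pointed at
a dedicated table of indexed IDs and last-modified times.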

wunder

On 9/25/08 10:08 AM, "Fuad Efendi" <fuad@efendi.ca> wrote:

> I am guessing your enterprise system deletes/updates tables in an
> RDBMS, and your SOLR indexes that data. In addition, you have a
> front-end interacting with SOLR and with the RDBMS. At the front-end
> level, when a search sent to SOLR returns primary keys for data, you
> can check your database using the primary keys returned by SOLR
> before committing output to end users.
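
A minimal sketch of the front-end check Fuad describes, under the same
assumed "items" table keyed by "id". Note Walter's caveat at the top of
the reply: this hides stale hits from users, but result counts stay
wrong and the deleted docs remain in the index.

    def verified_hits(db, solr_docs):
        # Keep only Solr hits whose primary keys still exist in the RDBMS.
        # For large pages, batch the IN list (SQLite caps bound variables).
        if not solr_docs:
            return []
        keys = [doc["id"] for doc in solr_docs]
        placeholders = ",".join("?" * len(keys))
        rows = db.execute(
            f"SELECT id FROM items WHERE id IN ({placeholders})", keys)
        live = {str(r[0]) for r in rows}
        return [doc for doc in solr_docs if doc["id"] in live]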

