lucene-solr-user mailing list archives

From Chris Hostetter <hossman_luc...@fucit.org>
Subject Re: SOLR reindexing
Date Sat, 02 Mar 2013 00:22:18 GMT

: For full reindexes (DIH full-import), I use build cores, then swap them with
: the live cores.  I don't do this for performance reasons, I do it because I
: want to continue making incremental updates to the live cores while the
: rebuild is underway.  The rebuild takes four hours.

that's kind of a special case though -- the OP mentioned that the entire
reason he does full rebuilds is because he has no way of incrementally
tracking changes to his source data, so he's clearly not going to be
making incremental updates to a "live" core.

in the simpler case of "rebuild the full index every N hours, never do
incremental updates" a simple master/slave setup is probably the easiest
-- with a single "snappull" command being triggered once you know the full
index is built.
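
A minimal sketch of triggering that pull over HTTP, assuming the slave uses
Solr's Java-based ReplicationHandler mounted at "/replication", where the
nearest equivalent of a "snappull" is the "fetchindex" command. The host
name, port and helper names below are hypothetical.

```python
from urllib.parse import urlencode
import urllib.request


def fetchindex_url(slave_core_url):
    """Build the ReplicationHandler URL that asks a slave to pull the index.

    slave_core_url is a hypothetical base URL such as
    "http://slave:8983/solr" (assumption: handler lives at /replication).
    """
    base = slave_core_url.rstrip("/")
    return base + "/replication?" + urlencode({"command": "fetchindex"})


def trigger_fetchindex(slave_core_url):
    """Send the request; call this only once the master rebuild has committed."""
    with urllib.request.urlopen(fetchindex_url(slave_core_url)) as resp:
        return resp.status
```

With polling disabled on the slave, a single call to trigger_fetchindex()
at the end of the rebuild job is the only time the slave changes.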

If you only have one machine to work with, then another simple approach
is to just use a single solr core, and rebuild on top of your existing data
every N hours.  You can use a "timestamp" field to keep track of when
documents were added and do a deleteByQuery on the timestamp field at the
end of the "rebuild" to remove any old documents (ie: things no longer in
your source data).

as long as you don't commit until the end of your "rebuild" you don't need
to worry about inconsistent data, and you should wind up using fewer
resources than the core swapping approach.
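
The single-core approach above can be sketched roughly as follows. This
assumes a date field named "timestamp" that is set when each document is
(re)indexed, that the rebuild start time is recorded before the first
document is sent, and that the Solr version accepts the mixed "[* TO x}"
range syntax (Solr 4.0 and later, as far as I know). The update URL and
function names are hypothetical.

```python
from datetime import datetime, timezone
from xml.sax.saxutils import escape
import urllib.request


def solr_date(dt):
    """Format a timezone-aware datetime as a Solr UTC date string."""
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def delete_stale_xml(rebuild_start, field="timestamp"):
    """Build a deleteByQuery for documents last indexed before this rebuild."""
    query = "%s:[* TO %s}" % (field, solr_date(rebuild_start))
    return "<delete><query>%s</query></delete>" % escape(query)


def post_update(update_url, xml_body):
    """POST one XML update message to a (hypothetical) /update URL."""
    req = urllib.request.Request(
        update_url,
        data=xml_body.encode("utf-8"),
        headers={"Content-Type": "text/xml; charset=utf-8"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status


def finish_rebuild(update_url, rebuild_start):
    """After all documents are re-added: drop the stale ones, then commit once."""
    post_update(update_url, delete_stale_xml(rebuild_start))
    post_update(update_url, "<commit/>")
```

Because the delete and the only commit are both sent at the very end,
searchers keep seeing the previous index until the rebuild is complete.
This relies on nothing else (such as an autoCommit setting) committing in
the middle of the rebuild.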

-Hoss
