lucene-solr-user mailing list archives

From "Davis, Daniel (NIH/NLM) [C]" <daniel.da...@nih.gov>
Subject RE: Only indexing changed documents
Date Fri, 07 Aug 2015 15:30:58 GMT
Thanks - the key is that the signature field will not be id, and overwriteDupes will be false:

      <bool name="overwriteDupes">false</bool>
      <str name="signatureField">sig</str>
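
For context, a minimal sketch of how those two settings might sit inside a complete chain in solrconfig.xml. The chain name "dedupe" and the field list "title,body" are assumptions for illustration, not taken from this thread, and the "sig" field would also need to be declared in the schema. As far as I can tell, a chain like this only computes and stores the signature; dropping an unchanged document would still need a comparison against the stored value.

```xml
<!-- Sketch only: chain name and field list are hypothetical. -->
<updateRequestProcessorChain name="dedupe">
  <processor class="solr.processor.SignatureUpdateProcessorFactory">
    <bool name="enabled">true</bool>
    <!-- Store the signature in its own field, not in the unique key (id). -->
    <str name="signatureField">sig</str>
    <!-- Do not delete other documents that share the same signature. -->
    <bool name="overwriteDupes">false</bool>
    <!-- Hypothetical fields to hash; replace with the real content fields. -->
    <str name="fields">title,body</str>
    <str name="signatureClass">solr.processor.Lookup3Signature</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>
```

The chain would then be selected on the update handler, for example through the update.chain request parameter.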

-----Original Message-----
From: Upayavira [mailto:uv@odoko.co.uk] 
Sent: Friday, August 07, 2015 11:22 AM
To: solr-user@lucene.apache.org
Subject: Re: Only indexing changed documents

Use the DedupUpdateProcessor, which can compute a signature based upon the specified fields.

Upayavira

On Fri, Aug 7, 2015, at 03:56 PM, Davis, Daniel (NIH/NLM) [C] wrote:
> I have an application that knows enough to tell me that a document has
> been updated, but not which document has been updated. There aren't
> that many documents in this core/collection - just a couple of thousand. So
> far I've just been pumping them all to the update handler every week,
> but the business folk really want the database and the index to be
> synchronized when the back-end staff make an update. As is typical in
> indexing, updates are more frequent than searches (or at least are
> expected to be once things pick up - we may even reach a whopping 10k
> documents at some point :))
> 
> Each document has an id I wish to use as the unique ID, but I also want
> to compute a signature.   Is there some way I can use an
> updateRequestProcessorChain to throw away a document if its signature 
> and document id match based on real-time get?
> 
> My apologies if this is a duplicate of a prior question - solr-user is
> fairly high traffic.
> 
> Dan Davis, Systems/Applications Architect (Contractor), Office of 
> Computer and Communications Systems, National Library of Medicine, NIH
> 
