lucene-solr-user mailing list archives

From Chris Hostetter <hossman_luc...@fucit.org>
Subject Re: mark solr documents as duplicates on hashing the combination of some fields
Date Wed, 22 Oct 2014 18:17:37 GMT

: I meant signature will be broken. For example suppose the destination of
: hash function for signature fields are "sig". After each partial update it
: becomes: "0000000000"!

details please.

how are you configuring your update processor chain? what does your schema 
look like? what types of atomic updates are you using?

in general atomic updates require that all source fields be stored - so 
you might be having problems if the fields you are trying to hash aren't 
stored.
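as a rough illustration of the "stored" requirement, a schema.xml fragment might look like the sketch below. the field names "name", "features" and "cat" are hypothetical stand-ins for whatever fields you hash, and "sig" is taken from your description of the signature destination field.

```xml
<!-- sketch only: "name", "features" and "cat" are hypothetical source fields -->
<!-- every field that an atomic update has to rebuild should be stored="true" -->
<field name="name"     type="string" indexed="true" stored="true"/>
<field name="features" type="string" indexed="true" stored="true" multiValued="true"/>
<field name="cat"      type="string" indexed="true" stored="true" multiValued="true"/>

<!-- destination of the computed signature ("sig" as in the quoted message) -->
<field name="sig"      type="string" indexed="true" stored="true" multiValued="false"/>
```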

likewise, atomic updates are processed as part of the 
DistributedUpdateProcessor (so they execute on the leader and work with 
optimistic concurrency), but that means if you have the 
SignatureUpdateProcessorFactory configured before the 
DistributedUpdateProcessorFactory, it could compute a signature based on 
the raw doc you send (with the update commands) instead of the "real" doc 
with the updates applied.
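to make "raw doc" concrete, here is a sketch of an atomic update in solr's XML update format. the id value and the "price" field are made up for illustration. note that the message carries only the uniqueKey and the modified field, so a processor that runs before the updates are applied would not see the stored values of the other fields it is supposed to hash.

```xml
<!-- sketch only: id value and "price" field are hypothetical -->
<add>
  <doc>
    <field name="id">doc-1</field>
    <!-- update="set" marks this as an atomic update command, not a plain value -->
    <field name="price" update="set">100</field>
  </doc>
</add>
```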

for a situation where you want the signatureField to *be* the uniqueKey, 
then you kind of have to put SignatureUpdateProcessorFactory before 
DistributedUpdateProcessorFactory -- but for a situation like yours, you 
need to ensure that SignatureUpdateProcessorFactory comes *after* 
DistributedUpdateProcessorFactory and before the 
RunUpdateProcessorFactory.
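a minimal sketch of that ordering in solrconfig.xml, assuming a chain named "dedupe", a signature field named "sig", and hypothetical source fields "name,features,cat" (adjust all of these to your own setup):

```xml
<!-- sketch only: chain name, field names and signatureClass are assumptions -->
<updateRequestProcessorChain name="dedupe">
  <!-- atomic updates are applied here, on the leader -->
  <processor class="solr.DistributedUpdateProcessorFactory"/>

  <!-- runs after the updates are applied, so it hashes the merged doc -->
  <processor class="solr.processor.SignatureUpdateProcessorFactory">
    <bool name="enabled">true</bool>
    <str name="signatureField">sig</str>
    <!-- false: keep duplicates and just record the signature on each doc -->
    <bool name="overwriteDupes">false</bool>
    <str name="fields">name,features,cat</str>
    <str name="signatureClass">solr.processor.Lookup3Signature</str>
  </processor>

  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
```

the chain still has to be selected for your updates, for example via the update.chain request parameter or as a default on the update handler.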


-Hoss
http://www.lucidworks.com/
