lucene-solr-user mailing list archives

From Lebin Sebastian <le...@codetheory.io>
Subject SOLR | De-Duplication | Remove duplicate records based on their status
Date Wed, 31 May 2017 16:06:32 GMT
Hello,

I am indexing two different models with the same data but different statuses.

For example:
*Scenario 1*
{Model: "AAAA", name: "abc", status: "T"}
{Model: "BBBB", name: "abc", status: "A"}

Expected Output
{Model: "BBBB", name: "abc", status: "A"}

*Scenario 2*
{Model: "AAAA", name: "abc", status: "A"}
{Model: "BBBB", name: "abc", status: "T"}

Expected Output
{Model: "AAAA", name: "abc", status: "A"}

*Scenario 3*
{Model: "AAAA", name: "abc", status: "A"}
{Model: "BBBB", name: "abc", status: "A"}

Expected Output
{Model: "AAAA", name: "abc", status: "A"} (either one is fine)


*Scenario 4*
{Model: "AAAA", name: "abc", status: "T"}
{Model: "BBBB", name: "abc", status: "T"}

Expected Output
{Model: "AAAA", name: "abc", status: "T"} (either one is fine)


Scenarios 3 & 4 work as expected with my current configuration, which is
given below.

For scenarios 1 & 2, the output should depend on the status of the records:
when the statuses differ, the record with status "A" should be kept.

Please help me fix scenarios 1 & 2.


*Solr version: 5.3*

*solrconfig.xml*

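<!-- Route every /update request through the "dedupe" chain by default -->
<requestHandler name="/update" class="solr.UpdateRequestHandler" >
  <lst name="defaults">
    <str name="update.chain">dedupe</str>
  </lst>
</requestHandler>

<updateRequestProcessorChain name="dedupe">
  <processor class="solr.processor.SignatureUpdateProcessorFactory">
    <bool name="enabled">true</bool>
    <!-- field in which the computed signature (hash) is stored -->
    <str name="signatureField">signature</str>
    <!-- replace any existing document with the same signature, so the
         most recently indexed duplicate wins regardless of its status -->
    <bool name="overwriteDupes">true</bool>
    <!-- fields the signature is computed from (only "id" here) -->
    <str name="fields">id</str>
    <!-- 64-bit hash used for exact duplicate detection -->
    <str name="signatureClass">solr.processor.Lookup3Signature</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory" />
  <!-- performs the actual indexing; normally the last processor in a chain -->
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>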
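One possible query-time workaround, offered as an untested sketch rather than a known fix: with `overwriteDupes` set to true, whichever duplicate is indexed last replaces the other, so the status never influences the outcome. Result grouping can instead return one document per `name` and sort each group so that status "A" comes before "T". This assumes `name` and `status` are single-valued, indexed string fields, and that these parameters are added to the defaults of the existing `/select` handler; the values shown are illustrative.

```xml
<!-- Hypothetical query-time alternative (untested sketch): return one
     document per "name", preferring status "A" over "T".
     Assumes "name" and "status" are single-valued, indexed string fields. -->
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <!-- group results by the field that identifies duplicates -->
    <bool name="group">true</bool>
    <str name="group.field">name</str>
    <!-- within each group, sort by status; "A" sorts before "T" -->
    <str name="group.sort">status asc</str>
    <!-- keep only the top document of each group -->
    <int name="group.limit">1</int>
    <!-- flatten the grouped response into a plain result list -->
    <bool name="group.main">true</bool>
  </lst>
</requestHandler>
```

Note that this only changes what searches return: both records stay in the index, and the lexical sort on `status` works only because "A" happens to sort before "T".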


Thanks,

Lebin F
