lucene-solr-user mailing list archives

From Arcadius Ahouansou <arcad...@menelic.com>
Subject Re: Queries on De-Duplication
Date Fri, 04 Sep 2015 09:37:43 GMT
You could try using a hash of the content?
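
A minimal sketch of that idea in Python, assuming the hash is computed on the client before the document is sent to Solr. The function name content_signature and the sample document are hypothetical, and SHA-256 is only one possible choice of hash. The digest is always 64 hex characters, so it stays far below the 32766 limit mentioned in the quoted message, however long the content is.

```python
import hashlib


def content_signature(content: str) -> str:
    """Return a fixed-length hex digest of the content (hypothetical helper)."""
    # Encode to bytes first; SHA-256 always yields 32 bytes = 64 hex characters.
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


# Hypothetical document whose content exceeds 32766 characters.
doc = {"id": "doc-1", "content": "x" * 40000}

# Store the short digest in the signature field instead of the raw content.
doc["signature"] = content_signature(doc["content"])
```

Documents with identical content get identical signature values, so the field could then be used for grouping or duplicate detection.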
On Sep 4, 2015 9:00 AM, "Zheng Lin Edwin Yeo" <edwinyeozl@gmail.com> wrote:

> Hi,
>
> I'm trying out de-duplication. I've created a new signature field in
> schema.xml:
> <field name="signature" type="string" stored="true" indexed="true"
> multiValued="false" />
>
> I've also added the following in solrconfig.xml.
>
> <updateRequestProcessorChain name="dedupe">
>   <processor class="solr.processor.SignatureUpdateProcessorFactory">
>     <bool name="enabled">true</bool>
>     <str name="signatureField">signature</str>
>     <bool name="overwriteDupes">false</bool>
>     <str name="fields">content</str>
>     <str name="signatureClass">solr.processor.Lookup3Signature</str>
>   </processor>
>   <processor class="solr.DistributedUpdateProcessorFactory" />
>   <processor class="solr.LogUpdateProcessorFactory" />
>   <processor class="solr.RunUpdateProcessorFactory" />
> </updateRequestProcessorChain>
>
>
> However, I can't copyField content into this signature field, as some of
> my content is more than 32766 characters long. Previously, I tried
> pointing signatureField directly at content, but that did not work
> either.
>
> Is there anything else I can do to group on a new signatureField?
>
>
> Regards,
> Edwin
>
