lucene-solr-user mailing list archives

From Zheng Lin Edwin Yeo <edwinye...@gmail.com>
Subject Re: Queries on De-Duplication
Date Fri, 04 Sep 2015 15:34:25 GMT
How do we hash the content?

Regards,
Edwin

On 4 September 2015 at 17:37, Arcadius Ahouansou <arcadius@menelic.com>
wrote:

> You could try using a hash of the content?
> On Sep 4, 2015 9:00 AM, "Zheng Lin Edwin Yeo" <edwinyeozl@gmail.com>
> wrote:
>
> > Hi,
> >
> > I'm trying out de-duplication. I've created a new signature field in
> > schema.xml:
> > <field name="signature" type="string" stored="true" indexed="true"
> > multiValued="false" />
> >
> > I've also added the following in solrconfig.xml.
> >
> > <updateRequestProcessorChain name="dedupe">
> >   <processor class="solr.processor.SignatureUpdateProcessorFactory">
> >     <bool name="enabled">true</bool>
> >     <str name="signatureField">signature</str>
> >     <bool name="overwriteDupes">false</bool>
> >     <str name="fields">content</str>
> >     <str name="signatureClass">solr.processor.Lookup3Signature</str>
> >   </processor>
> >   <processor class="solr.DistributedUpdateProcessorFactory" />
> >   <processor class="solr.LogUpdateProcessorFactory" />
> >   <processor class="solr.RunUpdateProcessorFactory" />
> > </updateRequestProcessorChain>
> >
> >
> > However, I can't do a copyField of content into this signature field, as
> > some of my content is longer than 32766 characters. Previously, I tried
> > pointing the signatureField directly at content, but that does not work
> > either.
> >
> > Is there anything else I can do to group on a new signatureField?
> >
> >
> > Regards,
> > Edwin
> >
>
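
For illustration, here is a minimal sketch of what "a hash of the content" could look like if it were computed on the client before indexing. The class name ContentHash and the method md5Hex are hypothetical and are not part of Solr. The choice of MD5 is also an assumption, since the thread does not name an algorithm. The point is that the digest is always 32 hex characters, however long the content is, so it fits in a string field such as the signature field defined above.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper, not part of Solr: hashes document content on the client.
class ContentHash {

    /** Returns the MD5 digest of the text as 32 lowercase hex characters. */
    static String md5Hex(String content) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(content.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                // Mask to an unsigned value so negative bytes format as two hex digits.
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to provide MD5, so this is unexpected.
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    public static void main(String[] args) {
        // Assumed sample text; in practice this would be the document's content field.
        String content = "some very long document body ...";
        System.out.println(md5Hex(content));
    }
}
```

Identical content always yields the same digest, so the value could be sent along with each document and used for grouping. MD5 is adequate for spotting exact duplicates but is not collision-resistant against deliberately crafted input.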
