lucene-solr-user mailing list archives

From Ali Nazemian <alinazem...@gmail.com>
Subject Re: solr dedup on specific fields
Date Mon, 07 Jul 2014 07:08:20 GMT
Dear all,
Is there another way to do this?
If you look at my original problem again, you will see that I have two
types of fields in my documents: 1) those that should be overwritten on
duplicates, and 2) those that should not change when a duplicate is
detected. So is there another way to handle this situation from the start,
for example by using a cross join?
Assume I have a document with ID 2 that contains all the fields that can
be overwritten, and another document with ID 2 that contains all the fields
that should not change during duplicate detection. To select all fields, it
is enough to join on ID, and for deduplication it is enough to overwrite
only the document of type 1.
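To make the idea concrete, here is a minimal sketch in plain Python. It is
not Solr code and calls no Solr API; it only models the intended behaviour
with two in-memory dictionaries. The names content_docs, state_docs,
index_content, set_state and fetch are hypothetical and chosen for
illustration; only the "read" field comes from my original question.

```python
# Model of the two-document idea: both stores are keyed by the same ID.
# All names here are illustrative assumptions, not Solr APIs.

content_docs = {}  # type 1: fields that are overwritten on duplicates
state_docs = {}    # type 2: fields that re-indexing must never touch


def index_content(doc_id, fields):
    """(Re)index the overwritable fields; a duplicate ID replaces them all."""
    content_docs[doc_id] = dict(fields)
    # Make sure a state document exists, but never modify an existing one.
    state_docs.setdefault(doc_id, {})


def set_state(doc_id, **fields):
    """Update the preserved fields, e.g. read=True."""
    state_docs.setdefault(doc_id, {}).update(fields)


def fetch(doc_id):
    """Join the two documents on ID to get the full set of fields."""
    merged = dict(content_docs.get(doc_id, {}))
    merged.update(state_docs.get(doc_id, {}))
    return merged
```

With this model, indexing ID 2, calling set_state("2", read=True), and then
indexing ID 2 again with new content leaves "read" untouched, while fetch("2")
returns the new content together with the preserved field.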
Regards.


On Tue, Jul 1, 2014 at 6:17 PM, Alexandre Rafalovitch <arafalov@gmail.com>
wrote:

> Well, it's implemented in SignatureUpdateProcessorFactory. Worst case,
> you can clone that code and add your preserve-field functionality.
> Could even be a nice contribution.
>
> Regards,
>    Alex.
>
> Personal website: http://www.outerthoughts.com/
> Current project: http://www.solr-start.com/ - Accelerating your Solr
> proficiency
>
>
> On Tue, Jul 1, 2014 at 6:50 PM, Ali Nazemian <alinazemian@gmail.com>
> wrote:
> > Any suggestion would be appreciated.
> > Regards.
> >
> >
> > On Mon, Jun 30, 2014 at 2:49 PM, Ali Nazemian <alinazemian@gmail.com>
> wrote:
> >
> >> Hi,
> >> I used Solr 4.8 to index web pages that come from Nutch. I know
> >> that Solr's deduplication works on the uniqueKey field, so I set that
> >> to the URL field. Everything is OK, except that after a duplicate is
> >> detected I do not want Solr to delete all fields of the old document;
> >> I want some fields to remain unchanged. For example, assume I have a
> >> field called "read" with the Boolean value "true" for a specific
> >> document. I want all fields of the new document to overwrite the old
> >> ones except the value of this field. Is that possible? How?
> >> Regards.
> >>
> >> --
> >> A.Nazemian
> >>
> >
> >
> >
> > --
> > A.Nazemian
>



-- 
A.Nazemian
