lucene-solr-user mailing list archives

From Alexandre Rafalovitch <arafa...@gmail.com>
Subject Re: solr dedup on specific fields
Date Mon, 07 Jul 2014 07:35:19 GMT
Can you use an Update operation instead of Create? Then you can supply
only the fields that need to change and use an atomic update to
preserve the others. The catch is that when you _are_ creating a new
document, you still need to store all of its fields.
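
A minimal sketch of what such an atomic update request body could look like, in Python. The "set" modifier is Solr's atomic-update syntax; the helper name build_atomic_update, the field names ("url", "title", "content") and the example URL are assumptions for illustration and do not come from this thread. It only builds the JSON body and does not contact a Solr server.

```python
import json


def build_atomic_update(doc_id, changed_fields, id_field="id"):
    """Build one atomic-update document: the unique key plus {"set": value}
    for every field that should be overwritten. Fields that are not listed
    are simply not mentioned, so Solr keeps their existing values."""
    doc = {id_field: doc_id}
    for name, value in changed_fields.items():
        if name == id_field:
            raise ValueError("the unique key itself cannot be modified")
        doc[name] = {"set": value}
    return doc


# Hypothetical re-crawled page: overwrite title and content, leave "read" alone.
update = build_atomic_update(
    "http://example.com/page",
    {"title": "New title", "content": "New body text"},
    id_field="url",  # assumed name of the uniqueKey field
)

# JSON body that would be POSTed to the core's /update handler.
payload = json.dumps([update])
print(payload)
```

Because "read" never appears in the request, its stored value is left as it was. As far as I know this relies on the other fields being stored, so treat it as a sketch to check against your schema rather than a drop-in solution.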

Regards,
   Alex.
Personal website: http://www.outerthoughts.com/
Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency


On Mon, Jul 7, 2014 at 2:08 PM, Ali Nazemian <alinazemian@gmail.com> wrote:
> Dear all,
> Is there another way to do this?
> If you look at my main problem again, you will see that I have two types
> of fields in my documents: 1) those that should be overwritten on
> duplicates, and 2) those that should not change on duplicates. So is
> there another way to handle this from the start, for example with a
> cross join?
> Assume I have a document with ID 2 that contains all the fields that can
> be overwritten, and another document with ID 2 that contains all the
> fields that should not change during duplicate detection. To select all
> fields, it is enough to join on ID, and for deduplication it is enough to
> overwrite only the type 1 document.
> Regards.
>
>
> On Tue, Jul 1, 2014 at 6:17 PM, Alexandre Rafalovitch <arafalov@gmail.com>
> wrote:
>
>> Well, it's implemented in SignatureUpdateProcessorFactory. Worst case,
>> you can clone that code and add your preserve-field functionality.
>> Could even be a nice contribution.
>>
>> Regards,
>>    Alex.
>>
>> Personal website: http://www.outerthoughts.com/
>> Current project: http://www.solr-start.com/ - Accelerating your Solr
>> proficiency
>>
>>
>> On Tue, Jul 1, 2014 at 6:50 PM, Ali Nazemian <alinazemian@gmail.com>
>> wrote:
>> > Any suggestion would be appreciated.
>> > Regards.
>> >
>> >
>> > On Mon, Jun 30, 2014 at 2:49 PM, Ali Nazemian <alinazemian@gmail.com> wrote:
>> >
>> >> Hi,
>> >> I use Solr 4.8 to index the web pages that come from Nutch. I know
>> >> that Solr's deduplication works on the uniqueKey field, so I set
>> >> that to the URL field. Everything is OK, except that after duplicate
>> >> detection I do not want Solr to delete all fields of the old
>> >> document. I want some fields to remain unchanged. For example, assume
>> >> I have a field called "read" with the Boolean value "true" for a
>> >> specific document. I want all fields to be overwritten by the new
>> >> document except the value of this field. Is that possible? How?
>> >> Regards.
>> >>
>> >> --
>> >> A.Nazemian
>> >>
>> >
>> >
>> >
>> > --
>> > A.Nazemian
>>
>
>
>
> --
> A.Nazemian
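
The behaviour asked for in the original question (on a duplicate, overwrite everything except a few preserved fields) can be sketched in plain Python as below. This only illustrates the merge rule; it is not the code of SignatureUpdateProcessorFactory, and the function merge_preserving and all field names other than "read" are hypothetical.

```python
def merge_preserving(old_doc, new_doc, preserved):
    """Return the document to index: new_doc wins for every field,
    except that preserved fields already present on old_doc keep
    their old value. With no old document, new_doc is used as-is."""
    if old_doc is None:
        return dict(new_doc)
    merged = dict(new_doc)
    for name in preserved:
        if name in old_doc:
            merged[name] = old_doc[name]
    return merged


# Hypothetical duplicate: same URL, new crawl would reset "read" to False.
old = {"url": "http://example.com/page", "title": "Old title", "read": True}
new = {"url": "http://example.com/page", "title": "New title", "read": False}

print(merge_preserving(old, new, {"read"}))
```

A cloned update processor would need to fetch the existing document before applying a rule like this, which is the part this sketch leaves out.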
