lucene-solr-user mailing list archives

From Alexandre Rafalovitch <arafa...@gmail.com>
Subject Re: solr dedup on specific fields
Date Mon, 07 Jul 2014 09:35:19 GMT
Well, let us know when you figure out a way to satisfy all your requirements.

Solr is designed around full-document replacement so that it stays
efficient at its primary function (search). Any workaround requires
some sort of sacrifice.

Good luck,
   Alex.
Personal website: http://www.outerthoughts.com/
Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency


On Mon, Jul 7, 2014 at 4:32 PM, Ali Nazemian <alinazemian@gmail.com> wrote:
> Updating documents will add extra time to the indexing process. (I send
> the documents via Apache Nutch.) I prefer to keep indexing as fast as
> possible.
>
>
> On Mon, Jul 7, 2014 at 12:05 PM, Alexandre Rafalovitch <arafalov@gmail.com>
> wrote:
>
>> Can you use an Update operation instead of Create? Then you can supply
>> only the fields that need to change and use an atomic update to
>> preserve the others. But then you will have issues when you _are_
>> creating new documents and you do need to store all fields.
>>
>> Regards,
>>    Alex.
>> Personal website: http://www.outerthoughts.com/
>> Current project: http://www.solr-start.com/ - Accelerating your Solr
>> proficiency
>>
>>
>> On Mon, Jul 7, 2014 at 2:08 PM, Ali Nazemian <alinazemian@gmail.com>
>> wrote:
>> > Dear all,
>> > Is there another way to do this?
>> > If you look at my main problem again, you will see that I have two
>> > types of fields in my documents: 1) the ones that should be overwritten
>> > on duplicates, and 2) the ones that should not change on duplicates. So
>> > is there another way to handle this situation from the start, for
>> > example with a cross join?
>> > Assume I have a document with ID 2 that contains all the fields that can
>> > be overwritten, and another document with ID 2 that contains all the
>> > fields that should not change during duplicate detection. To select all
>> > fields it is enough to join on ID, and for deduplication it is enough to
>> > overwrite just the document of type 1.
>> > Regards.
>> >
>> >
>> > On Tue, Jul 1, 2014 at 6:17 PM, Alexandre Rafalovitch
>> > <arafalov@gmail.com> wrote:
>> >
>> >> Well, it's implemented in SignatureUpdateProcessorFactory. Worst case,
>> >> you can clone that code and add your preserve-field functionality.
>> >> Could even be a nice contribution.
>> >>
>> >> Regards,
>> >>    Alex.
>> >>
>> >> Personal website: http://www.outerthoughts.com/
>> >> Current project: http://www.solr-start.com/ - Accelerating your Solr
>> >> proficiency
>> >>
>> >>
>> >> On Tue, Jul 1, 2014 at 6:50 PM, Ali Nazemian <alinazemian@gmail.com>
>> >> wrote:
>> >> > Any suggestion would be appreciated.
>> >> > Regards.
>> >> >
>> >> >
>> >> > On Mon, Jun 30, 2014 at 2:49 PM, Ali Nazemian <alinazemian@gmail.com>
>> >> > wrote:
>> >> >
>> >> >> Hi,
>> >> >> I use Solr 4.8 to index the web pages that come from Nutch. I know
>> >> >> that Solr's deduplication works on the uniqueKey field, so I set
>> >> >> that to the URL field. Everything is OK, except that after duplicate
>> >> >> detection I do not want Solr to delete all the fields of the old
>> >> >> document. I want some fields to remain unchanged. For example, assume
>> >> >> I have a data field called "read" with the Boolean value "true" for a
>> >> >> specific document. I want all fields to be overwritten by the new
>> >> >> document except the value of this field. Is that possible? How?
>> >> >> Regards.
>> >> >>
>> >> >> --
>> >> >> A.Nazemian
>> >> >>
>> >> >
>> >> >
>> >> >
>> >> > --
>> >> > A.Nazemian
>> >>
>> >
>> >
>> >
>> > --
>> > A.Nazemian
>>
>
>
>
> --
> A.Nazemian
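
The atomic-update suggestion quoted above can be sketched as follows. This is
only an illustration under stated assumptions, not something posted in the
thread: the helper name build_atomic_update, the field names "url", "title"
and "content", and the sample document are all hypothetical. Only the "read"
field and the use of the URL as unique key come from the original question.
The sketch builds the JSON body only; sending it to Solr's update handler is
left out, and atomic updates are assumed to be enabled on the target core.

```python
import json


def build_atomic_update(doc, key_field="url", preserve=("read",)):
    """Convert a full document into a Solr atomic-update document.

    Every field except the unique key and the preserved fields is wrapped
    in a {"set": value} modifier, so fields listed in `preserve` are left
    out of the request and keep whatever value is already indexed.
    The names used here are hypothetical, chosen for illustration.
    """
    update = {key_field: doc[key_field]}
    for name, value in doc.items():
        if name == key_field or name in preserve:
            continue  # never touch the key or the fields to keep
        update[name] = {"set": value}
    return update


# Hypothetical page as it might arrive from a crawler.
crawled = {
    "url": "http://example.com/",
    "title": "Example",
    "content": "New body",
    "read": False,
}

# JSON body for an update request: a list of atomic-update documents.
payload = json.dumps([build_atomic_update(crawled)])
```

As noted in the quoted reply, this approach has a cost: a document that is
genuinely new would be created without its preserved fields, so the indexing
side still has to tell creates from updates.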
