lucene-solr-user mailing list archives

From Alexandre Rafalovitch <arafa...@gmail.com>
Subject Re: solr dedup on specific fields
Date Mon, 07 Jul 2014 09:53:12 GMT
It's an interesting thought. I haven't tried those.

But I don't think ExternalFileFields (EFFs) are searchable. Do you need them to be searchable?

Regards,
   Alex.
Personal website: http://www.outerthoughts.com/
Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency
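For the ExternalFileField idea raised in this thread: as far as I understand it, an ExternalFileField takes its values from a plain-text file named external_<fieldname> in the index data directory. The file has one key=value line per document, where the key matches the field type's keyField and the values are floats. The helper below is a hypothetical sketch, not part of Solr. It renders a "read" flag as 1.0/0.0 lines sorted by key (sorted files are described as loading faster), assuming URL keys and the float-only value type. Check the reference guide for your Solr version before relying on the exact file format.

```python
# Hypothetical helper: render an ExternalFileField data file for a "read" flag.
# Assumptions: the file is named external_read and lives in the index data dir,
# keys are the field type's keyField values (here: URLs), and values are floats.
# Keys containing '=' may be parsed ambiguously; verify for your Solr version.


def render_external_file(read_flags: dict[str, bool]) -> str:
    """Return file contents with one 'key=value' line per document, sorted by key."""
    lines = []
    for key in sorted(read_flags):
        # Booleans are encoded numerically because the field's values are floats.
        value = 1.0 if read_flags[key] else 0.0
        lines.append(f"{key}={value}")
    return "".join(f"{line}\n" for line in lines)


if __name__ == "__main__":
    flags = {"http://example.com/b": False, "http://example.com/a": True}
    print(render_external_file(flags), end="")
```

Because the file lives outside the normal indexing path, re-crawling a page does not touch these values, which is the appeal of the idea; the trade-off is the one Alex points out, since the values are not searchable like regular indexed fields.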


On Mon, Jul 7, 2014 at 4:48 PM, Ali Nazemian <alinazemian@gmail.com> wrote:
> Dear Alexandre,
> What if I use ExternalFileField for the fields that I don't want to be
> changed? Would that work for me?
> Regards.
>
>
> On Mon, Jul 7, 2014 at 2:05 PM, Alexandre Rafalovitch <arafalov@gmail.com>
> wrote:
>
>> Well, let us know when you figure out a way to satisfy all your
>> requirements.
>>
>> Solr is designed around full-document replacement so that it stays
>> efficient at its primary function (search). Any workaround requires some
>> sort of sacrifice.
>>
>> Good luck,
>>    Alex.
>>
>>
>> On Mon, Jul 7, 2014 at 4:32 PM, Ali Nazemian <alinazemian@gmail.com>
>> wrote:
>> > Updating documents will add extra time to the indexing process (I send
>> > the documents via Apache Nutch), and I prefer to keep indexing as fast as
>> > possible.
>> >
>> >
>> > On Mon, Jul 7, 2014 at 12:05 PM, Alexandre Rafalovitch <arafalov@gmail.com>
>> > wrote:
>> >
>> >> Can you use an update operation instead of a create? Then you can supply
>> >> only the fields that need to change and use atomic updates to
>> >> preserve the others. But then you will have issues when you _are_
>> >> creating new documents and do need to store all fields.
>> >>
>> >> Regards,
>> >>    Alex.
>> >>
>> >>
>> >> On Mon, Jul 7, 2014 at 2:08 PM, Ali Nazemian <alinazemian@gmail.com>
>> >> wrote:
>> >> > Dear all,
>> >> > Is there any other way I can do this?
>> >> > If you look at my main problem again, you will see that I have two
>> >> > types of fields in my documents: 1) the ones that should be
>> >> > overwritten on duplicates, and 2) the ones that should not change on
>> >> > duplicates. So is there another way to handle this situation from the
>> >> > start, for example using a cross join?
>> >> > Assume I have a document with ID 2 that contains all the fields that
>> >> > can be overwritten, and another document with ID 2 that contains all
>> >> > the fields that should not change during duplicate detection. To select
>> >> > all fields, a join on ID would be enough, and for deduplication it
>> >> > would be enough to overwrite only documents of type 1.
>> >> > Regards.
>> >> >
>> >> >
>> >> > On Tue, Jul 1, 2014 at 6:17 PM, Alexandre Rafalovitch <arafalov@gmail.com>
>> >> > wrote:
>> >> >
>> >> >> Well, it's implemented in SignatureUpdateProcessorFactory. Worst case,
>> >> >> you can clone that code and add your preserve-field functionality.
>> >> >> Could even be a nice contribution.
>> >> >>
>> >> >> Regards,
>> >> >>    Alex.
>> >> >>
>> >> >>
>> >> >>
>> >> >> On Tue, Jul 1, 2014 at 6:50 PM, Ali Nazemian <alinazemian@gmail.com>
>> >> >> wrote:
>> >> >> > Any suggestion would be appreciated.
>> >> >> > Regards.
>> >> >> >
>> >> >> >
>> >> >> > On Mon, Jun 30, 2014 at 2:49 PM, Ali Nazemian <alinazemian@gmail.com>
>> >> >> > wrote:
>> >> >> >
>> >> >> >> Hi,
>> >> >> >> I use Solr 4.8 for indexing the web pages that come from Nutch. I
>> >> >> >> know that Solr's deduplication works on the uniqueKey field, so I
>> >> >> >> set that to the URL field. Everything is OK, except that after
>> >> >> >> duplicate detection I want Solr not to delete all fields of the old
>> >> >> >> document; I want some fields to remain unchanged. For example,
>> >> >> >> assume I have a data field called "read" with the Boolean value
>> >> >> >> "true" for a specific document. I want all fields of the new
>> >> >> >> document to overwrite the old ones except the value of this field.
>> >> >> >> Is that possible? How?
>> >> >> >> Regards.
>> >> >> >>
>> >> >> >> --
>> >> >> >> A.Nazemian
>> >> >> >>
>> >> >> >
>> >> >> >
>> >> >> >
>> >> >> > --
>> >> >> > A.Nazemian
>> >> >>
>> >> >
>> >> >
>> >> >
>> >> > --
>> >> > A.Nazemian
>> >>
>> >
>> >
>> >
>> > --
>> > A.Nazemian
>>
>
>
>
> --
> A.Nazemian
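Alex's atomic-update suggestion from earlier in the thread can be sketched as follows. In Solr's JSON update format, an atomic update names the document by its uniqueKey and wraps each changed field in a modifier such as {"set": value}; fields that are not mentioned keep their stored values. The sketch assumes the uniqueKey field is called id, that "read" is the only field to protect, and a hypothetical update URL. In Solr 4.x, atomic updates also require the update log, the _version_ field, and stored="true" on all fields that are not copyField destinations.

```python
import json
import urllib.request

# Fields a re-crawl must never overwrite (hypothetical example: "read").
PRESERVED_FIELDS = frozenset({"read"})


def to_atomic_update(doc: dict, key_field: str = "id",
                     preserved=PRESERVED_FIELDS) -> dict:
    """Turn a crawled document into a Solr atomic-update document.

    Every field except the unique key and the preserved fields is wrapped in
    {"set": value}, so Solr replaces only those fields and keeps the rest.
    """
    update = {key_field: doc[key_field]}
    for name, value in doc.items():
        if name == key_field or name in preserved:
            continue  # the key identifies the doc; preserved fields stay untouched
        update[name] = {"set": value}
    return update


def post_updates(docs: list, update_url: str) -> int:
    """POST a JSON array of update documents and return the HTTP status code."""
    body = json.dumps(docs).encode("utf-8")
    req = urllib.request.Request(update_url, data=body, method="POST",
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return resp.status
```

For example, post_updates([to_atomic_update(d) for d in crawled_docs], "http://localhost:8983/solr/collection1/update") would send one batch, where the URL and crawled_docs are placeholders. The catch Alex mentions is brand-new pages: as far as I know, an atomic update for an id that does not exist yet creates the document with only the supplied fields, so "read" would be missing rather than false unless you add it for new documents.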

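The behavior Ali asks for in his first message, which Alex suggests adding to a copy of SignatureUpdateProcessorFactory, boils down to a merge rule. A real implementation would be a Java UpdateRequestProcessor that fetches the existing document by uniqueKey before the add. The sketch below models only the merge rule in plain Python; merge_preserving and its parameters are hypothetical names, and no Solr API is used.

```python
from typing import Optional


def merge_preserving(old: Optional[dict], new: dict, preserved: set,
                     defaults: Optional[dict] = None) -> dict:
    """Return the document to index when `new` replaces `old` (same uniqueKey).

    Every field comes from `new`, except the fields in `preserved`, which keep
    their value from `old` when `old` has one. If there is no old document
    (first crawl), preserved fields fall back to `defaults` when given.
    """
    merged = dict(new)  # copy, so the incoming document is not mutated
    source = old if old is not None else (defaults or {})
    for field in preserved:
        if field in source:
            merged[field] = source[field]
    return merged
```

For example, merging old = {"id": "u1", "title": "Old", "read": True} with new = {"id": "u1", "title": "New"} keeps "read": True and takes the new title. Fetching old for every incoming page is exactly the per-document lookup cost Ali wanted to avoid during Nutch indexing; the merge itself is cheap.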
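Ali's split-document idea can be sketched like this. Two caveats: two documents that share one uniqueKey value (both "ID 2") overwrite each other, so the state document needs its own id; and, as far as I know, Solr's {!join from=... to=...} query parser acts as a filter that returns only documents from the "to" side, not merged fields from both. The "#state" suffix, the doc_type values, and the page_url field are hypothetical.

```python
# Hypothetical layout: two documents per crawled URL in one index.
STATE_SUFFIX = "#state"  # hypothetical; any scheme giving a distinct uniqueKey works


def split_page(url: str, crawled: dict, read: bool) -> tuple[dict, dict]:
    """Return (crawl_doc, state_doc) for one page.

    crawl_doc is re-sent on every crawl and may be overwritten freely.
    state_doc holds fields that must survive re-crawls; it gets its own id
    because two documents sharing one uniqueKey would overwrite each other.
    """
    crawl_doc = {**crawled, "id": url, "doc_type": "crawl"}
    state_doc = {"id": url + STATE_SUFFIX, "doc_type": "state",
                 "page_url": url, "read": read}
    return crawl_doc, state_doc


def join_query(state_filter: str) -> str:
    """Solr join: crawl docs whose id equals page_url of matching state docs."""
    return "{!join from=page_url to=id}" + state_filter
```

With this layout the crawler only ever sends crawl documents; a state document is written only when a page is marked read, so "unread" simply means no state document exists. Showing a page together with its read flag still takes a second lookup of the URL plus the suffix, because the join does not return the state document's fields.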