lucene-solr-user mailing list archives

From Ali Nazemian <alinazem...@gmail.com>
Subject Re: solr dedup on specific fields
Date Mon, 07 Jul 2014 09:48:52 GMT
Dear Alexandre,
What if I use ExternalFileField for the fields that I don't want to be
changed? Would that work for me?
Regards.
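A rough sketch of the ExternalFileField idea, under assumptions: the `read` field is declared with type ExternalFileField and its keyField set to the URL uniqueKey, and Solr reads its values from a plain-text file named `external_read` in the index data directory, one `key=value` line per document. Values are parsed as floats, so the Boolean is encoded as 1.0 or 0.0. The helper `write_external_file` is hypothetical. Because the file lives outside the index, re-sending a page from Nutch would not reset it, but as far as I know the values are only usable through function queries (sorting, boosting), not as an ordinary stored field.

```python
import os
import tempfile


# Hypothetical helper: write the external file for an ExternalFileField.
# Assumption: one "key=value" line per document, where key is the uniqueKey
# (here the page URL) and value is a float.
def write_external_file(data_dir, field_name, flags):
    path = os.path.join(data_dir, "external_" + field_name)
    with open(path, "w", encoding="utf-8") as f:
        for url, is_read in sorted(flags.items()):
            # Conservative assumption: reject keys containing "=" so the
            # key/value split is unambiguous.
            if "=" in url:
                raise ValueError("key contains '=': " + url)
            # Booleans are encoded as floats because the field is float-valued.
            f.write("%s=%s\n" % (url, "1.0" if is_read else "0.0"))
    return path


if __name__ == "__main__":
    data_dir = tempfile.mkdtemp()
    path = write_external_file(
        data_dir, "read",
        {"http://example.com/a": True, "http://example.com/b": False})
    with open(path, encoding="utf-8") as f:
        print(f.read(), end="")
```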


On Mon, Jul 7, 2014 at 2:05 PM, Alexandre Rafalovitch <arafalov@gmail.com>
wrote:

> Well, let us know when you figure out a way to satisfy all your
> requirements.
>
> Solr is designed for a full-document replace to be efficient at its
> primary function (search). Any workaround requires some sort of
> sacrifice.
>
> Good luck,
>    Alex.
> Personal website: http://www.outerthoughts.com/
> Current project: http://www.solr-start.com/ - Accelerating your Solr
> proficiency
>
>
> On Mon, Jul 7, 2014 at 4:32 PM, Ali Nazemian <alinazemian@gmail.com>
> wrote:
> > Updating documents will add some extra time to the indexing process (I
> > send the documents via Apache Nutch), and I prefer to make indexing as
> > fast as possible.
> >
> >
> > On Mon, Jul 7, 2014 at 12:05 PM, Alexandre Rafalovitch <arafalov@gmail.com>
> > wrote:
> >
> >> Can you use Update operation instead of Create? Then, you can supply
> >> only the fields that need to be changed and use atomic update to
> >> preserve the others. But then you will have issues when you _are_
> >> creating new documents and you do need to store all fields.
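A minimal sketch of the atomic-update route described above, assuming the uniqueKey field is named `url` and using `title` and `content` as example field names. Each changed field is wrapped in a `set` modifier and fields such as `read` are simply left out, so Solr keeps their existing values; in Solr 4.x this only works if the fields are stored, since the document is rebuilt from stored values. The function only builds the JSON body that would be POSTed to the core's update handler; it does not send anything.

```python
import json


# Sketch: build a Solr atomic-update body that overwrites the crawled fields
# and leaves every other stored field (e.g. "read") untouched.
# Assumption: "url" is the uniqueKey field; other field names are examples.
def atomic_update_body(url, changed_fields):
    doc = {"url": url}
    for name, value in changed_fields.items():
        # The "set" modifier replaces only this field's value.
        doc[name] = {"set": value}
    # The JSON update format accepts a list of documents.
    return json.dumps([doc], sort_keys=True)


if __name__ == "__main__":
    print(atomic_update_body("http://example.com/a",
                             {"title": "New title", "content": "Fresh text"}))
```

The catch Alex mentions still applies: for a page that has never been indexed, this body would carry only the changed fields.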
> >>
> >> Regards,
> >>    Alex.
> >> Personal website: http://www.outerthoughts.com/
> >> Current project: http://www.solr-start.com/ - Accelerating your Solr
> >> proficiency
> >>
> >>
> >> On Mon, Jul 7, 2014 at 2:08 PM, Ali Nazemian <alinazemian@gmail.com>
> >> wrote:
> >> > Dears,
> >> > Is there any other way to do that?
> >> > If you look at my main problem again, you will see that I have two
> >> > types of fields in my documents: 1) the ones that should be
> >> > overwritten on duplicates, and 2) the ones that should not change on
> >> > duplicates. So is there another way to handle this situation from the
> >> > start, for example using a cross join?
> >> > Assume I have a document with ID 2 that contains all the fields that
> >> > can be overwritten, and another document with ID 2 that contains all
> >> > the fields that should not change during duplicate detection. To
> >> > select all fields it would be enough to join on ID, and for
> >> > deduplication it would be enough to overwrite just the type 1 document.
> >> > Regards.
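A rough sketch of the split-document idea, with hypothetical field names (`page_id`, `doc_type`) and ID suffixes. Two documents cannot share the same uniqueKey value (the second would overwrite the first), so the halves get distinct IDs and a shared `page_id`; re-crawling then overwrites only the `_content` half. The query uses Solr's `{!join from=... to=...}` parser, which behaves like a filter: it returns documents from the "to" side and does not merge fields from both halves, so the two halves would still be fetched separately.

```python
# Hypothetical split of one crawled page into two Solr documents that share a
# "page_id" field but have distinct uniqueKey ("id") values.
def split_page(page_id, crawl_fields, user_fields):
    # Crawled fields go in the half that Nutch re-sends on every crawl.
    content_doc = dict(crawl_fields)
    content_doc.update({"id": page_id + "_content", "page_id": page_id,
                        "doc_type": "content"})
    # User fields (e.g. "read") go in a half that re-crawls never touch.
    meta_doc = dict(user_fields)
    meta_doc.update({"id": page_id + "_meta", "page_id": page_id,
                     "doc_type": "meta"})
    return content_doc, meta_doc


# Query parameters that return the "meta" halves of pages whose content
# matches user_query, via the join query parser.
def meta_for_matching_pages(user_query):
    return {
        "q": "{!join from=page_id to=page_id}" + user_query,
        "fq": "doc_type:meta",
    }
```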
> >> >
> >> >
> >> > On Tue, Jul 1, 2014 at 6:17 PM, Alexandre Rafalovitch <arafalov@gmail.com>
> >> > wrote:
> >> >
> >> >> Well, it's implemented in SignatureUpdateProcessorFactory. Worst case,
> >> >> you can clone that code and add your preserve-field functionality.
> >> >> Could even be a nice contribution.
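In Solr, the preserve-field functionality would be Java code in a custom update processor that looks up the stored version of a document before it is overwritten. The Python below only models the merge rule itself, with documents as dicts and hypothetical names (`merge_preserving`, the `preserve` list); it is a sketch of the logic, not of SignatureUpdateProcessorFactory's code.

```python
# Sketch of the "preserve-field" rule: when an incoming document replaces an
# existing one with the same key, keep selected fields from the old version.
def merge_preserving(existing, incoming, preserve):
    # Copy so the caller's incoming document is not mutated.
    merged = dict(incoming)
    if existing is None:
        # Brand-new document: nothing to preserve, index it as sent.
        return merged
    for field in preserve:
        # The old value wins only if the old document actually had the field.
        if field in existing:
            merged[field] = existing[field]
    return merged
```

For the "read" example: merging old `{"read": True, "title": "old"}` with new `{"read": False, "title": "new"}` while preserving `["read"]` keeps `read` as True and takes the new title.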
> >> >>
> >> >> Regards,
> >> >>    Alex.
> >> >>
> >> >> Personal website: http://www.outerthoughts.com/
> >> >> Current project: http://www.solr-start.com/ - Accelerating your Solr
> >> >> proficiency
> >> >>
> >> >>
> >> >> On Tue, Jul 1, 2014 at 6:50 PM, Ali Nazemian <alinazemian@gmail.com>
> >> >> wrote:
> >> >> > Any suggestion would be appreciated.
> >> >> > Regards.
> >> >> >
> >> >> >
> >> >> > On Mon, Jun 30, 2014 at 2:49 PM, Ali Nazemian <alinazemian@gmail.com>
> >> >> > wrote:
> >> >> >
> >> >> >> Hi,
> >> >> >> I used Solr 4.8 to index web pages that come from Nutch. I know
> >> >> >> that Solr deduplication works on the uniqueKey field, so I set that
> >> >> >> to the URL field. Everything is OK, except that after duplicate
> >> >> >> detection I want Solr not to delete all fields of the old document.
> >> >> >> I want some fields to remain unchanged. For example, assume I have a
> >> >> >> field called "read" with the Boolean value "true" for a specific
> >> >> >> document. I want the new document to overwrite all fields except the
> >> >> >> value of this field. Is that possible? How?
> >> >> >> Regards.
> >> >> >>
> >> >> >> --
> >> >> >> A.Nazemian
> >> >> >>
> >> >> >
> >> >> >
> >> >> >
> >> >> > --
> >> >> > A.Nazemian
> >> >>
> >> >
> >> >
> >> >
> >> > --
> >> > A.Nazemian
> >>
> >
> >
> >
> > --
> > A.Nazemian
>



-- 
A.Nazemian
