lucene-solr-user mailing list archives

From Ali Nazemian <alinazem...@gmail.com>
Subject Re: solr dedup on specific fields
Date Mon, 07 Jul 2014 09:32:04 GMT
Updating documents would add extra time to the indexing process (I send
the documents via Apache Nutch), and I would prefer to keep indexing as
fast as possible.
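
For reference, the atomic update suggested in the quoted message below could be sketched roughly as follows. This is only an illustration of the request body, not a tested Nutch integration: the helper name `build_atomic_update` and the field names `title` and `content` are hypothetical, and it assumes the uniqueKey field is named `url`. Each changed field is wrapped in a `set` modifier, and a field such as "read" is simply left out so that Solr keeps its stored value.

```python
import json


def build_atomic_update(doc_id, changed_fields, id_field="id"):
    """Build one Solr atomic-update document.

    Every field in changed_fields is wrapped in {"set": value}; fields that
    are not listed (for example "read") are omitted, so an atomic update
    leaves their existing values alone.
    """
    doc = {id_field: doc_id}
    for name, value in changed_fields.items():
        doc[name] = {"set": value}
    return doc


# Hypothetical page re-crawled by Nutch; only the crawl-derived fields change.
update = build_atomic_update(
    "http://example.com/page",
    {"title": "New title", "content": "New body"},
    id_field="url",  # assumption: the uniqueKey field is called "url"
)

# JSON body that would be POSTed to the collection's /update handler.
payload = json.dumps([update])
print(payload)
```

Atomic updates need the other fields to be stored so that Solr can rebuild the document, and Solr still reindexes the whole document internally, so this does not remove the indexing cost mentioned above.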


On Mon, Jul 7, 2014 at 12:05 PM, Alexandre Rafalovitch <arafalov@gmail.com>
wrote:

> Can you use Update operation instead of Create? Then, you can supply
> only the fields that need to be changed and use atomic update to
> preserve the others. But then you will have issues when you _are_
> creating new documents and you do need to store all fields.
>
> Regards,
>    Alex.
> Personal website: http://www.outerthoughts.com/
> Current project: http://www.solr-start.com/ - Accelerating your Solr
> proficiency
>
>
> On Mon, Jul 7, 2014 at 2:08 PM, Ali Nazemian <alinazemian@gmail.com>
> wrote:
> > Dear all,
> > Is there another way to do this?
> > If you look at my main problem again, you will see that I have two types
> > of fields in my documents: 1) the ones that should be overwritten on
> > duplicates, and 2) the ones that should not change on duplicates. So is
> > there another way to handle this situation from the start, for example
> > with a cross join?
> > Assume I have a document with ID 2 that contains all the fields that can
> > be overwritten, and another document with ID 2 that contains all the
> > fields that should not change during duplicate detection. To select all
> > fields it is enough to join on ID, and for deduplication it is enough to
> > overwrite just the type 1 document.
> > Regards.
> >
> >
> > On Tue, Jul 1, 2014 at 6:17 PM, Alexandre Rafalovitch <arafalov@gmail.com>
> > wrote:
> >
> >> Well, it's implemented in SignatureUpdateProcessorFactory. Worst case,
> >> you can clone that code and add your preserve-field functionality.
> >> Could even be a nice contribution.
> >>
> >> Regards,
> >>    Alex.
> >>
> >> Personal website: http://www.outerthoughts.com/
> >> Current project: http://www.solr-start.com/ - Accelerating your Solr
> >> proficiency
> >>
> >>
> >> On Tue, Jul 1, 2014 at 6:50 PM, Ali Nazemian <alinazemian@gmail.com>
> >> wrote:
> >> > Any suggestion would be appreciated.
> >> > Regards.
> >> >
> >> >
> >> > On Mon, Jun 30, 2014 at 2:49 PM, Ali Nazemian <alinazemian@gmail.com>
> >> > wrote:
> >> >
> >> >> Hi,
> >> >> I use Solr 4.8 to index the web pages that come from Nutch. I know
> >> >> that Solr's deduplication works on the uniqueKey field, so I set that
> >> >> to the URL field. Everything is OK, except that after a duplicate is
> >> >> detected I do not want Solr to delete all fields of the old document;
> >> >> I want some fields to remain unchanged. For example, assume I have a
> >> >> field called "read" with the Boolean value "true" for a specific
> >> >> document. I want all fields of the new document to overwrite the old
> >> >> ones except the value of this field. Is that possible? How?
> >> >> Regards.
> >> >>
> >> >> --
> >> >> A.Nazemian
> >> >>
> >> >
> >> >
> >> >
> >> > --
> >> > A.Nazemian
> >>
> >
> >
> >
> > --
> > A.Nazemian
>



-- 
A.Nazemian
