lucene-solr-user mailing list archives

From Ali Nazemian <alinazem...@gmail.com>
Subject Re: solr dedup on specific fields
Date Mon, 07 Jul 2014 10:17:56 GMT
Yeah, unfortunately I want it to be searchable:(



On Mon, Jul 7, 2014 at 2:23 PM, Alexandre Rafalovitch <arafalov@gmail.com>
wrote:

> It's an interesting thought. I haven't tried those.
>
> But I don't think the EFFs are searchable. Do you need them to be
> searchable?
>
> Regards,
>    Alex.
> Personal website: http://www.outerthoughts.com/
> Current project: http://www.solr-start.com/ - Accelerating your Solr
> proficiency
>
>
> On Mon, Jul 7, 2014 at 4:48 PM, Ali Nazemian <alinazemian@gmail.com>
> wrote:
> > Dear Alexandre,
> > What if I use ExternalFileField for the fields that I don't want to be
> > changed? Would that work for me?
> > Regards.
> >
> >
> > On Mon, Jul 7, 2014 at 2:05 PM, Alexandre Rafalovitch <arafalov@gmail.com>
> > wrote:
> >
> >> Well, let us know when you figure out a way to satisfy all your
> >> requirements.
> >>
> >> Solr is designed for a full-document replace to be efficient at its
> >> primary function (search). Any workaround requires some sort of
> >> sacrifice.
> >>
> >> Good luck,
> >>    Alex.
> >> Personal website: http://www.outerthoughts.com/
> >> Current project: http://www.solr-start.com/ - Accelerating your Solr
> >> proficiency
> >>
> >>
> >> On Mon, Jul 7, 2014 at 4:32 PM, Ali Nazemian <alinazemian@gmail.com>
> >> wrote:
> >> > Updating documents will add some extra time to the indexing process. (I
> >> > send the documents via Apache Nutch.) I prefer to make indexing as fast
> >> > as possible.
> >> >
> >> >
> >> > On Mon, Jul 7, 2014 at 12:05 PM, Alexandre Rafalovitch <arafalov@gmail.com>
> >> > wrote:
> >> >
> >> >> Can you use Update operation instead of Create? Then, you can supply
> >> >> only the fields that need to be changed and use atomic update to
> >> >> preserve the others. But then you will have issues when you _are_
> >> >> creating new documents and you do need to store all fields.
> >> >>
> >> >> Regards,
> >> >>    Alex.
> >> >> Personal website: http://www.outerthoughts.com/
> >> >> Current project: http://www.solr-start.com/ - Accelerating your Solr
> >> >> proficiency
> >> >>
> >> >>
> >> >> On Mon, Jul 7, 2014 at 2:08 PM, Ali Nazemian <alinazemian@gmail.com>
> >> >> wrote:
> >> >> > Dear all,
> >> >> > Is there any other way I can do this?
> >> >> > If you look at my main problem again, you will see that I have two
> >> >> > types of fields in my documents: 1) the ones that should be
> >> >> > overwritten on duplicates, and 2) the ones that should not change on
> >> >> > duplicates. So is there another way to handle this situation from the
> >> >> > start? I mean using a cross join, for example?
> >> >> > Assume I have a document with ID 2 which contains all the fields that
> >> >> > can be overwritten, and another document with ID 2 which contains all
> >> >> > the fields that should not change during duplicate detection. To
> >> >> > select all fields it is enough to join on ID, and for deduplication it
> >> >> > is enough to overwrite just the document of type 1.
> >> >> > Regards.
> >> >> >
> >> >> >
> >> >> > On Tue, Jul 1, 2014 at 6:17 PM, Alexandre Rafalovitch <arafalov@gmail.com>
> >> >> > wrote:
> >> >> >
> >> >> >> Well, it's implemented in SignatureUpdateProcessorFactory. Worst case,
> >> >> >> you can clone that code and add your preserve-field functionality.
> >> >> >> Could even be a nice contribution.
> >> >> >>
> >> >> >> Regards,
> >> >> >>    Alex.
> >> >> >>
> >> >> >> Personal website: http://www.outerthoughts.com/
> >> >> >> Current project: http://www.solr-start.com/ - Accelerating your Solr
> >> >> >> proficiency
> >> >> >>
> >> >> >>
> >> >> >> On Tue, Jul 1, 2014 at 6:50 PM, Ali Nazemian <alinazemian@gmail.com>
> >> >> >> wrote:
> >> >> >> > Any suggestion would be appreciated.
> >> >> >> > Regards.
> >> >> >> >
> >> >> >> >
> >> >> >> > On Mon, Jun 30, 2014 at 2:49 PM, Ali Nazemian <alinazemian@gmail.com>
> >> >> >> > wrote:
> >> >> >> >
> >> >> >> >> Hi,
> >> >> >> >> I used Solr 4.8 for indexing the web pages that come from Nutch. I
> >> >> >> >> know that Solr's deduplication works on the uniqueKey field, so I
> >> >> >> >> set that to the URL field. Everything is OK, except that after
> >> >> >> >> duplicate detection I want Solr not to delete all fields of the old
> >> >> >> >> document. I want some fields to remain unchanged. For example,
> >> >> >> >> assume I have a data field called "read" with the Boolean value
> >> >> >> >> "true" for a specific document. I want all fields of the new
> >> >> >> >> document to overwrite the old ones, except the value of this field.
> >> >> >> >> Is that possible? How?
> >> >> >> >> Regards.
> >> >> >> >>
> >> >> >> >> --
> >> >> >> >> A.Nazemian
> >> >> >> >>
> >> >> >> >
> >> >> >> >
> >> >> >> >
> >> >> >> > --
> >> >> >> > A.Nazemian
> >> >> >>
> >> >> >
> >> >> >
> >> >> >
> >> >> > --
> >> >> > A.Nazemian
> >> >>
> >> >
> >> >
> >> >
> >> > --
> >> > A.Nazemian
> >>
> >
> >
> >
> > --
> > A.Nazemian
>



-- 
A.Nazemian
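
For reference, the atomic-update suggestion quoted above could look roughly like the following. This is only a sketch under assumptions: the field names "title" and "content" and the helper build_atomic_update are made up for illustration, "url" is taken to be the uniqueKey as described in the original question, and "read" is the field to preserve. It only builds the JSON body; posting it to the update handler is left out. Atomic updates also need the other fields to be stored and the update log enabled, so check that against your own schema and solrconfig before relying on it.

```python
import json


def build_atomic_update(doc, key_field="url", preserve=("read",)):
    """Convert a full document into an atomic-update document.

    Every field except the key and the preserved fields is wrapped in a
    {"set": value} modifier, so fields that are left out (here "read")
    keep whatever value is already in the index.
    NOTE: this helper and its parameter names are hypothetical.
    """
    update = {key_field: doc[key_field]}
    for name, value in doc.items():
        if name == key_field or name in preserve:
            continue  # never send the key as a modifier, never touch preserved fields
        update[name] = {"set": value}
    return update


# A freshly crawled page, e.g. as it might arrive from Nutch (field names assumed).
crawled = {
    "url": "http://example.com/",
    "title": "Example",
    "content": "New body text",
    "read": False,  # the crawler's default; must not overwrite the indexed value
}

# JSON body for the update handler: a list of update documents.
payload = json.dumps([build_atomic_update(crawled)])
print(payload)
```

As noted in the thread, this costs extra time at indexing, and brand-new documents still need all of their fields supplied.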
