lucene-solr-user mailing list archives

From David MARTIN <dmartin....@gmail.com>
Subject Re: Encountering a roadblock with my Solr schema design...use dedupe?
Date Sat, 16 Jan 2010 23:43:04 GMT
I'm really interested in reading the answer to this thread, as my problem is
much the same. The main difference may be the huge number of SKUs per product
that I have.


David

On Thu, Jan 14, 2010 at 2:35 AM, Kelly Taylor <wiredkel@hotmail.com> wrote:

>
> Hoss,
>
> Would you suggest using dedupe for my use case? If so, do you know of a
> working example I can reference?
>
> I don't have an issue using the patched version of Solr, but I'd much
> rather use the GA version.
>
> -Kelly
>
>
>
> hossman wrote:
> >
> >
> > : Dedupe is completely the wrong word. Deduping is something else
> > : entirely - it is about trying not to index the same document twice.
> >
> > Dedup can also certainly be used with field collapsing -- that was one of
> > the initial use cases identified for the SignatureUpdateProcessorFactory
> > ... you can compute an 'expensive' signature when adding a document,
> > index it, and then FieldCollapse on that signature field.
> >
> > This gives you "query time deduplication" based on a value computed when
> > indexing. (The canonical example is multiple URLs referencing the "same"
> > content but with slightly different boilerplate markup.) You can use a
> > Signature class that recognizes the boilerplate and computes an identical
> > signature value for each URL whose content is "the same", but still index
> > all of the URLs and their content as distinct documents. Use cases where
> > people want only "distinct" URLs work using field collapse, but by default
> > all matching documents can still be returned, and searches on text in the
> > boilerplate markup also still work.
> >
> >
> > -Hoss
> >
> >
> >
>
> --
> View this message in context:
> http://old.nabble.com/Encountering-a-roadblock-with-my-Solr-schema-design...use-dedupe--tp27118977p27155115.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>
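For readers looking for a concrete picture of the approach Hoss describes, here is a minimal, self-contained Java sketch of the idea only. It does not use Solr's classes: in Solr the signature would be computed by SignatureUpdateProcessorFactory and collapsing would be done by field collapse on the signature field. Everything here is hypothetical and chosen for illustration: the class name `SignatureCollapseSketch`, the `<!--nav-->...<!--/nav-->` markers standing in for "boilerplate", the whitespace/case normalization, and the use of MD5 as the hash. The signature is computed once per document at "index time", every URL is still kept as its own document, and "query time" deduplication keeps only the first match per signature.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical sketch: index-time signature + query-time collapse on it.
public class SignatureCollapseSketch {

    // Assumption: boilerplate is whatever sits between these made-up markers.
    private static final Pattern BOILERPLATE =
            Pattern.compile("<!--nav-->.*?<!--/nav-->", Pattern.DOTALL);

    // One indexed document: every URL is kept, with its signature stored alongside.
    static final class Doc {
        final String url;
        final String content;
        final String signature;

        Doc(String url, String content, String signature) {
            this.url = url;
            this.content = content;
            this.signature = signature;
        }
    }

    // Strip boilerplate, normalize whitespace and case, then hash with MD5.
    static String signature(String content) {
        String core = BOILERPLATE.matcher(content).replaceAll("");
        String normalized = core.replaceAll("\\s+", " ").trim().toLowerCase();
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            byte[] digest = md5.digest(normalized.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    // "Index time": compute the signature once and store it with the document.
    static Doc index(String url, String content) {
        return new Doc(url, content, signature(content));
    }

    // "Query time": keep the first matching document per signature value.
    static List<Doc> collapse(List<Doc> matches) {
        Map<String, Doc> firstPerSignature = new LinkedHashMap<>();
        for (Doc d : matches) {
            firstPerSignature.putIfAbsent(d.signature, d);
        }
        return new ArrayList<>(firstPerSignature.values());
    }

    public static void main(String[] args) {
        List<Doc> matches = List.of(
                index("http://a.example/page", "<!--nav-->Site A menu<!--/nav--> Solr field collapsing"),
                index("http://b.example/page", "<!--nav-->Site B menu<!--/nav--> Solr   field collapsing"),
                index("http://c.example/other", "<!--nav-->Site A menu<!--/nav--> Something else"));

        System.out.println("All matches: " + matches.size());
        for (Doc d : collapse(matches)) {
            System.out.println("Distinct: " + d.url);
        }
    }
}
```

The design point mirrors the message: because all three URLs remain separate documents, an uncollapsed search still returns every match (and text inside the boilerplate stays searchable), while the collapsed view returns only one URL per distinct piece of content.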
