lucene-solr-user mailing list archives

From Zheng Lin Edwin Yeo <edwinye...@gmail.com>
Subject Re: Get distinct results in Solr
Date Tue, 01 Sep 2015 02:46:22 GMT
Thank you for your advice Alexandre.

Will try out the de-duplication from the link you gave.

Regards,
Edwin


On 1 September 2015 at 10:34, Alexandre Rafalovitch <arafalov@gmail.com>
wrote:

> Re-read the question. You want to de-dupe on the full text-content.
>
> I would actually try to use the dedupe chain as per the link I gave
> but put results into a separate string field. Then, you group on that
> field. You cannot actually group on the long text field, that would
> kill any performance. So a signature is your proxy.
>
> Regards,
>    Alex
> ----
> Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
> http://www.solr-start.com/
>
>
> On 31 August 2015 at 22:26, Zheng Lin Edwin Yeo <edwinyeozl@gmail.com>
> wrote:
> > Hi Alexandre,
> >
> > Will treating it as String affect the search or other functions like
> > highlighting?
> >
> > Yes, the content must be in my index, unless I do a copyField to do
> > de-duplication on that field. Will that help?
> >
> > Regards,
> > Edwin
> >
> >
> > On 1 September 2015 at 10:04, Alexandre Rafalovitch <arafalov@gmail.com>
> > wrote:
> >
> >> Can't you just treat it as String?
> >>
> >> Also, do you actually want those documents in your index in the first
> >> place? If not, have you looked at De-duplication:
> >> https://cwiki.apache.org/confluence/display/solr/De-Duplication
> >>
> >> Regards,
> >>    Alex.
> >> ----
> >> Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
> >> http://www.solr-start.com/
> >>
> >>
> >> On 31 August 2015 at 22:00, Zheng Lin Edwin Yeo <edwinyeozl@gmail.com>
> >> wrote:
> >> > Thanks Jan.
> >> >
> >> > But I read that the field that is being collapsed on must be a single
> >> > valued String, Int or Float. As I'm required to get the distinct
> >> > results from "content" field that was indexed from a rich text
> >> > document, I got the following error:
> >> >
> >> >   "error":{
> >> >     "msg":"java.io.IOException: 64 bit numeric collapse fields are not
> >> > supported",
> >> >     "trace":"java.lang.RuntimeException: java.io.IOException: 64 bit
> >> > numeric collapse fields are not supported\r\n\tat
> >> >
> >> >
> >> > Is it possible to collapse on fields which have a long integer of
> >> > data, like content from a rich text document?
> >> >
> >> > Regards,
> >> > Edwin
> >> >
> >> >
> >> > On 31 August 2015 at 18:59, Jan Høydahl <jan.asf@cominvent.com> wrote:
> >> >
> >> >> Hi
> >> >>
> >> >> Check out the CollapsingQParser (
> >> >> https://cwiki.apache.org/confluence/display/solr/Collapse+and+Expand+Results).
> >> >> As long as you have a field that will be the same for all duplicates,
> >> >> you can “collapse” on that field. If you do not have a “group id”,
> >> >> you can create one using e.g. an MD5 signature of the identical body
> >> >> text (
> >> >> https://cwiki.apache.org/confluence/display/solr/De-Duplication).
> >> >>
> >> >> --
> >> >> Jan Høydahl, search solution architect
> >> >> Cominvent AS - www.cominvent.com
> >> >>
> >> >> > On 31 Aug 2015 at 12:03, Zheng Lin Edwin Yeo <edwinyeozl@gmail.com>
> >> >> > wrote:
> >> >> >
> >> >> > Hi,
> >> >> >
> >> >> > I'm using Solr 5.2.1, and I would like to find out what the best
> >> >> > way is to get Solr to return only distinct results.
> >> >> >
> >> >> > Currently, I've indexed several duplicate documents into Solr, with
> >> >> > just a different id and title, but the content is exactly the same.
> >> >> > When I do a search, Solr returns all of these documents in the
> >> >> > result list.
> >> >> >
> >> >> > What is the most suitable way to get Solr to return only one of
> >> >> > the documents during the search?
> >> >> > I understand that there is result grouping and faceting, but I'm
> >> >> > not sure if that is the best way.
> >> >> >
> >> >> > Regards,
> >> >> > Edwin
> >> >>
> >> >>
> >>
>
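
The approach suggested in this thread (derive a short signature from the long "content" field, keep it in a separate string field, then collapse or group on that field) can be sketched roughly as below. This is only an illustration in plain Python, not Solr's own implementation: in Solr the signature would be computed at index time by the de-duplication update chain. The field name "signature", the sample documents and the helper function names are all assumptions made up for this sketch.

```python
import hashlib
from urllib.parse import urlencode


def content_signature(text: str) -> str:
    """Return an MD5 hex digest of the text, a short proxy for the long field."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def collapse_by_signature(docs):
    """Keep the first document seen for each distinct content signature."""
    seen = {}
    for doc in docs:
        sig = content_signature(doc["content"])
        seen.setdefault(sig, doc)  # later duplicates are ignored
    return list(seen.values())


def collapse_query_params(user_query: str, signature_field: str = "signature") -> str:
    """Build URL-encoded query parameters that collapse on the signature field.

    "signature" is a hypothetical field name; use whatever string field
    the signature is actually stored in.
    """
    return urlencode({
        "q": user_query,
        "fq": "{!collapse field=%s}" % signature_field,
    })


# Hypothetical documents: ids and titles differ, content is duplicated.
docs = [
    {"id": "1", "title": "Report A", "content": "same body text"},
    {"id": "2", "title": "Report B", "content": "same body text"},
    {"id": "3", "title": "Other", "content": "different body text"},
]

distinct = collapse_by_signature(docs)
params = collapse_query_params("content:body")
```

The original "content" field stays indexed as text, so search and highlighting on it are unaffected; only the short signature field is used for collapsing, which avoids grouping on the long text itself.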
