lucene-solr-user mailing list archives

From Joel Bernstein <joels...@gmail.com>
Subject Re: SolrCloud Result Grouping vs CollapsingQParserPlugin
Date Wed, 15 Jan 2014 12:25:37 GMT
"During query time, depending on the query, results can be returned from
both shards. For example, a query
q=solr&group=true&group.field=adskdedup&group.ngroups=true would ideally
return data from both shards and apply the grouping on shard1 based on the
adskdedup field. This will also ensure that group.ngroups=true will return
the right count."

This is correct and will work with standard grouping and the
CollapsingQParserPlugin.
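
As a rough illustration of the two approaches, the sketch below builds the request URL for each: the standard grouping parameters from the question, and a collapse filter query of the form fq={!collapse field=adskdedup}. The host, port and collection name in BASE_URL are assumptions made for the example, not values from this thread.

```python
from urllib.parse import urlencode

# Hypothetical endpoint: adjust host, port and collection for your cluster.
BASE_URL = "http://localhost:8983/solr/collection1/select"


def grouping_query(q, field):
    """Build a standard Result Grouping request that also asks for ngroups."""
    params = [
        ("q", q),
        ("group", "true"),
        ("group.field", field),
        ("group.ngroups", "true"),
    ]
    return BASE_URL + "?" + urlencode(params)


def collapse_query(q, field):
    """Build a request that collapses on `field` via the collapse filter query."""
    params = [
        ("q", q),
        ("fq", "{!collapse field=%s}" % field),
    ]
    return BASE_URL + "?" + urlencode(params)


if __name__ == "__main__":
    print(grouping_query("solr", "adskdedup"))
    print(collapse_query("solr", "adskdedup"))
```

In both cases the counts are only meaningful if all documents sharing an adskdedup value live on the same shard.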

"The other clarification I wanted was based on this statement : "When a
tenant is too large to fit on a single shard it can be spread across
multiple shards by specifying the number of bits to use from the shard key."
If we split shards, will Result Grouping / CollapsingQParserPlugin and
number of results still work?"

With field collapsing you'll need to keep all of a group's documents on the
same shard, so you won't be able to spread a shard key across shards by
specifying the number of bits.
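
A minimal sketch of building ids in the "adskdedup!url" form discussed in this thread, without the "/bits" suffix, so that documents sharing a dedup value share the same routing prefix. The helper name composite_id, the sample values and the validation rule are assumptions made for illustration; they are not part of Solr.

```python
def composite_id(shard_key: str, doc_id: str) -> str:
    """Build a compositeId-style document id of the form 'shard_key!doc_id'."""
    # Assumption: reject '!' and '/' in the shard key so the separator stays
    # unambiguous and no 'shard_key/bits' form is produced by accident.
    if not shard_key or "!" in shard_key or "/" in shard_key:
        raise ValueError("shard key must be non-empty and contain no '!' or '/'")
    return f"{shard_key}!{doc_id}"


# Hypothetical documents: two sources carrying the same dedup value.
docs = [
    {"adskdedup": "dup-001", "url": "http://example.com/abc/page"},
    {"adskdedup": "dup-001", "url": "http://example.com/mno/page"},
]

# Both ids share the prefix before '!', which is what the router hashes on.
ids = [composite_id(d["adskdedup"], d["url"]) for d in docs]

if __name__ == "__main__":
    for doc_id in ids:
        print(doc_id)
```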


"Last but not the least, when are you planning to release 4.6.1 ?"

There is a thread going on the dev list about the 4.6.1 release. You can
follow progress at:

http://markmail.org/search/?q=%22Lucene+%2F+Solr+4.6.1%22



Joel Bernstein
Search Engineer at Heliosearch


On Wed, Jan 15, 2014 at 2:35 AM, shamik <shamikb@gmail.com> wrote:

> Joel,
>
> Thanks for the pointer. I went through your blog on document routing; it was
> very informative. I do need some clarifications on the implementation. I'll
> try to run it based on my use case.
>
> I'm indexing documents from multiple source systems, a number of which
> contain duplicate content. I'm trying to remove the duplicates by applying
> result grouping / CollapsingQParserPlugin. For example, let's say I have
> sources ABC, MNO and XYZ. Sources ABC and MNO contain duplicate documents,
> which are identified by a field, say adskdedup. I have a couple of shards,
> with the id being the URL of the document. To make field collapsing work, I
> need to update the id field to the form "adskdedup!url". Documents with
> identical adskdedup values should route to a dedicated shard, e.g. shard1.
> The ones that are not identical will be routed to either shard1 or shard2.
> After indexing is done, shard1 should have all the documents on which
> grouping needs to be applied.
>
> During query time, depending on the query, results can be returned from
> both shards. For example, a query
> q=solr&group=true&group.field=adskdedup&group.ngroups=true would ideally
> return data from both shards and apply the grouping on shard1 based on the
> adskdedup field. This will also ensure that group.ngroups=true will return
> the right count.
>
> The other clarification I wanted was based on this statement : "When a
> tenant is too large to fit on a single shard it can be spread across
> multiple shards by specifying the number of bits to use from the shard
> key."
> If we split shards, will Result Grouping / CollapsingQParserPlugin and
> number of results still work?
>
> Last but not the least, when are you planning to release 4.6.1 ?
>
> Again, appreciate your help on this.
>
> - Thanks
