lucene-solr-user mailing list archives

From Francois Perron <francois.per...@wantedanalytics.com>
Subject Re: Grouping ngroups count
Date Tue, 01 May 2012 18:14:54 GMT
Thanks for your response Cody,

  First, I am using distributed grouping on 2 shards, and I'm sure that all documents of each
group are on the same shard.

I took a look at the JIRA issue and it seems really similar.  group.ngroups has the same
problem: the count is calculated in the second pass, so we only get results from the "useful"
shards. That is why I get the right count when I increase the rows limit (the query must then
use all my shards).
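
To illustrate what I think is happening, here is a toy Python model. It is only a sketch of
the suspected behaviour, not Solr's actual code: the function name ngroups_seen and the
sample data are made up for illustration, and it assumes each group lives on exactly one shard.

```python
def ngroups_seen(shards, rows):
    """Toy model: sum per-shard group counts, but only over the shards
    that contribute at least one group to the merged top `rows` groups."""
    # First pass: find the best score of every group on every shard.
    candidates = []
    for shard_id, docs in enumerate(shards):
        best = {}
        for group, score in docs:
            best[group] = max(score, best.get(group, float("-inf")))
        candidates.extend((score, group, shard_id) for group, score in best.items())

    # Merge: keep the top `rows` groups across all shards.
    top = sorted(candidates, reverse=True)[:rows]
    useful = {shard_id for _, _, shard_id in top}

    # Second pass: only the "useful" shards report their distinct group count.
    return sum(len({group for group, _ in shards[s]}) for s in useful)


# Hypothetical data: (group, score) pairs; groups never span shards.
shard0 = [("a", 9.0), ("b", 8.0), ("a", 1.0)]
shard1 = [("c", 3.0), ("d", 2.0), ("e", 1.0)]
shards = [shard0, shard1]

print(ngroups_seen(shards, rows=2))  # only shard0 is "useful": 2 groups
print(ngroups_seen(shards, rows=5))  # both shards are used: 5 groups
```

With a small rows value only one shard contributes to the top groups, so the count is too
low; with a large enough rows value every shard is used and the count is correct.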

Unless it's a feature (I hope not), I will create a new JIRA issue for this.

Thanks

On 2012-05-01, at 12:32 PM, Young, Cody wrote:

> Hello,
> 
> When you say 2 slices, do you mean 2 shards? As in, you're doing a distributed query?
> 
> If you're doing a distributed query, then for group.ngroups to work you need to ensure
> that all documents for a group exist on a single shard.
> 
> However, what you're describing sounds an awful lot like this JIRA issue that I entered
> a while ago for distributed grouping. I found that the hit count was coming only from the
> shards that ended up having results in the documents that were returned. I didn't test
> group.ngroups at the time.
> 
> https://issues.apache.org/jira/browse/SOLR-3316
> 
> If this is a similar issue then you should make a new Jira issue.
> 
> Cody
> 
> -----Original Message-----
> From: Francois Perron [mailto:francois.perron@wantedanalytics.com] 
> Sent: Tuesday, May 01, 2012 6:47 AM
> To: solr-user@lucene.apache.org
> Subject: Grouping ngroups count
> 
> Hello all,
> 
>  I tried to use grouping with 2 slices with an index of 35K documents.  When I ask for the
> top 10 rows, grouped by field A, it gave me about 16K groups.  But if I ask for the top 20K
> rows, the ngroups property is now at 30K.
> 
> Do you know why, and of course how to fix it?
> 
> Thanks.

