lucene-solr-user mailing list archives

From Martijn v Groningen <martijn.v.gronin...@gmail.com>
Subject Re: Grouping ngroups count
Date Thu, 03 May 2012 06:08:45 GMT
Hi Francois,

The issue you describe looks similar to one we fixed before
with the matches count.
Please open an issue and we can look into it.

Martijn

On 1 May 2012 20:14, Francois Perron
<francois.perron@wantedanalytics.com> wrote:
> Thanks for your response Cody,
>
>  First, I used distributed grouping on 2 shards, and I'm sure that all documents of each
> group are on the same shard.
>
> I took a look at the JIRA issue and it seems really similar.  There is the same problem with
> group.ngroups.  The count is calculated in the second pass, so we only get results from the
> "useful" shards, which is why I get the right count when I increase the rows limit (the query
> must then use all my shards).
>
> Unless it's a feature (I hope not), I will create a new JIRA issue for this.
>
> Thanks
>
> On 2012-05-01, at 12:32 PM, Young, Cody wrote:
>
>> Hello,
>>
>> When you say 2 slices, do you mean 2 shards? As in, you're doing a distributed query?
>>
>> If you're doing a distributed query, then for group.ngroups to work you need to ensure
>> that all documents for a group exist on a single shard.
>>
>> However, what you're describing sounds an awful lot like this JIRA issue that I entered
>> a while ago for distributed grouping. I found that the hit count was coming only from the
>> shards that ended up having results in the documents that were returned. I didn't test
>> group.ngroups at the time.
>>
>> https://issues.apache.org/jira/browse/SOLR-3316
>>
>> If this is a similar issue then you should make a new Jira issue.
>>
>> Cody
>>
>> -----Original Message-----
>> From: Francois Perron [mailto:francois.perron@wantedanalytics.com]
>> Sent: Tuesday, May 01, 2012 6:47 AM
>> To: solr-user@lucene.apache.org
>> Subject: Grouping ngroups count
>>
>> Hello all,
>>
>>  I tried to use grouping with 2 slices on an index of 35K documents.  When I ask for the
>> top 10 rows, grouped by field A, it gives me about 16K groups.  But if I ask for the top 20K
>> rows, the ngroups property is now at 30K.
>>
>> Do you know why, and of course how to fix it?
>>
>> Thanks.
>
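
The behaviour Francois describes in the thread above (ngroups computed in the second pass, using only the shards that contributed rows to the result) can be illustrated with a toy model. This is only a sketch of that hypothesis, not Solr's actual implementation: the shard contents, the function names and the `count_all_shards` switch are all made up for illustration. It assumes, as Cody notes, that every group lives entirely on one shard.

```python
# Toy model of two-pass distributed grouping (hypothetical, not Solr code).
# Each shard is a list of (group_key, score) documents; a group never spans shards.
SHARDS = [
    [("a", 9.0), ("a", 8.5), ("b", 8.0), ("c", 7.0)],  # shard 0: groups a, b, c
    [("d", 3.0), ("e", 2.0), ("e", 1.0)],              # shard 1: groups d, e
]


def top_groups(shard, rows):
    """First pass on one shard: best score per group, highest first."""
    best = {}
    for group, score in shard:
        best[group] = max(score, best.get(group, float("-inf")))
    return sorted(best.items(), key=lambda kv: kv[1], reverse=True)[:rows]


def ngroups(shards, rows, count_all_shards):
    """Return the group count the coordinator would report in this model."""
    # First pass: collect every shard's candidates and keep the global top `rows`.
    candidates = []
    for shard_id, shard in enumerate(shards):
        for group, score in top_groups(shard, rows):
            candidates.append((score, group, shard_id))
    candidates.sort(reverse=True)

    # Only shards owning one of the selected groups are "useful" in the second pass.
    useful = {shard_id for _, _, shard_id in candidates[:rows]}

    # Second pass: sum per-shard distinct group counts, either from every shard
    # (what the user expects) or only from the useful ones (the suspected bug).
    counted = range(len(shards)) if count_all_shards else useful
    return sum(len({group for group, _ in shards[i]}) for i in counted)


if __name__ == "__main__":
    print(ngroups(SHARDS, rows=2, count_all_shards=False))
    print(ngroups(SHARDS, rows=2, count_all_shards=True))
    print(ngroups(SHARDS, rows=4, count_all_shards=False))
```

In this model a small `rows` value selects groups from shard 0 only, so the count misses the groups on shard 1; raising `rows` until both shards contribute makes the two counts agree, which matches the symptom reported in the original message.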



-- 
Met vriendelijke groet,

Martijn van Groningen
