lucene-dev mailing list archives

From "Jonathan Rochkind (JIRA)" <j...@apache.org>
Subject [jira] Commented: (SOLR-2242) Get distinct count of names for a facet field
Date Tue, 15 Mar 2011 03:46:29 GMT

    [ https://issues.apache.org/jira/browse/SOLR-2242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13006782#comment-13006782 ]

Jonathan Rochkind commented on SOLR-2242:
-----------------------------------------

There is clearly a semantic problem here. I call that the number of 'facet values'; what you
are calling a 'name' I am calling a 'facet value'. I have no idea what you are calling a 'value',
honestly. I'm pretty sure we're talking about the same thing. I have no idea what word to
use that will mean that to both of us and everyone else.

I guess what you are calling 'number of values', if I understand properly, I'd call 'sum of
the facet counts'. Facet counts are already called facet counts; summing them up gives their
sum, not a 'number of values'. (I also can't imagine any use case where you'd want a sum of
facet counts; for a single-valued field with no facet.missing, the sum of the facet counts
will equal the document count, numRows. In other cases it may not, and I have no idea why
you'd ever want it in those cases.) But the name is less important than the functionality,
I guess. (Except that the lack of consistent, established terminology in Solr is what leads
us to this confusion.) Okay, wait: numFacetTerms, is that maybe clear? 'Terms', since Solr
'terms' are in fact what appear as the values/names in Solr faceting? From the wiki page for
facet.field: "It will iterate over each Term in the field and generate a facet count using
that Term as the constraint."
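
To make the two quantities concrete, here is a minimal Python sketch using the facet counts
from the hgid example in the quoted issue description. The function names are hypothetical
and only illustrate the terminology discussed above; they are not part of Solr or of the patch.

```python
# Facet counts copied from the hgid example in the quoted issue description.
facet_counts = {
    "HGPY0000045FD36D4000A": 1,
    "HGPY00000FBC6690453A9": 1,
    "HGPY00001E44ED6C4FB3B": 1,
    "HGPY00001FA631034A1B8": 1,
    "HGPY00003317ABAC43B48": 1,
    "HGPY00003A17B2294CB5A": 5,
    "HGPY00003ADD2B3D48C39": 1,
}


def num_facet_terms(counts):
    """Number of distinct terms with a nonzero count (what the issue calls 'names')."""
    return sum(1 for c in counts.values() if c > 0)


def sum_of_facet_counts(counts):
    """Sum of the per-term counts (what the issue calls 'number of values')."""
    return sum(counts.values())


print(num_facet_terms(facet_counts))      # 7
print(sum_of_facet_counts(facet_counts))  # 11
```

The first number is the one under discussion here; the second is simply the counts added up.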

But perhaps I also misunderstood: the functionality is of use/interest to me only if it does
NOT require me to set facet.limit=-1 to get this count of distinct values/names/terms. If
I'm setting facet.limit=-1 anyway, that number is already implicit in the response, so there is
not much value added in making it explicit. What I need is a way to get this number without
setting facet.limit=-1, since in my use cases I can have a million or more, um, values/names/terms.
(Which Solr 1.4.1 with facet.method=fc handles with aplomb!) If your patch only works with
facet.limit=-1, it does not actually address my need.
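
As a rough illustration of why the count is "already implicit" with facet.limit=-1, the
following Python sketch counts the terms on the client side. It assumes a wt=json response
in the flat layout (term, count, term, count, ...); the sample body is hand-written and
hypothetical, not actual Solr output, and count_terms is an illustrative helper.

```python
import json

# Hypothetical, hand-written response body. Assumes wt=json with the flat
# facet layout: [term, count, term, count, ...].
sample_body = """
{
  "response": {"numFound": 6, "docs": []},
  "facet_counts": {
    "facet_fields": {
      "hgid": ["HGPY0000045FD36D4000A", 1, "HGPY00003A17B2294CB5A", 5]
    }
  }
}
"""


def count_terms(body, field):
    """Count the distinct terms returned for `field` in a facet response."""
    flat = json.loads(body)["facet_counts"]["facet_fields"][field]
    counts = flat[1::2]  # every second element is a count
    return sum(1 for c in counts if c > 0)


print(count_terms(sample_body, "hgid"))
```

This only works because every term is shipped to the client, which is exactly the cost that
becomes a problem with a million or more terms.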


> Get distinct count of names for a facet field
> ---------------------------------------------
>
>                 Key: SOLR-2242
>                 URL: https://issues.apache.org/jira/browse/SOLR-2242
>             Project: Solr
>          Issue Type: New Feature
>          Components: Response Writers
>    Affects Versions: 4.0
>            Reporter: Bill Bell
>            Priority: Minor
>             Fix For: 4.0
>
>         Attachments: SOLR-2242-distinctFacet.patch
>
>
> When returning facet.field=<name of field> you will get a list of matches for distinct
> values. This is normal behavior. This patch tells you how many distinct values you have (#
> of rows). Use with limit=-1 and mincount=1.
> The feature is called "namedistinct". Here is an example:
> http://localhost:8983/solr/select?q=*:*&facet=true&facet.field=manu&facet.mincount=1&facet.limit=-1&f.manu.facet.namedistinct=0&facet.field=price&f.price.facet.namedistinct=1
> Here is an example on field "hgid" (without namedistinct):
> {code}
> <lst name="facet_fields">
>   <lst name="hgid">
>     <int name="HGPY0000045FD36D4000A">1</int>
>     <int name="HGPY00000FBC6690453A9">1</int>
>     <int name="HGPY00001E44ED6C4FB3B">1</int>
>     <int name="HGPY00001FA631034A1B8">1</int>
>     <int name="HGPY00003317ABAC43B48">1</int>
>     <int name="HGPY00003A17B2294CB5A">5</int>
>     <int name="HGPY00003ADD2B3D48C39">1</int>
>   </lst>
> </lst>
> {code}
> With namedistinct, the names (HGPY0000045FD36D4000A, HGPY00000FBC6690453A9, HGPY00001E44ED6C4FB3B,
> HGPY00001FA631034A1B8, HGPY00003317ABAC43B48, HGPY00003A17B2294CB5A, HGPY00003ADD2B3D48C39)
> are replaced by a single count. This returns the number of rows (7), not the number of values (11).
> {code}
> <lst name="facet_fields">
>   <lst name="hgid">
>     <int name="_count_">7</int>
>   </lst>
> </lst>
> {code}
> This actually works really well for getting the total number of fields for a group.field=hgid.
> Enjoy!

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

