lucene-dev mailing list archives

From "Hoss Man (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SOLR-3642) Count is inconsistent between facet and stats
Date Tue, 24 Jul 2012 22:15:35 GMT

    [ https://issues.apache.org/jira/browse/SOLR-3642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13421824#comment-13421824 ]

Hoss Man commented on SOLR-3642:
--------------------------------

Yandong: the issue I linked this one to (SOLR-1782) is open precisely to address this
problem. There is an (old) patch there that I honestly have not had time to look at,
but you may want to take a look and see whether it can be brought up to date and polished
so that it works well and has good tests.

(IIRC: the reason I never really dug into it before was that the way StatsComponent deals
with stats.facet in general struck me as kludgy and hard to understand, and I couldn't see
a clean way to make it work well with both multivalued fields and arbitrary field types.)
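To make the multivalued-field concern concrete, here is a minimal Python sketch over hypothetical toy documents (not the Solr example data). It contrasts facet.field-style counting, where a document is counted once under every value of a multivalued field, with a model in which each document is bucketed under only one of its values. Picking that single value with `max` (the lexicographically largest) is purely an assumption for illustration; this comment does not say how stats.facet chooses a value.

```python
from collections import Counter

# Hypothetical toy documents with a multivalued "cat" field.
# These are NOT the Solr example documents; they only illustrate the idea.
docs = [
    {"id": "a", "cat": ["electronics", "monitor"]},
    {"id": "b", "cat": ["electronics", "memory"]},
    {"id": "c", "cat": ["electronics", "connector"]},
    {"id": "d", "cat": ["music"]},
]


def facet_counts(docs, field):
    """facet.field-style counting: a document counts once under EVERY value."""
    counts = Counter()
    for doc in docs:
        for value in set(doc[field]):
            counts[value] += 1
    return counts


def single_value_buckets(docs, field, pick=max):
    """Bucket each document under exactly ONE of its values.

    Using max() (the lexicographically largest value) is an assumption made
    for illustration, not a description of how stats.facet chooses.
    """
    counts = Counter()
    for doc in docs:
        counts[pick(doc[field])] += 1
    return counts


print(facet_counts(docs, "cat")["electronics"])          # 3
print(single_value_buckets(docs, "cat")["electronics"])  # 1
```

Under the single-value model the bucket counts always add up to the number of documents, while facet counts on a multivalued field can add up to more, so the two can disagree for the same value.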
 
                
> Count is inconsistent between facet and stats
> ---------------------------------------------
>
>                 Key: SOLR-3642
>                 URL: https://issues.apache.org/jira/browse/SOLR-3642
>             Project: Solr
>          Issue Type: Bug
>          Components: SearchComponents - other
>    Affects Versions: 4.0-ALPHA
>         Environment: 4.0 alpha on macos 10.6
>            Reporter: Yandong Yao
>            Assignee: Hoss Man
>             Fix For: 4.0, 5.0
>
>         Attachments: SOLR-3642.patch
>
>
> Steps to reproduce:
> 1) Download apache-solr-4.0.0-ALPHA
> 2) cd example;  java -jar start.jar
> 3) cd exampledocs;  ./post.sh *.xml
> 4) Use StatsComponent to get the stats for field 'popularity', faceted on 'cat'. The 'count' for 'electronics' is 3:
> http://localhost:8983/solr/collection1/select?q=cat:electronics&wt=json&rows=0&stats=true&stats.field=popularity&stats.facet=cat
> {
>   stats_fields: {
>     popularity: {
>       min: 0,
>       max: 10,
>       count: 14,
>       missing: 0,
>       sum: 75,
>       sumOfSquares: 503,
>       mean: 5.357142857142857,
>       stddev: 2.7902892835178013,
>       facets: {
>         cat: {
>           music: {min: 10, max: 10, count: 1, missing: 0, sum: 10, sumOfSquares: 100, mean: 10, stddev: 0},
>           monitor: {min: 6, max: 6, count: 2, missing: 0, sum: 12, sumOfSquares: 72, mean: 6, stddev: 0},
>           hard drive: {min: 6, max: 6, count: 2, missing: 0, sum: 12, sumOfSquares: 72, mean: 6, stddev: 0},
>           scanner: {min: 6, max: 6, count: 1, missing: 0, sum: 6, sumOfSquares: 36, mean: 6, stddev: 0},
>           memory: {min: 0, max: 7, count: 3, missing: 0, sum: 12, sumOfSquares: 74, mean: 4, stddev: 3.605551275463989},
>           graphics card: {min: 7, max: 7, count: 2, missing: 0, sum: 14, sumOfSquares: 98, mean: 7, stddev: 0},
>           electronics: {min: 1, max: 7, count: 3, missing: 0, sum: 9, sumOfSquares: 51, mean: 3, stddev: 3.4641016151377544}
>         }
>       }
>     }
>   }
> }
> 5) Facet on 'cat'; the count for 'electronics' is 14:
> http://localhost:8983/solr/collection1/select?q=cat:electronics&wt=json&rows=0&facet=true&facet.field=cat
> {
>   cat: [
>     "electronics", 14,
>     "memory", 3,
>     "connector", 2,
>     "graphics card", 2,
>     "hard drive", 2,
>     "monitor", 2,
>     "camera", 1,
>     "copier", 1,
>     "multifunction printer", 1,
>     "music", 1,
>     "printer", 1,
>     "scanner", 1,
>     "currency", 0,
>     "search", 0,
>     "software", 0
>   ]
> },
> So StatsComponent reports a count of 3 for the 'electronics' cat, while FacetComponent reports 14 for 'electronics'. Is this a bug?
> The field definition for 'cat' is:
> <field name="cat" type="string" indexed="true" stored="true" multiValued="true"/>
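The numbers in the two quoted responses can be cross-checked with a short Python sketch; the helper names `flat_to_dict` and `derived_stats` are hypothetical. `flat_to_dict` pairs up the flat `["term", count, ...]` list from the facet response, and `derived_stats` recomputes `mean` and `stddev` from `count`, `sum` and `sumOfSquares`. The reported values match the sample (n - 1) standard deviation. Note also that the per-value stats.facet counts add up to 14, the same as the overall stats count, even though facet.field counts 14 documents under 'electronics' alone.

```python
import math

# Values copied from the step 5 facet response (Solr's flat name/value list).
FACET_CAT = ["electronics", 14, "memory", 3, "connector", 2, "graphics card", 2,
             "hard drive", 2, "monitor", 2, "camera", 1, "copier", 1,
             "multifunction printer", 1, "music", 1, "printer", 1,
             "scanner", 1, "currency", 0, "search", 0, "software", 0]

# Per-value 'count' entries copied from the step 4 stats.facet response.
STATS_FACET_COUNTS = {"music": 1, "monitor": 2, "hard drive": 2, "scanner": 1,
                      "memory": 3, "graphics card": 2, "electronics": 3}


def flat_to_dict(flat):
    """Pair up a flat [name, value, name, value, ...] list into a dict."""
    return dict(zip(flat[0::2], flat[1::2]))


def derived_stats(count, total, sum_of_squares):
    """Mean and sample standard deviation from count, sum and sumOfSquares."""
    mean = total / count
    if count < 2:
        return mean, 0.0  # matches the stddev of 0 reported for a count of 1
    variance = (sum_of_squares - total * total / count) / (count - 1)
    return mean, math.sqrt(variance)


facets = flat_to_dict(FACET_CAT)
print(facets["electronics"], STATS_FACET_COUNTS["electronics"])  # 14 3
print(sum(STATS_FACET_COUNTS.values()))  # 14, same as the overall stats count
print(derived_stats(14, 75, 503))  # overall popularity stats
print(derived_stats(3, 9, 51))     # 'electronics' stats.facet bucket
```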

--
This message is automatically generated by JIRA.

        


