lucene-dev mailing list archives

From "Hoss Man (JIRA)" <j...@apache.org>
Subject [jira] Commented: (SOLR-1782) unexpected statscomponent values
Date Mon, 10 May 2010 23:12:30 GMT

    [ https://issues.apache.org/jira/browse/SOLR-1782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12865973#action_12865973
] 

Hoss Man commented on SOLR-1782:
--------------------------------

I'm poking around the attached RAR file now, and two interesting things jump out at me:

First: StatsFacetField is multivalued.  Every doc has exactly 3 values, but in some cases
a value is repeated...

{noformat}
<doc>
 <arr name="StatsFacetField">
  <int>2</int>
  <int>3</int>
  <int>1</int>
 </arr>
 <int name="ValueOfOneField">1</int>
 <int name="id">7631</int>
</doc>
<doc>
 <arr name="StatsFacetField">
  <int>3</int>
  <int>3</int>
  <int>1</int>
 </arr>
 <int name="ValueOfOneField">1</int>
 <int name="id">7453</int>
</doc>
{noformat}

Second: the stats facet counts produced by this sample index have the (coincidental?) property
that the sum of all the "counts" from each value of the StatsFacetField equals the total number
of docs -- which should not be the case since each doc contains multiple values.  (Note: the
output from Gerald's initial email didn't demonstrate this, but the index included in the
rar file is inconsistent with his initial email in other ways, so I believe this one was generated
with slightly different configs.)
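
To make the expectation concrete, here is a rough sketch (plain Python, not Solr code) of the per-value counts I'd expect for the two sample docs above. It assumes a doc is counted once per distinct value, so the repeated 3 in doc 7453 only counts once; the function name expected_counts is made up for illustration.

```python
from collections import Counter

# The two sample docs quoted above, values copied from the XML.
docs = [
    {"id": 7631, "StatsFacetField": [2, 3, 1], "ValueOfOneField": 1},
    {"id": 7453, "StatsFacetField": [3, 3, 1], "ValueOfOneField": 1},
]

def expected_counts(docs, facet_field):
    """Count, for each facet value, how many docs contain it."""
    counts = Counter()
    for doc in docs:
        # set() so that a repeated value counts the doc only once
        # (assumption: counts are per document, not per occurrence).
        for value in set(doc[facet_field]):
            counts[value] += 1
    return counts

counts = expected_counts(docs, "StatsFacetField")
total_of_counts = sum(counts.values())
```

With multiple distinct values per doc, total_of_counts comes out larger than len(docs), which is why a sum that exactly equals the number of docs looks suspicious.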

I think this suggests that the bug in StatsComponent is triggered when the stats.facet
param refers to a multivalued field.  I'm going to see if I can create a simple JUnit test.



> unexpected statscomponent values
> --------------------------------
>
>                 Key: SOLR-1782
>                 URL: https://issues.apache.org/jira/browse/SOLR-1782
>             Project: Solr
>          Issue Type: Bug
>          Components: search
>    Affects Versions: 1.4
>         Environment: reproduced on Win2k3 using 1.5.0-dev solr ($Id: CHANGES.txt 906924 2010-02-05 12:43:11Z noble $)
>            Reporter: Gerald DeConto
>         Attachments: index.rar
>
>
> I wanted to understand the statscomponent better, so I set up a simple test index with a few thousand docs.  In my schema I have:
> - an indexed multivalue sint field (StatsFacetField) that can contain values 0 thru 5 that I want to use as my stats.facet field.
> - an indexed single value sint field (ValueOfOneField) that will always contain the value 1 and that I want stats on for this test
> When I execute the following query:
> http://localhost:8080/solr/select?q=*:*&stats=true&stats.field=ValueOfOneField&stats.facet=StatsFacetField&rows=0&facet=on&facet.limit=10&facet.field=StatsFacetField
> For this situation (*:*) I was expecting the statscomponent Count/Sum values for each possible value in StatsFacetField to match the facet values for StatsFacetField.  They don't.  Some are close (e.g. 204 vs 214) while others are way off (e.g. 230 vs 8000)
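
For anyone wanting to rerun the comparison, the request above can be assembled from its individual params roughly like this. This is only a sketch: the host, port and param values are the ones from the issue description, and the helper name build_stats_query is made up.

```python
from urllib.parse import urlencode

# Host and port as given in the issue description.
BASE_URL = "http://localhost:8080/solr/select"

# The same params as the query in the report, in the same order.
params = [
    ("q", "*:*"),
    ("stats", "true"),
    ("stats.field", "ValueOfOneField"),
    ("stats.facet", "StatsFacetField"),
    ("rows", "0"),
    ("facet", "on"),
    ("facet.limit", "10"),
    ("facet.field", "StatsFacetField"),
]

def build_stats_query(base_url, params):
    """Join the base URL and the URL-encoded params into one request URL."""
    return base_url + "?" + urlencode(params)

url = build_stats_query(BASE_URL, params)
```

Note that urlencode percent-encodes the `*:*` in q, which is equivalent to the literal form used in the report.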

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



