lucene-dev mailing list archives

From "Hoss Man (JIRA)" <j...@apache.org>
Subject [jira] Updated: (SOLR-1782) stats.facet assumes FieldCache.StringIndex - fails horribly on multivalued fields
Date Tue, 11 May 2010 00:58:30 GMT

     [ https://issues.apache.org/jira/browse/SOLR-1782?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hoss Man updated SOLR-1782:
---------------------------

        Summary: stats.facet assumes FieldCache.StringIndex - fails horribly on multivalued fields (was: unexpected statscomponent values)
    Description: the StatsComponent assumes any field specified in the stats.facet param can
be faceted using FieldCache.DEFAULT.getStringIndex.  This can cause problems with a variety
of field types, but for multivalued fields it can either produce incorrect stats when the
number of distinct values is small, or throw an ArrayIndexOutOfBoundsException when the
number of distinct values is greater than the number of documents.  (was: I wanted
to understand the StatsComponent better, so I set up a simple test index with a few thousand
docs.  In my schema I have:
- an indexed multivalued sint field (StatsFacetField) that can contain values 0 through 5 and
that I want to use as my stats.facet field;
- an indexed single-valued sint field (ValueOfOneField) that always contains the value
1 and that I want stats on for this test.

When I execute the following query:

http://localhost:8080/solr/select?q=*:*&stats=true&stats.field=ValueOfOneField&stats.facet=StatsFacetField&rows=0&facet=on&facet.limit=10&facet.field=StatsFacetField

For this query (*:*) I expected the StatsComponent Count/Sum values for each
possible value in StatsFacetField to match the facet counts for StatsFacetField.  They don't.
Some are close (e.g. 204 vs 214) while others are way off (e.g. 230 vs 8000))
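For anyone reproducing the report, the request above can also be assembled programmatically. This is a small sketch using only Python's standard library; it builds the URL but does not send it. The host, port, and field names (StatsFacetField, ValueOfOneField) are taken from the reporter's test setup, not Solr defaults, and BASE_URL is just a local variable name chosen for illustration.

```python
from urllib.parse import urlencode

# Reproduction request from the report. Host, port and field names come
# from the reporter's test setup; adjust them for your own index.
BASE_URL = "http://localhost:8080/solr/select"

params = [
    ("q", "*:*"),                        # match all documents
    ("stats", "true"),                   # enable the StatsComponent
    ("stats.field", "ValueOfOneField"),  # compute stats on this field (always 1)
    ("stats.facet", "StatsFacetField"),  # break those stats down per value of this multivalued field
    ("rows", "0"),                       # return no documents, only stats and facets
    ("facet", "on"),                     # ordinary field faceting, for comparison
    ("facet.limit", "10"),
    ("facet.field", "StatsFacetField"),
]

# urlencode percent-encodes reserved characters, so q=*:* is sent as q=%2A%3A%2A
url = BASE_URL + "?" + urlencode(params)
print(url)
```

Because every document has ValueOfOneField = 1, a correct response would show each stats.facet bucket's count and sum equal to the facet.field count for the same StatsFacetField value; that is the comparison the report describes.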

Updating issue summary and description based on the root cause

> stats.facet assumes FieldCache.StringIndex - fails horribly on multivalued fields
> ---------------------------------------------------------------------------------
>
>                 Key: SOLR-1782
>                 URL: https://issues.apache.org/jira/browse/SOLR-1782
>             Project: Solr
>          Issue Type: Bug
>          Components: search
>    Affects Versions: 1.4
>         Environment: reproduced on Win2k3 using 1.5.0-dev solr ($Id: CHANGES.txt 906924 2010-02-05 12:43:11Z noble $)
>            Reporter: Gerald DeConto
>         Attachments: index.rar
>
>
> the StatsComponent assumes any field specified in the stats.facet param can be faceted
> using FieldCache.DEFAULT.getStringIndex.  This can cause problems with a variety of field
> types, but for multivalued fields it can either produce incorrect stats when the number of
> distinct values is small, or throw an ArrayIndexOutOfBoundsException when the number of
> distinct values is greater than the number of documents.
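To illustrate the two failure modes described above, here is a deliberately simplified Python model of a StringIndex-style cache. It is not Lucene's FieldCache code: it assumes the cache stores exactly one term ordinal per document plus a lookup table with room for maxDoc + 1 terms (slot 0 meaning "no value"), and the function names build_string_index, stats_facet_counts and true_facet_counts are hypothetical. On a multivalued field, later terms overwrite earlier ones in a document's single slot, so some values lose documents or vanish entirely; with more distinct terms than documents, the lookup table overflows, which is the analogue of the ArrayIndexOutOfBoundsException.

```python
from collections import Counter

# Simplified model, NOT Lucene's implementation: a StringIndex-style cache
# keeps exactly one term ordinal per document, plus a lookup table mapping
# ordinals back to terms. Slot 0 means "document has no value".


def build_string_index(postings, max_doc):
    """Uninvert a field into (order, lookup).

    postings: list of (term, doc_ids) pairs in sorted term order, as if
    walking the field's terms. Assumes every doc id is < max_doc.
    """
    order = [0] * max_doc            # doc id -> term ordinal
    lookup = [None] * (max_doc + 1)  # term ordinal -> term (assumed sizing)
    for ordinal, (term, doc_ids) in enumerate(postings, start=1):
        # More distinct terms than documents: this assignment raises
        # IndexError, the Python analogue of ArrayIndexOutOfBoundsException.
        lookup[ordinal] = term
        for doc in doc_ids:
            # A multivalued document keeps only the LAST term written here.
            order[doc] = ordinal
    return order, lookup


def stats_facet_counts(order, lookup):
    """Per-value document counts as seen through the single-valued cache."""
    return Counter(lookup[o] for o in order if o != 0)


def true_facet_counts(postings):
    """Per-value document counts, which is what facet.field reports."""
    return Counter({term: len(doc_ids) for term, doc_ids in postings})


if __name__ == "__main__":
    # Four documents: 0 -> {0, 1}, 1 -> {1}, 2 -> {0, 2}, 3 -> {2}
    postings = [("0", [0, 2]), ("1", [0, 1]), ("2", [2, 3])]
    order, lookup = build_string_index(postings, max_doc=4)
    print("stats.facet view:", dict(stats_facet_counts(order, lookup)))
    print("facet.field view:", dict(true_facet_counts(postings)))
```

In this small example value "0" is missing from the stats view even though two documents contain it, while calling build_string_index with three distinct terms and max_doc=2 raises IndexError.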

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



