lucene-dev mailing list archives

From "Wojtek Piaseczny (JIRA)" <j...@apache.org>
Subject [jira] Commented: (SOLR-1782) stats.facet assumes FieldCache.StringIndex - fails horribly on multivalued fields
Date Fri, 11 Jun 2010 17:26:15 GMT

    [ https://issues.apache.org/jira/browse/SOLR-1782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12877865#action_12877865 ]

Wojtek Piaseczny commented on SOLR-1782:
----------------------------------------

I'd like to contribute to solving this issue, but I'm not sure if I'm going down the right
path. Here are the possible solutions I see:

1. Use UnInvertedField for multi-valued facets in the StatsComponent. This would require a
new method in UnInvertedField, something like getValues(int docID). The problem is the big
terms collection in UnInvertedField: getting all values for a single document via big terms
is expensive, because the entire collection has to be iterated.
2. Get the facet values for the result set in the StatsComponent, then for each value get a
new document set, then iterate through each document in that set and calculate stats. This
sounds expensive.
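
For illustration, option 2 could be sketched roughly as follows. This is a plain-Java model
under stated assumptions, not Solr code: the class PerFacetValueStats, its Stats holder, and
the in-memory maps standing in for the index and the result set are all hypothetical.

```java
import java.util.*;

// Hypothetical model of option 2; none of these names exist in Solr.
class PerFacetValueStats {

    /** Running statistics for the documents that share one facet value. */
    static final class Stats {
        double min = Double.POSITIVE_INFINITY;
        double max = Double.NEGATIVE_INFINITY;
        double sum = 0;
        long count = 0;

        void add(double v) {
            min = Math.min(min, v);
            max = Math.max(max, v);
            sum += v;
            count++;
        }
    }

    /**
     * resultSet:        docIDs that matched the query
     * docsByFacetValue: facet value -> docIDs containing it (a doc may appear under several values)
     * statValueByDoc:   docID -> numeric value of the stats field
     */
    static Map<String, Stats> compute(Set<Integer> resultSet,
                                      Map<String, Set<Integer>> docsByFacetValue,
                                      Map<Integer, Double> statValueByDoc) {
        Map<String, Stats> statsByValue = new TreeMap<>();
        for (Map.Entry<String, Set<Integer>> e : docsByFacetValue.entrySet()) {
            // One new document set per facet value: its docs restricted to the result set.
            Set<Integer> docs = new TreeSet<>(e.getValue());
            docs.retainAll(resultSet);
            if (docs.isEmpty()) {
                continue; // this value does not occur in the result set
            }
            // Iterate every document in that set and accumulate the stats.
            Stats stats = new Stats();
            for (int doc : docs) {
                Double v = statValueByDoc.get(doc);
                if (v != null) {
                    stats.add(v);
                }
            }
            statsByValue.put(e.getKey(), stats);
        }
        return statsByValue;
    }
}
```

In this model a document with several facet values is counted once under each of them, and
the work grows with the number of facet values times the size of each per-value document
set, which is the cost concern raised above.
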

Are there better options? 

> stats.facet assumes FieldCache.StringIndex - fails horribly on multivalued fields
> ---------------------------------------------------------------------------------
>
>                 Key: SOLR-1782
>                 URL: https://issues.apache.org/jira/browse/SOLR-1782
>             Project: Solr
>          Issue Type: Bug
>          Components: search
>    Affects Versions: 1.4
>         Environment: reproduced on Win2k3 using 1.5.0-dev solr ($Id: CHANGES.txt 906924 2010-02-05 12:43:11Z noble $)
>            Reporter: Gerald DeConto
>         Attachments: index.rar, SOLR-1782.test.patch
>
>
> The StatsComponent assumes any field specified in the stats.facet param can be faceted
> using FieldCache.DEFAULT.getStringIndex. This can cause problems with a variety of field
> types, but in the case of multivalued fields it can either produce erroneous stats when
> the number of distinct values is small, or throw an ArrayIndexOutOfBoundsException when
> the number of distinct values is greater than the number of documents.
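
The two failure modes described above can be reproduced with a small model. The sketch below
is a simplified, hypothetical stand-in for a string index that assumes at most one term per
document. It is not Lucene's FieldCache implementation, and the class name
SingleValuedIndexModel and its array layout are assumptions made for illustration.

```java
import java.util.*;

// Simplified, hypothetical model of a one-term-per-document string index.
class SingleValuedIndexModel {
    final int[] order;     // docID -> ordinal of the document's (single) term
    final String[] lookup; // ordinal -> term text; slot 0 means "no value"

    /** postings: term -> docIDs containing that term, visited in sorted term order. */
    SingleValuedIndexModel(int maxDoc, SortedMap<String, int[]> postings) {
        order = new int[maxDoc];
        // Sized on the assumption that there are never more terms than documents.
        lookup = new String[maxDoc + 1];
        int ord = 1;
        for (Map.Entry<String, int[]> e : postings.entrySet()) {
            // Throws ArrayIndexOutOfBoundsException once distinct terms exceed maxDoc.
            lookup[ord] = e.getKey();
            for (int doc : e.getValue()) {
                // On a multivalued field a later term silently replaces an earlier one.
                order[doc] = ord;
            }
            ord++;
        }
    }

    String valueOf(int doc) {
        return lookup[order[doc]];
    }
}
```

With two documents where document 0 holds both "a" and "b", valueOf(0) returns only "b", so
any stats grouped by this value are wrong without an error being raised. Adding a third
distinct term while keeping two documents makes the constructor throw
ArrayIndexOutOfBoundsException.
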

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.