lucene-dev mailing list archives

From "Robert Muir (JIRA)" <j...@apache.org>
Subject [jira] [Created] (SOLR-4499) StatsComponent could use some serious TLC
Date Mon, 25 Feb 2013 01:48:12 GMT
Robert Muir created SOLR-4499:
---------------------------------

             Summary: StatsComponent could use some serious TLC
                 Key: SOLR-4499
                 URL: https://issues.apache.org/jira/browse/SOLR-4499
             Project: Solr
          Issue Type: Bug
            Reporter: Robert Muir


Most of these problems are already documented on the wiki page, but after reviewing the
component today, here are my ideas for improving it.

# The external API should be made performant (e.g. some sort of paging for stats.facet,
rather than returning ALL values).
# The code for multi-valued fields is clearly broken: it tries to combine UninvertedField
with a single-valued FieldCache for multi-valued fields.
# The behavior for multi-valued fields could be unexpected: whether it's UninvertedField or
DocValues, these data structures return the *unique* set of ordinals for the document. So I
think it can be very misleading to return stats like 'sum' for multi-valued fields.
# The stats returned should be implemented in ways that are fast. For example, the string case
returns min/max, but does this by looking up the term for every single ordinal and using
String.compareTo. The ords are themselves comparable, so count/missing/min/max can all be
satisfied with 2 ord->term lookups per segment. These are also the only stats I think
multi-valued numerics should return (see above).
# Things like accumulate(NamedList) appear to have scary runtime (I think this one is only
used for merging distributed results?). They should not call the O(n) get() method over and
over in accumulate(), but instead do a single pass through the list.
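
A minimal sketch of the ord-based approach from item 4, under stated assumptions: the
segment's term dictionary is modeled as a sorted String[] (so a lower ord means a smaller
term), each document has one ordinal, and -1 marks a missing value. The class and method
names (OrdStats, compute) are hypothetical and are not part of Solr or Lucene; real code
would read ordinals from the FieldCache or DocValues instead.

```java
public class OrdStats {
    /** count/missing/min/max for one segment. */
    static final class Result {
        final long count;
        final long missing;
        final String min; // null when no document has a value
        final String max; // null when no document has a value

        Result(long count, long missing, String min, String max) {
            this.count = count;
            this.missing = missing;
            this.min = min;
            this.max = max;
        }
    }

    /**
     * sortedTerms: stand-in for the segment's term dictionary, in sorted order.
     * docOrds: one ordinal per document, -1 meaning "no value" (an assumption
     * of this sketch).
     */
    static Result compute(String[] sortedTerms, int[] docOrds) {
        long count = 0;
        long missing = 0;
        int minOrd = Integer.MAX_VALUE;
        int maxOrd = -1;
        for (int ord : docOrds) {
            if (ord < 0) {
                missing++;
                continue;
            }
            count++;
            // Plain int comparisons; no term lookup and no String.compareTo here.
            if (ord < minOrd) minOrd = ord;
            if (ord > maxOrd) maxOrd = ord;
        }
        // Only two ord->term lookups for the whole segment.
        String min = count == 0 ? null : sortedTerms[minOrd];
        String max = count == 0 ? null : sortedTerms[maxOrd];
        return new Result(count, missing, min, max);
    }

    public static void main(String[] args) {
        String[] terms = {"apple", "banana", "cherry", "date"};
        int[] ords = {2, -1, 1, 3, 1};
        Result r = compute(terms, ords);
        System.out.println(r.count + " " + r.missing + " " + r.min + " " + r.max);
    }
}
```

Merging across segments would still compare the per-segment min/max terms as strings, but
that costs one comparison per segment rather than one per document.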

Finally, the code is pretty difficult to follow, and the tests are inadequate for everything
that is going on here.
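
To illustrate the accumulate(NamedList) point above, here is a rough sketch of merging
per-shard stats in a single pass. It assumes the shard response is an ordered list of
name/value pairs, modeled here as a List of Map.Entry so the example runs without Solr on
the classpath. The key names and the Stats class are hypothetical and do not reflect the
actual StatsComponent code.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

public class SinglePassAccumulate {
    /** Running totals merged from several shard responses. */
    static final class Stats {
        double sum;
        long count;
        long missing;
        double min = Double.POSITIVE_INFINITY;
        double max = Double.NEGATIVE_INFINITY;

        /**
         * Visits each name/value pair exactly once, instead of doing a
         * linear-scan lookup by name for every stat.
         */
        void accumulate(List<Map.Entry<String, Object>> shardStats) {
            for (Map.Entry<String, Object> e : shardStats) {
                Number v = (Number) e.getValue();
                switch (e.getKey()) {
                    case "sum":     sum += v.doubleValue(); break;
                    case "count":   count += v.longValue(); break;
                    case "missing": missing += v.longValue(); break;
                    case "min":     min = Math.min(min, v.doubleValue()); break;
                    case "max":     max = Math.max(max, v.doubleValue()); break;
                    default:        break; // ignore names this sketch does not handle
                }
            }
        }
    }

    static Map.Entry<String, Object> entry(String name, Object value) {
        return new SimpleEntry<String, Object>(name, value);
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Object>> shard1 = Arrays.asList(
            entry("min", 1.0), entry("max", 5.0), entry("sum", 9.0),
            entry("count", 3L), entry("missing", 1L));
        List<Map.Entry<String, Object>> shard2 = Arrays.asList(
            entry("min", -2.0), entry("max", 4.0), entry("sum", 2.0),
            entry("count", 2L), entry("missing", 0L));

        Stats total = new Stats();
        total.accumulate(shard1);
        total.accumulate(shard2);
        System.out.println(total.min + " " + total.max + " " + total.sum
            + " " + total.count + " " + total.missing);
    }
}
```

With k stats in a response of n entries, repeated by-name lookups cost O(k * n), while the
loop above is O(n).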


