lucene-dev mailing list archives

From "Yonik Seeley (JIRA)" <j...@apache.org>
Subject [jira] Commented: (SOLR-2205) Grouping performance improvements
Date Thu, 28 Oct 2010 19:22:19 GMT

    [ https://issues.apache.org/jira/browse/SOLR-2205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12925921#action_12925921 ]

Yonik Seeley commented on SOLR-2205:
------------------------------------

Oh, nice - I hadn't thought of checking whether a doc is competitive before looking up its group!
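
For readers following along, here is a minimal sketch of that idea in plain Java. It is not the code from the SOLR-2205 patch: the class name, the docToGroup array (a stand-in for a FieldCache-style doc-to-group lookup), and the linear scan for the weakest group are all assumptions made for illustration, and it ranks groups by score only.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not the SOLR-2205 patch: keeps the best score for each of
// the top N groups and skips the group lookup for docs that cannot compete.
// Assumes topNGroups >= 1.
class CompetitiveGroupCollector {
    private final int topNGroups;
    private final int[] docToGroup;   // assumed stand-in for a doc -> group ord lookup
    final Map<Integer, Float> bestScorePerGroup = new HashMap<>();
    private float weakestScore = Float.NEGATIVE_INFINITY;
    int groupLookups = 0;             // how many times the lookup was actually paid for

    CompetitiveGroupCollector(int topNGroups, int[] docToGroup) {
        this.topNGroups = topNGroups;
        this.docToGroup = docToGroup;
    }

    void collect(int doc, float score) {
        boolean full = bestScorePerGroup.size() == topNGroups;
        // Cheap check first: once N groups are kept, a doc that scores no better
        // than the weakest kept group cannot change the result, whichever group
        // it belongs to, so the lookup can be skipped.
        if (full && score <= weakestScore) {
            return;
        }
        int group = docToGroup[doc];
        groupLookups++;
        Float current = bestScorePerGroup.get(group);
        if (current == null && full) {
            // New competitive group: evict one group holding the weakest score.
            bestScorePerGroup.values().remove(weakestScore);
        }
        if (current == null || score > current) {
            bestScorePerGroup.put(group, score);
        }
        // Linear rescan keeps the sketch short; a priority queue would be the
        // usual choice for larger N.
        weakestScore = Float.POSITIVE_INFINITY;
        for (float s : bestScorePerGroup.values()) {
            weakestScore = Math.min(weakestScore, s);
        }
    }
}
```

The saving comes from the early return: for a match-all query over a large index, most docs arrive after the top N groups are already filled and fail the cheap comparison.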

The ord part is related to SOLR-2068 - I worked out a per-segment algorithm a few months
ago, but haven't had time to implement it.  It looks like this one operates on the top-level
reader.  That can be better for static indexes (those that don't change much), but isn't as
good for NRT (near-real-time search).  Also, it looks like this will double memory usage (a
per-segment FieldCache in the native type, plus a top-level FieldCache for the ords and
string values).  Should something like that be an option?
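
As background on the trade-off, the sketch below illustrates why per-segment ords are harder to work with than top-level ones: the same term can get a different ord in each segment, so ords from different segments have to be translated before they can be compared. This is only an illustration in plain Java, not the per-segment algorithm mentioned above and not Lucene API; the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.TreeSet;

// Hypothetical illustration: each segment numbers its own sorted terms from 0,
// so a local ord only has meaning inside that segment.
class SegmentOrds {
    // Merges the sorted term lists of all segments into one sorted, de-duplicated
    // list, which plays the role of the top-level term list.
    static String[] mergeTerms(String[]... segments) {
        TreeSet<String> all = new TreeSet<>();
        for (String[] segmentTerms : segments) {
            all.addAll(Arrays.asList(segmentTerms));
        }
        return all.toArray(new String[0]);
    }

    // For one segment, maps each local ord to the ord of the same term in the
    // merged list. Every segment term is present in the merged list, so the
    // binary search always finds it.
    static int[] localToGlobal(String[] segmentTerms, String[] globalTerms) {
        int[] map = new int[segmentTerms.length];
        for (int localOrd = 0; localOrd < segmentTerms.length; localOrd++) {
            map[localOrd] = Arrays.binarySearch(globalTerms, segmentTerms[localOrd]);
        }
        return map;
    }
}
```

With segment terms {"apple", "cherry"} and {"cherry", "date"}, "cherry" has local ord 1 in the first segment and 0 in the second; only the mapped ords agree. A top-level structure avoids the translation but must be rebuilt when segments change, which is the NRT cost noted above.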

bq.  // I believe that the JVM internally will represent a boolean inside a boolean array
as a bit...

Not that I've heard.  A boolean[] is internally represented like a byte[] (one byte per
element, at least on HotSpot), so we would be better off using an OpenBitSet - or even
better, a sparse set, since the number of elements will be very small compared to the
possible number of groups.
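
A rough way to see the difference, as a sketch only: java.util.BitSet is used here as a stand-in for Lucene's OpenBitSet, a TreeSet stands in for a sparse set, the one-byte-per-element figure assumes a HotSpot-style JVM, and object headers are ignored. The class and method names are hypothetical.

```java
import java.util.BitSet;
import java.util.TreeSet;

// Hypothetical helper comparing ways to mark which group ords have been seen.
class GroupMarkers {
    // boolean[]: assumed one byte per element, regardless of how many are set.
    static long booleanArrayBytes(int numGroups) {
        return numGroups;
    }

    // Bit set: one bit per possible group. BitSet.size() reports the number of
    // bits of backing storage, which is rounded up to a multiple of 64.
    static long bitSetBytes(int numGroups) {
        return new BitSet(numGroups).size() / 8L;
    }

    // Sparse set: storage grows with the number of marked groups only, not with
    // the number of possible groups.
    static int sparseSetEntries(int[] markedOrds) {
        TreeSet<Integer> sparse = new TreeSet<>();
        for (int ord : markedOrds) {
            sparse.add(ord);
        }
        return sparse.size();
    }
}
```

Under these assumptions a bit set needs about one eighth of the space of a boolean[] for the same number of possible groups, while a sparse set depends only on how many groups are actually marked.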

> Grouping performance improvements
> ---------------------------------
>
>                 Key: SOLR-2205
>                 URL: https://issues.apache.org/jira/browse/SOLR-2205
>             Project: Solr
>          Issue Type: Sub-task
>          Components: search
>    Affects Versions: 4.0
>            Reporter: Martijn van Groningen
>             Fix For: 4.0
>
>         Attachments: SOLR-2205.patch
>
>
> This issue is dedicated to the performance of the grouping functionality.
> I've noticed that the code does not perform well on large indexes. Doing a search
> (q=*:*) with grouping on an index of around 5M documents took around one second on my local
> development machine. We had to support grouping on an index that holds around 50M documents
> per machine, so we made some changes and were able to happily serve that number of documents.
> Patch will follow soon.


