hbase-issues mailing list archives

From "Jean-Marc Spaggiari (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-9543) Impl unique aggregation
Date Wed, 25 Sep 2013 17:28:04 GMT

    [ https://issues.apache.org/jira/browse/HBASE-9543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13777779#comment-13777779 ]

Jean-Marc Spaggiari commented on HBASE-9543:

Now, last comment...

What about scalability? If the unique is done on a column where ALL the values are different
(it might be called by mistake on a UID), this is going to load ALL the values into memory for
ALL the regions. If you have a 10GB region, and 70% of it is the value, that means you are
going to create a 7GB set in colSet. Multiply that by the number of regions and you are
in trouble. Should there be a property to limit this? You can't really send intermediate results
because you need to keep them for the comparison. So should there be something like aggregate.uniq.maximum.values=10000,
which would limit the size of the set to that number of entries and throw an exception
if we go over?
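
To make the proposal concrete, here is a minimal sketch of such a cap in plain Java. It is not taken from the attached patches: the class names BoundedUniqueSet and UniqueLimitExceededException are hypothetical, and the sketch only assumes that the limit would be read from a property like the aggregate.uniq.maximum.values suggested above and passed in as maxValues.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical exception: raised when the distinct-value cap is exceeded. */
class UniqueLimitExceededException extends RuntimeException {
    UniqueLimitExceededException(int limit) {
        super("unique aggregation exceeded " + limit + " distinct values");
    }
}

/** Hypothetical stand-in for colSet: a set of distinct values with a hard cap. */
class BoundedUniqueSet {
    private final int maxValues; // assumed to come from aggregate.uniq.maximum.values
    private final Set<ByteBuffer> values = new HashSet<>();

    BoundedUniqueSet(int maxValues) {
        if (maxValues <= 0) {
            throw new IllegalArgumentException("maxValues must be positive");
        }
        this.maxValues = maxValues;
    }

    /** Adds a value. Duplicates are ignored; a new value past the cap throws. */
    void add(byte[] value) {
        // ByteBuffer compares by content, so it is usable as a hash key.
        // The array is copied so later changes by the caller cannot corrupt the set.
        ByteBuffer key = ByteBuffer.wrap(Arrays.copyOf(value, value.length));
        if (values.contains(key)) {
            return;
        }
        if (values.size() >= maxValues) {
            throw new UniqueLimitExceededException(maxValues);
        }
        values.add(key);
    }

    int size() {
        return values.size();
    }
}

class BoundedUniqueSetDemo {
    public static void main(String[] args) {
        BoundedUniqueSet set = new BoundedUniqueSet(2);
        set.add(new byte[] {1});
        set.add(new byte[] {1}); // duplicate, does not count against the cap
        set.add(new byte[] {2});
        try {
            set.add(new byte[] {3}); // third distinct value, over the cap
        } catch (UniqueLimitExceededException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
```

Failing fast like this bounds the memory used per region at roughly maxValues entries, at the cost of the aggregation failing outright on high-cardinality columns such as a UID.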
>  Impl unique aggregation
> ------------------------
>                 Key: HBASE-9543
>                 URL: https://issues.apache.org/jira/browse/HBASE-9543
>             Project: HBase
>          Issue Type: New Feature
>          Components: Coprocessors
>            Reporter: Liu Shaohui
>            Assignee: Liu Shaohui
>            Priority: Minor
>         Attachments: HBASE-9543-0.94-v1.diff, HBASE-9543-trunk-v1.diff, HBASE-9543-trunk-v2.diff
> Impl unique aggregation: return a set of all columns' values in a scan.

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
