hbase-dev mailing list archives

From "stack (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HBASE-32) [hbase] Add row count estimator
Date Mon, 04 May 2009 18:24:30 GMT

    [ https://issues.apache.org/jira/browse/HBASE-32?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12705672#action_12705672
] 

stack commented on HBASE-32:
----------------------------

I like this.  The MR job could optionally take a table name so that it runs per table rather
than over all of HBase.  Each split would be a line in .META.  The map would read all store
files, emitting stats keyed with a table + column family prefix?  The reduce would sum per
table and column family?  A postprocess step could then sum on a per-table basis?

We could add it to our hbase MR Driver so that it offers more than just RowCounter.
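
The reduce and postprocess steps proposed above can be sketched in plain Java without any Hadoop or HBase dependencies. This is only an illustration of the keying idea, not an existing HBase class: `TableStatsSketch`, `StoreFileStat`, `sumPerFamily` and `sumPerTable` are hypothetical names, the "table/family" key format is an assumption, and the map step (reading store files) is replaced by a hand-written list of emitted values.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class TableStatsSketch {
    /** Hypothetical map output: one stat value from one store file, tagged with table and family. */
    public static final class StoreFileStat {
        final String table;
        final String family;
        final long value;

        public StoreFileStat(String table, String family, long value) {
            this.table = table;
            this.family = family;
            this.value = value;
        }
    }

    /** Reduce step: sum the emitted values per "table/family" key. */
    public static Map<String, Long> sumPerFamily(List<StoreFileStat> stats) {
        Map<String, Long> out = new TreeMap<>();
        for (StoreFileStat s : stats) {
            out.merge(s.table + "/" + s.family, s.value, Long::sum);
        }
        return out;
    }

    /** Postprocess step: roll the per-family sums up to one total per table. */
    public static Map<String, Long> sumPerTable(Map<String, Long> perFamily) {
        Map<String, Long> out = new TreeMap<>();
        for (Map.Entry<String, Long> e : perFamily.entrySet()) {
            // Assumes the table name itself never contains the '/' separator.
            String table = e.getKey().substring(0, e.getKey().indexOf('/'));
            out.merge(table, e.getValue(), Long::sum);
        }
        return out;
    }

    public static void main(String[] args) {
        // Stand-in for what the map tasks would emit after reading store files.
        List<StoreFileStat> emitted = Arrays.asList(
            new StoreFileStat("t1", "a", 10),
            new StoreFileStat("t1", "a", 20),
            new StoreFileStat("t1", "b", 5),
            new StoreFileStat("t2", "a", 7));
        Map<String, Long> perFamily = sumPerFamily(emitted);
        System.out.println(perFamily);
        System.out.println(sumPerTable(perFamily));
    }
}
```

Keying on table + column family keeps the reduce output useful on its own, and the per-table rollup stays a cheap pass over a handful of keys.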

> [hbase] Add row count estimator
> -------------------------------
>
>                 Key: HBASE-32
>                 URL: https://issues.apache.org/jira/browse/HBASE-32
>             Project: Hadoop HBase
>          Issue Type: New Feature
>          Components: client
>            Reporter: stack
>            Priority: Minor
>         Attachments: 2291_v01.patch, Keying.java
>
>
> Internally we have a little tool that will do a rough estimate of how many rows there
> are in a database.  It keeps getting larger and larger partitions running scanners until
> it turns up > N occupied rows.  Once it has a number > N, it multiplies by the partition
> size to get an approximate row count.
> This issue is about generalizing this feature so it could sit in the general hbase install.
> It would look something like:
> {code}
> long getApproximateRowCount(final Text startRow, final Text endRow,
>     final long minimumCountPerPartition, final long maximumPartitionSize)
> {code}
> Larger minimumCountPerPartition and maximumPartitionSize values would make the count
> more accurate but would mean the method ran longer.
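
One possible reading of the estimator described above, as a minimal sketch rather than the internal tool itself: scan a growing prefix of the key range until it holds at least minimumCountPerPartition rows (or the partition reaches maximumPartitionSize), then scale the count by range / partition. The assumptions here are mine: row keys are modeled as `long` values in a `NavigableSet` instead of `Text` keys read by scanners, the partition doubles on each pass, the stop test is `>=` rather than `>`, and `ApproxRowCountSketch` is a hypothetical class name.

```java
import java.util.NavigableSet;
import java.util.TreeSet;

class ApproxRowCountSketch {
    /**
     * Estimates how many keys of rows fall in [startRow, endRow) by counting the keys
     * in a growing prefix of that range and extrapolating from the prefix density.
     */
    public static long getApproximateRowCount(NavigableSet<Long> rows, long startRow, long endRow,
            long minimumCountPerPartition, long maximumPartitionSize) {
        if (maximumPartitionSize < 1) {
            throw new IllegalArgumentException("maximumPartitionSize must be at least 1");
        }
        long range = endRow - startRow;
        if (range <= 0) {
            return 0;
        }
        long cap = Math.min(maximumPartitionSize, range);
        long partition = 1;
        while (true) {
            // Stand-in for running a scanner over [startRow, startRow + partition).
            long count = rows.subSet(startRow, true, startRow + partition, false).size();
            if (count >= minimumCountPerPartition || partition >= cap) {
                // Extrapolate: assume the rest of the range is as dense as the prefix.
                return Math.round((double) count * range / partition);
            }
            // Double the partition, clamped to the cap and guarded against overflow.
            partition = partition > cap / 2 ? cap : partition * 2;
        }
    }

    public static void main(String[] args) {
        NavigableSet<Long> rows = new TreeSet<>();
        for (long k = 0; k < 1000; k += 10) {
            rows.add(k); // 100 evenly spaced rows
        }
        System.out.println(getApproximateRowCount(rows, 0, 1000, 5, 1000));
    }
}
```

The trade-off stated in the issue shows up directly: a larger minimumCountPerPartition forces more doubling passes before the loop stops, and a larger maximumPartitionSize lets the final scan cover more of the range, so the estimate gets closer to the true count at the cost of more scanning.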

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

