hadoop-common-dev mailing list archives

From "stack (JIRA)" <j...@apache.org>
Subject [jira] Created: (HADOOP-2291) [hbase] Add row count estimator
Date Tue, 27 Nov 2007 18:42:43 GMT
[hbase] Add row count estimator
-------------------------------

                 Key: HADOOP-2291
                 URL: https://issues.apache.org/jira/browse/HADOOP-2291
             Project: Hadoop
          Issue Type: New Feature
          Components: contrib/hbase
            Reporter: stack
            Priority: Minor


Internally we have a little tool that does a rough estimate of how many rows there are
in a database.  It runs scanners over larger and larger partitions until one turns
up > N occupied rows.  Once it has a number > N, it multiplies by the partition size
to get an approximate row count.

This issue is about generalizing this feature so it could sit in the general HBase install.
It would look something like:

{code}
long getApproximateRowCount(final Text startRow, final Text endRow, final long minimumCountPerPartition,
final long maximumPartitionSize)
{code}

Larger minimumCountPerPartition and maximumPartitionSize values would make the count more
accurate but would mean the method ran longer.
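
To make the tradeoff concrete, here is a minimal sketch of one way the estimate could work. It is an interpretation of the description above, not the internal tool: row keys are modelled as longs rather than Text, the table is a hypothetical in-memory TreeSet, scanCount stands in for running a scanner, and the final step extrapolates the sampled density over the whole range. No real HBase API is used.

```java
import java.util.TreeSet;

public class RowCountEstimator {
    // Hypothetical stand-in for a table: the sorted set of occupied row keys.
    private final TreeSet<Long> rows;

    public RowCountEstimator(TreeSet<Long> rows) {
        this.rows = rows;
    }

    // Stand-in for running a scanner over [from, to) and counting rows.
    private long scanCount(long from, long to) {
        return rows.subSet(from, to).size();
    }

    // Assumed semantics: grow the sampled partition (doubling each time)
    // until it holds more than minimumCountPerPartition rows or reaches
    // maximumPartitionSize, then scale the count up to the full range.
    public long getApproximateRowCount(long startRow, long endRow,
            long minimumCountPerPartition, long maximumPartitionSize) {
        if (maximumPartitionSize <= 0) {
            throw new IllegalArgumentException("maximumPartitionSize must be > 0");
        }
        long range = endRow - startRow;
        if (range <= 0) {
            return 0;
        }
        long cap = Math.min(maximumPartitionSize, range);
        long size = 1;
        long count = scanCount(startRow, startRow + size);
        while (count <= minimumCountPerPartition && size < cap) {
            // Double the partition, clamping at the cap without overflowing.
            size = (size > cap / 2) ? cap : size * 2;
            count = scanCount(startRow, startRow + size);
        }
        // Extrapolate: rows seen in the sample, scaled by range / sample size.
        return Math.round((double) count * range / size);
    }
}
```

With 100 rows spread evenly over keys 0 to 990, calling getApproximateRowCount(0, 1000, 5, 200) stops at a partition of 64 keys holding 7 rows and returns 109. Raising either limit samples more of the key space, which is why the count gets more accurate and the method runs longer.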

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

