hbase-dev mailing list archives

From "Yu Li (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HBASE-10932) Improve RowCounter to allow mapper number set/control
Date Tue, 08 Apr 2014 13:56:15 GMT
Yu Li created HBASE-10932:

             Summary: Improve RowCounter to allow mapper number set/control
                 Key: HBASE-10932
                 URL: https://issues.apache.org/jira/browse/HBASE-10932
             Project: HBase
          Issue Type: Improvement
          Components: mapreduce
            Reporter: Yu Li
            Assignee: Yu Li
            Priority: Minor

The typical use case of RowCounter is data integrity checking: for example, after exporting
data from an RDBMS to HBase, or from one HBase cluster to another, verifying that the row
(record) counts match. Such checks usually do not have strict response-time requirements.
However, the current implementation of RowCounter launches one mapper per region, and each
mapper sends its own scan request. If the table is large (say, tens of regions) and the MR
cluster has enough CPU cores to run all the mappers at once, the resulting parallel scan
requests can place a real burden on the HBase cluster.
So in this JIRA, we propose adding an option "--maps" to RowCounter to specify the number of
mappers, and allowing each mapper to scan more than one region of the target table.
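To illustrate the idea, here is a minimal sketch of how a "--maps" value could be turned into
mapper assignments: the table's regions, assumed to be listed in row-key order and of roughly
similar size, are packed into at most N contiguous groups, and each mapper scans the single key
range spanned by its group. The helpers `group_regions` and `split_range` are hypothetical and
written only for illustration; they are not part of HBase, and this is not the actual RowCounter
or input-format implementation.

```python
from typing import List, Sequence, Tuple

# Hypothetical representation of a region: (start_key, end_key).
# An empty bytes value means "unbounded", as with the first/last region of a table.
Region = Tuple[bytes, bytes]


def group_regions(regions: Sequence[Region], num_maps: int) -> List[List[Region]]:
    """Pack key-ordered regions into at most num_maps contiguous groups.

    Group sizes differ by at most one region, so each mapper gets a similar
    amount of scan work (assuming regions of similar size).
    """
    if num_maps < 1:
        raise ValueError("num_maps must be >= 1")
    n = len(regions)
    k = min(num_maps, n)  # never create empty groups
    if k == 0:
        return []
    base, extra = divmod(n, k)
    groups: List[List[Region]] = []
    start = 0
    for i in range(k):
        # The first `extra` groups take one additional region each.
        size = base + (1 if i < extra else 0)
        groups.append(list(regions[start:start + size]))
        start += size
    return groups


def split_range(group: Sequence[Region]) -> Region:
    """Key range one mapper would scan: first region's start to last region's end."""
    return group[0][0], group[-1][1]


if __name__ == "__main__":
    # Five regions, two mappers: groups of 3 and 2 regions.
    regions = [(b"", b"b"), (b"b", b"d"), (b"d", b"f"), (b"f", b"h"), (b"h", b"")]
    for g in group_regions(regions, 2):
        print(len(g), split_range(g))
    # prints:
    # 3 (b'', b'f')
    # 2 (b'f', b'')
```

Keeping each group contiguous in key order is a deliberate choice in this sketch: a mapper can
then cover all of its regions with one scan over a single key range, and the number of
concurrent scans against the HBase cluster is bounded by the "--maps" value rather than by the
region count.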

This message was sent by Atlassian JIRA
