hbase-issues mailing list archives

From "Jean-Daniel Cryans (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-10932) Improve RowCounter to allow mapper number set/control
Date Fri, 18 Apr 2014 17:59:16 GMT

    [ https://issues.apache.org/jira/browse/HBASE-10932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13974306#comment-13974306 ]

Jean-Daniel Cryans commented on HBASE-10932:
--------------------------------------------

Hey [~carp84], I forgot about this issue; let me address your latest replies.

bq. I thought it was designed on purpose so that each mapper scans just a single region

That's more an implementation detail than a design, and we can further improve the implementation
by giving more control to the power users.

bq. This is especially useful in a multi-tenant environment, when we need to check data integrity
for one user after a data import but don't want the scan load to slow down the response time (RT)
of other users' requests.

Right, but again, resource management is a broader issue. I doubt that RowCounter is the only
job that needs to be throttled; what about VerifyReplication? Or Export? Those jobs usually
aren't latency sensitive and can run in the background. This can be handled simply by a correctly
configured job scheduler; that's what schedulers are for.

> Improve RowCounter to allow mapper number set/control
> -----------------------------------------------------
>
>                 Key: HBASE-10932
>                 URL: https://issues.apache.org/jira/browse/HBASE-10932
>             Project: HBase
>          Issue Type: Improvement
>          Components: mapreduce
>            Reporter: Yu Li
>            Assignee: Yu Li
>            Priority: Minor
>         Attachments: HBASE-10932_v1.patch, HBASE-10932_v2.patch
>
>
> The typical use case of RowCounter is data integrity checking, for example making sure the
row (record) count matches after exporting data from an RDBMS to HBase, or from one HBase
cluster to another. Such checks usually have no strict response-time requirement.
> Meanwhile, in the current implementation, RowCounter launches one mapper per region, and each
mapper sends one scan request. If the table is fairly large (say, tens of regions) and the MR
cluster has enough CPU cores to run all the mappers at once, the parallel scan requests sent by
the mappers can be a real burden for the HBase cluster.
> So in this JIRA, we propose adding a "--maps" option to rowcounter to specify the number of
mappers, and making each mapper able to scan more than one region of the target table.
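
To make the proposal above concrete, here is a minimal sketch of how a list of regions could be grouped so that a fixed number of mappers each scan several adjacent regions. It is only an illustration of the idea, not the code in the attached patches: the function name `group_regions`, the `maps` parameter, and the use of plain strings as region identifiers are all assumptions made for this example.

```python
from typing import List


def group_regions(regions: List[str], maps: int) -> List[List[str]]:
    """Split an ordered list of regions into at most `maps` contiguous groups.

    Hypothetical helper: each returned group stands for the regions that one
    mapper would scan. Group sizes differ by at most one region.
    """
    if maps < 1:
        raise ValueError("maps must be at least 1")

    # Never create more groups than there are regions.
    group_count = min(maps, len(regions))
    if group_count == 0:
        return []

    # Spread the remainder over the first `extra` groups.
    base, extra = divmod(len(regions), group_count)

    groups = []
    start = 0
    for i in range(group_count):
        size = base + (1 if i < extra else 0)
        groups.append(regions[start:start + size])
        start += size
    return groups


if __name__ == "__main__":
    # Ten regions, three mappers (the region names are made up).
    table_regions = [f"region-{i}" for i in range(10)]
    for mapper_id, group in enumerate(group_regions(table_regions, 3)):
        print(mapper_id, group)
```

With ten regions and `maps=3`, the groups contain 4, 3, and 3 regions, so at most three scans run in parallel instead of ten. Keeping each group contiguous means a mapper could, in principle, cover its regions with one key range, though whether that is how a real input format would do it is an assumption here.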



--
This message was sent by Atlassian JIRA
(v6.2#6252)
