hbase-issues mailing list archives

From "haosdent (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-10932) Improve RowCounter to allow mapper number set/control
Date Wed, 09 Apr 2014 16:54:21 GMT

    [ https://issues.apache.org/jira/browse/HBASE-10932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13964370#comment-13964370 ]

haosdent commented on HBASE-10932:
----------------------------------

{quote}
the configuration parameter is still called "maps".
{quote}
"scanner.num" may be a better name.

{quote}
Let's say you use this new "maps" configuration and set to 20.
{quote}
As a user, I would probably set it to 2 or some other low value here.

Anyway, I think this is a useful issue. Because I have some important online businesses
running on my clusters, any unnecessary heavy IO could be unacceptable. [~jdcryans] is focusing
on code style, while [~carp84] is focusing on how to handle this scenario and make the number
of mappers configurable. Maybe we need a consensus on which approach to take to work around
this issue. Just my opinion.

> Improve RowCounter to allow mapper number set/control
> -----------------------------------------------------
>
>                 Key: HBASE-10932
>                 URL: https://issues.apache.org/jira/browse/HBASE-10932
>             Project: HBase
>          Issue Type: Improvement
>          Components: mapreduce
>            Reporter: Yu Li
>            Assignee: Yu Li
>            Priority: Minor
>         Attachments: HBASE-10932_v1.patch, HBASE-10932_v2.patch
>
>
> The typical use case of RowCounter is some kind of data integrity check, such as making
> sure the row (record) count matches after exporting data from an RDBMS to HBase, or from
> one HBase cluster to another. Such a check usually has no strict response-time requirement.
> Meanwhile, in the current implementation, RowCounter launches one mapper per region, and
> each mapper sends one scan request. If the table is fairly big, say with tens of regions,
> and the MR cluster has enough CPU cores, the parallel scan requests sent by the mappers
> would be a real burden for the HBase cluster.
> So in this JIRA, we're proposing to make rowcounter support an additional option "--maps"
> to specify the number of mappers, and to make each mapper able to scan more than one region
> of the target table.
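
To make the proposal in the description concrete, here is a minimal sketch of how a list of regions could be divided among a capped number of mappers, so that each mapper scans several regions in turn. It is an illustration only, not the logic in the attached patches: the function name `group_regions`, the region labels, and the choice of contiguous, near-equal groups are all assumptions, and Python is used purely to show the grouping idea.

```python
from typing import List, Sequence


def group_regions(regions: Sequence[str], num_mappers: int) -> List[List[str]]:
    """Split regions into at most num_mappers contiguous groups of near-equal size.

    Hypothetical helper: each returned group stands for the regions that a
    single mapper would scan one after another.
    """
    if num_mappers < 1:
        raise ValueError("num_mappers must be at least 1")

    # Never create more groups than there are regions.
    group_count = min(num_mappers, len(regions))
    if group_count == 0:
        return []

    # The first `extra` groups receive one additional region each.
    base, extra = divmod(len(regions), group_count)

    groups: List[List[str]] = []
    start = 0
    for i in range(group_count):
        size = base + (1 if i < extra else 0)
        groups.append(list(regions[start:start + size]))
        start += size
    return groups


if __name__ == "__main__":
    # Five regions shared by two mappers instead of five parallel scans.
    print(group_regions(["r1", "r2", "r3", "r4", "r5"], 2))
```

With a cap of 2, at most two scans run against the cluster at any one time, which is the load reduction discussed in the comment above; the trade-off is a longer total run time for the count.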



--
This message was sent by Atlassian JIRA
(v6.2#6252)
