Date: Wed, 9 Apr 2014 17:34:28 +0000 (UTC)
From: "Jean-Daniel Cryans (JIRA)"
To: issues@hbase.apache.org
Subject: [jira] [Commented] (HBASE-10932) Improve RowCounter to allow mapper number set/control

    [ https://issues.apache.org/jira/browse/HBASE-10932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13964433#comment-13964433 ]

Jean-Daniel Cryans commented on HBASE-10932:
--------------------------------------------

Thinking about this more:

bq.
If I am a user, maybe I would set it to 2 or some other lower value here

It sounds like, in order to solve your use case without setting up a scheduler, you could simply use the "count" command in the shell, since the only thing below 2 scans is 1 scan :)

> Improve RowCounter to allow mapper number set/control
> -----------------------------------------------------
>
>                 Key: HBASE-10932
>                 URL: https://issues.apache.org/jira/browse/HBASE-10932
>             Project: HBase
>          Issue Type: Improvement
>          Components: mapreduce
>            Reporter: Yu Li
>            Assignee: Yu Li
>            Priority: Minor
>         Attachments: HBASE-10932_v1.patch, HBASE-10932_v2.patch
>
>
> The typical use case of RowCounter is some kind of data integrity check, e.g. after exporting data from an RDBMS to HBase, or from one HBase cluster to another, to make sure the row (record) counts match. Such a check usually does not have strict response-time requirements.
> Meanwhile, in the current implementation, RowCounter launches one mapper per region, and each mapper sends one scan request. If the table is fairly big (say, tens of regions) and the MR cluster has enough CPU cores in total, the parallel scan requests sent by the mappers can be a real burden on the HBase cluster.
> So in this JIRA, we propose making RowCounter support an additional option, "--maps", to specify the number of mappers, and letting each mapper scan more than one region of the target table.
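To make the "--maps" proposal concrete, here is a minimal sketch of how a table's regions might be packed into a fixed number of mapper splits. It is not the HBASE-10932 patch: the class RegionGrouping, the KeyRange record and groupRegions are hypothetical names. The sketch assumes the regions arrive sorted by start key and that an empty string marks an open-ended boundary, and it models the proposed "--maps" value as the maps parameter.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch (not the HBASE-10932 patch): pack a table's regions
 * into at most `maps` contiguous groups so each mapper covers several
 * adjacent regions instead of exactly one.
 */
final class RegionGrouping {

    /** A row-key range [startKey, endKey); "" means open-ended (assumption). */
    record KeyRange(String startKey, String endKey) {}

    /**
     * Splits regions (assumed sorted by start key) into `maps` contiguous
     * groups whose sizes differ by at most one. If maps <= 0 or maps >= the
     * number of regions, each region keeps its own mapper (one per region).
     */
    static List<KeyRange> groupRegions(List<KeyRange> regions, int maps) {
        int n = regions.size();
        if (maps <= 0 || maps >= n) {
            return new ArrayList<>(regions);
        }
        List<KeyRange> splits = new ArrayList<>(maps);
        int base = n / maps;   // minimum number of regions per mapper
        int extra = n % maps;  // the first `extra` mappers take one more region
        int next = 0;
        for (int i = 0; i < maps; i++) {
            int size = base + (i < extra ? 1 : 0);
            KeyRange first = regions.get(next);
            KeyRange last = regions.get(next + size - 1);
            // Adjacent regions collapse into a single contiguous key range.
            splits.add(new KeyRange(first.startKey(), last.endKey()));
            next += size;
        }
        return splits;
    }

    public static void main(String[] args) {
        List<KeyRange> regions = List.of(
            new KeyRange("", "b"), new KeyRange("b", "d"), new KeyRange("d", "f"),
            new KeyRange("f", "h"), new KeyRange("h", ""));
        // 5 regions and 2 mappers: the first split gets 3 regions, the second gets 2.
        for (KeyRange split : groupRegions(regions, 2)) {
            System.out.println("[" + split.startKey() + ", " + split.endKey() + ")");
        }
    }
}
```

Grouping adjacent regions, rather than assigning them round-robin, keeps each mapper's share as one contiguous key range. Under these assumptions each mapper could issue a single scan that walks its regions one after another, so the number of concurrent scans against the cluster is bounded by maps rather than by the region count.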