Date: Fri, 23 Sep 2011 21:19:26 +0000 (UTC)
From: "Hudson (JIRA)"
To: issues@hbase.apache.org
Subject: [jira] [Commented] (HBASE-4295) rowcounter does not return the correct number of rows in certain circumstances

[ 
https://issues.apache.org/jira/browse/HBASE-4295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13113776#comment-13113776 ]

Hudson commented on HBASE-4295:
-------------------------------

Integrated in HBase-TRUNK #2246 (See [https://builds.apache.org/job/HBase-TRUNK/2246/])
    HBASE-4295 rowcounter does not return the correct number of rows in certain circumstances

stack : 
Files : 
* /hbase/trunk/CHANGES.txt
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/mapred/RowCounter.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/mapreduce/RowCounter.java


> rowcounter does not return the correct number of rows in certain circumstances
> ------------------------------------------------------------------------------
>
>                 Key: HBASE-4295
>                 URL: https://issues.apache.org/jira/browse/HBASE-4295
>             Project: HBase
>          Issue Type: Bug
>          Components: mapreduce
>    Affects Versions: 0.90.4
>            Reporter: Wing Yew Poon
>            Assignee: Dave Revell
>             Fix For: 0.90.5
>
>         Attachments: HBASE-4295-v1.patch
>
>
> When you run
> {noformat}
> hadoop jar hbase.jar rowcounter
> {noformat}
> the org.apache.hadoop.hbase.mapreduce.RowCounter class is run.
> The RowCounterMapper class in the RowCounter MapReduce job contains the following:
> {noformat}
>     @Override
>     public void map(ImmutableBytesWritable row, Result values,
>       Context context)
>     throws IOException {
>       for (KeyValue value: values.list()) {
>         if (value.getValue().length > 0) {
>           context.getCounter(Counters.ROWS).increment(1);
>           break;
>         }
>       }
>     }
> {noformat}
> The intention is to iterate over the column values in the row and increment the ROWS counter if any value is non-empty. However, values.list() always has size 1, because the createSubmittableJob static method configures the Scan as follows:
> {noformat}
>     Scan scan = new Scan();
>     scan.setFilter(new FirstKeyOnlyFilter());
> {noformat}
> So the Result passed to each map() call contains only the row's first KV.
> If that first KV has an empty value, the row is skipped even though other columns in the row are non-empty, so rowcounter can return a count that is too low.
> One way to reproduce this is to create an HBase table with two columns, say f1:q1 and f2:q2. Create some rows (say 2) with an empty f1:q1 but a non-empty f2:q2, and some rows (say 3) with an empty f2:q2 but a non-empty f1:q1.
> Then run rowcounter, specifying only the table and no columns. The count will be either 2 or 3 short.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
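The undercount described in HBASE-4295 can be illustrated without a cluster. The following is a minimal, self-contained Java sketch, not HBase code: `Cell`, `firstKeyOnly`, `mapperCounts`, `countRows` and `reproTable` are hypothetical names. It assumes that "empty" means a cell stored with a zero-length value, that each row's cells are already sorted by column (so f1:q1 comes before f2:q2), and that keeping only the first cell of each row is an adequate model of what FirstKeyOnlyFilter hands to the mapper. `mapperCounts` mirrors the loop in the quoted map() method.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class RowCounterSketch {

    // Hypothetical stand-in for an HBase KeyValue: a column name plus its value bytes.
    static final class Cell {
        final String column;
        final byte[] value;

        Cell(String column, byte[] value) {
            this.column = column;
            this.value = value;
        }
    }

    // Models the effect of FirstKeyOnlyFilter: keep only the first cell of the row.
    // Assumes the row's cells are already sorted by column, as HBase returns them.
    static List<Cell> firstKeyOnly(List<Cell> row) {
        return row.isEmpty() ? row : Collections.singletonList(row.get(0));
    }

    // Mirrors the quoted map() loop: the row counts if any cell it sees has a non-empty value.
    static boolean mapperCounts(List<Cell> cells) {
        for (Cell cell : cells) {
            if (cell.value.length > 0) {
                return true;
            }
        }
        return false;
    }

    // Counts rows, optionally showing the mapper only the first cell of each row.
    static int countRows(List<List<Cell>> table, boolean applyFirstKeyOnly) {
        int rows = 0;
        for (List<Cell> row : table) {
            List<Cell> seen = applyFirstKeyOnly ? firstKeyOnly(row) : row;
            if (mapperCounts(seen)) {
                rows++;
            }
        }
        return rows;
    }

    // The reproduction table from the issue: 2 rows with empty f1:q1 and non-empty f2:q2,
    // then 3 rows with non-empty f1:q1 and empty f2:q2.
    static List<List<Cell>> reproTable() {
        byte[] empty = new byte[0];
        byte[] data = new byte[] {'x'};
        List<List<Cell>> table = new ArrayList<>();
        for (int i = 0; i < 2; i++) {
            table.add(Arrays.asList(new Cell("f1:q1", empty), new Cell("f2:q2", data)));
        }
        for (int i = 0; i < 3; i++) {
            table.add(Arrays.asList(new Cell("f1:q1", data), new Cell("f2:q2", empty)));
        }
        return table;
    }

    public static void main(String[] args) {
        List<List<Cell>> table = reproTable();
        System.out.println("all cells:       " + countRows(table, false));
        System.out.println("first cell only: " + countRows(table, true));
    }
}
```

Running `main` prints 5 for the all-cells count and 3 for the first-cell-only count, i.e. 2 rows short. Under the sorting assumption, the two rows with an empty f1:q1 are the ones lost; if f2:q2 sorted first instead, the three rows with an empty f2:q2 would be lost, which is consistent with the "2 or 3 short" in the report.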