Message-ID: <4C8E9AF4.70506@teamaol.com>
Date: Mon, 13 Sep 2010 17:43:16 -0400
From: Justin Cohen
Reply-To: user@hbase.apache.org
Subject: Tuning simple count m/r job

I have a table with 82 regions and about 44 million rows. It takes almost 6 minutes to count with MapReduce. Is that a reasonable rate for a cluster of ten data nodes? That's just over 12,000 rows per second per machine. Can I do better?

Right now the only custom thing I am doing is calling scan.setCaching(10000). There's one gz column per row, but I just want to count rows, not decompress the columns...

Is one map task assigned to each region? Some map tasks only have a few thousand rows; others have over 2 million. Does this mean the regions aren't balanced, or does region splitting take column size into account as well as the number of rows?

Thanks,
Justin
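
For reference, the figures quoted in the message can be checked with a little arithmetic. The sketch below is only illustrative: it assumes the job took a full 360 seconds ("almost 6 minutes" rounded up) and that the ten machines shared the work evenly. The class and method names are hypothetical and are not part of HBase.

```java
// Back-of-the-envelope check of the numbers in the message.
// Assumptions (not from HBase): 360 s runtime, work spread evenly over machines.
public class CountThroughput {

    // Rows counted per second by the whole cluster.
    static double clusterRowsPerSecond(long rows, double seconds) {
        return rows / seconds;
    }

    // Rows counted per second by each machine, assuming an even spread.
    static double perMachineRowsPerSecond(long rows, double seconds, int machines) {
        return clusterRowsPerSecond(rows, seconds) / machines;
    }

    // Average rows per region if rows were spread perfectly evenly
    // (integer division, so the result is rounded down).
    static long averageRowsPerRegion(long rows, int regions) {
        return rows / regions;
    }

    // How many times larger than the average a given region is.
    static double skewFactor(long regionRows, long rows, int regions) {
        return regionRows / ((double) rows / regions);
    }

    public static void main(String[] args) {
        long rows = 44_000_000L;   // "about 44 million rows"
        double seconds = 360.0;    // "almost 6 minutes", rounded up (assumption)
        int machines = 10;         // "ten machine cluster"
        int regions = 82;          // "82 regions"

        System.out.printf("per machine: %.0f rows/s%n",
                perMachineRowsPerSecond(rows, seconds, machines));
        System.out.printf("average region: %d rows%n",
                averageRowsPerRegion(rows, regions));
        System.out.printf("2M-row region vs average: %.1fx%n",
                skewFactor(2_000_000L, rows, regions));
    }
}
```

Under these assumptions, an evenly split table would hold a little over half a million rows per region, so a map task that sees more than 2 million rows is handling several times the average, while one that sees a few thousand is handling far less.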