Subject: Re: Can I make use of TableSplit across Regions to make my MR job faster?
From: Michael Segel
Date: Mon, 26 Aug 2013 05:48:50 -0500
To: user@hbase.apache.org

A 'table split' is a region split, and as you split regions and balance them across the region servers, you should see some parallelism in your M/R jobs.

Of course, depending on your choice of row keys... YMMV.

HTH

-Mike

On Aug 26, 2013, at 2:16 AM, Pavan Sudheendra wrote:

> Hi all,
>
> How do I make use of a TableSplit or a region split? How is it used in
> TableInputFormatBase#getSplits()?
>
> I have 6 region servers across the cluster for the map-reduce task I am
> running. How can I leverage this so that the table is split across the
> cluster and the map-reduce application finishes quickly? Right now it is
> very slow. I am aggregating values from 3 tables: one with 100,000 rows,
> and two others on which I only use Get operations to fetch a value by
> passing the key. With this setup the job takes 40-50 minutes, which is
> far too slow. The first table will eventually be around 20-25 million
> rows. Please point me in the right direction. I will paste the code if
> anybody is interested.
>
> --
> Regards-
> Pavan

The opinions expressed here are mine, while they may reflect a cognitive thought, that is purely accidental.
Use at your own risk.
Michael Segel
michael_segel (AT) hotmail.com
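
To illustrate the point in the reply that a table split corresponds to a region, here is a minimal sketch of the idea in plain Python. It is not HBase code: `Region`, `Split`, and `get_splits` are hypothetical names invented for this illustration, and the logic only assumes that one input split (and so one map task) is produced for each region that overlaps the scan range.

```python
from typing import List, NamedTuple


class Region(NamedTuple):
    """Hypothetical stand-in for a region: a key range hosted on one server."""
    start_key: bytes  # inclusive; b"" means "start of table"
    end_key: bytes    # exclusive; b"" means "end of table"
    server: str


class Split(NamedTuple):
    """Hypothetical stand-in for an input split handed to one map task."""
    start_row: bytes
    end_row: bytes
    location: str


def get_splits(regions: List[Region],
               scan_start: bytes = b"",
               scan_stop: bytes = b"") -> List[Split]:
    """Return one split per region that overlaps [scan_start, scan_stop)."""
    splits = []
    for region in regions:
        # An empty end key or stop row is treated as "unbounded".
        starts_before_region_end = (region.end_key == b""
                                    or scan_start < region.end_key)
        stops_after_region_start = (scan_stop == b""
                                    or scan_stop > region.start_key)
        if not (starts_before_region_end and stops_after_region_start):
            continue
        # Clip the split to the part of the region the scan actually covers.
        start = max(scan_start, region.start_key)
        if region.end_key == b"":
            end = scan_stop
        elif scan_stop == b"":
            end = region.end_key
        else:
            end = min(scan_stop, region.end_key)
        splits.append(Split(start, end, region.server))
    return splits


# A table that has never been split: one region, so one map task.
one_region = [Region(b"", b"", "rs1")]

# The same table after being split into six regions on six servers.
boundaries = [b"", b"b", b"c", b"d", b"e", b"f", b""]
six_regions = [
    Region(boundaries[i], boundaries[i + 1], f"rs{i + 1}")
    for i in range(6)
]

print(len(get_splits(one_region)), len(get_splits(six_regions)))
```

Under these assumptions, a small table that still lives in a single region is processed by a single mapper no matter how many region servers the cluster has, which is why splitting and balancing regions is what brings parallelism. The row-key caveat in the reply still applies: if most rows sort into one key range, one split ends up doing most of the work.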