Subject: how to load big files into HBase without crashing?
From: "Clements, Michael" <Michael.Clements@disney.com>
To: mapreduce-user@hadoop.apache.org
Date: Tue, 12 Jan 2010 18:53:05 +0000

I have a 15-node Hadoop cluster that works for most jobs, but every time I upload a large data file into HBase, the job fails. My surmise is that the file (15 GB) is big enough that the job spawns many tasks (about 55 at once), and together they swamp the region server processes.

Each cluster node is also an HBase region server, so at a minimum each region server sees about 4 tasks. But when the table is small there are only a few regions, so each region server that hosts one must serve many more tasks. For example, if the table starts out empty there is a single region, and that one region server has to handle calls from all 55 tasks. It can't handle the load, the tasks give up, and the job fails.

This is just conjecture on my part. Does it sound reasonable? If so, what methods are there to prevent it? Limiting the number of tasks for the upload job is one obvious solution, but what is a good limit? The more general question is: how many map tasks can a typical region server support?
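For concreteness, this is roughly how I'm capping the maps by hand today (a minimal sketch against the 0.20-era JobConf API; the driver class name and the cap of 8 are placeholders I made up, and setNumMapTasks() is only a hint, not a hard limit):

    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;

    public class UploadJobDriver {  // hypothetical driver class name
        public static void main(String[] args) throws Exception {
            JobConf conf = new JobConf(UploadJobDriver.class);
            conf.setJobName("hbase-upload");

            // Cap the maps by hand. This is only a hint: with the old API,
            // FileInputFormat never makes a split larger than one HDFS
            // block, so a 15 GB file can still produce a map per block.
            conf.setNumMapTasks(8);  // 8 is an arbitrary guess

            // ... set the mapper, input path, TableOutputFormat, etc. ...
            JobClient.runJob(conf);
        }
    }

Even if that works, picking the number 8 (or anything else) is the part I don't know how to do in a principled way.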
Limiting the number of tasks is tedious and error-prone: somebody has to look at the HBase table, see how many regions it has and on which servers, and configure the job by hand to match. And if the job is big enough, the number of regions will grow while it runs, so the initial task counts won't stay ideal anyway.

Ideally, the Hadoop framework would be smart enough to look at how many regions and region servers exist and dynamically allocate a reasonable number of tasks (the P.S. below sketches the kind of thing I mean). Does the community have any knowledge or techniques for handling this?

Thanks,

Michael Clements
Solutions Architect
michael.clements@disney.com
206-664-4374 office
360-317-5051 mobile
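P.S. To make "dynamically allocate" concrete, here is the kind of driver I was imagining (a rough sketch assuming HBase 0.20's client API; the class name, the table name "uploads", and the TASKS_PER_REGION value are all invented for illustration):

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.mapred.JobConf;

    public class RegionAwareDriver {  // hypothetical class name
        // Pure guesswork: how many concurrent writers one region can absorb.
        private static final int TASKS_PER_REGION = 2;

        public static void main(String[] args) throws Exception {
            HBaseConfiguration hconf = new HBaseConfiguration();
            HTable table = new HTable(hconf, "uploads");  // placeholder table

            // One start key per region, so this is the current region count.
            int regions = table.getStartKeys().length;

            JobConf job = new JobConf(RegionAwareDriver.class);
            job.setNumMapTasks(Math.max(1, regions * TASKS_PER_REGION));
            // ... configure the rest of the upload job and submit as usual ...
        }
    }

The obvious weakness is that this only samples the region count at submit time, so a long job that splits regions mid-run would still outgrow whatever number it picks.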