Date: Mon, 27 Dec 2010 13:03:46 -0800
Subject: Re: Bulk load questions
From: Nanheng Wu
To: user@hbase.apache.org

Thanks for the answers. I will use these as my basis for investigation.
I am using a mapper-only job. Is it better to write to HBase with the
HBase client directly or through TableOutputFormat?

On Mon, Dec 27, 2010 at 8:38 AM, Stack wrote:
> On Mon, Dec 27, 2010 at 1:54 AM, Nanheng Wu wrote:
>> I am running some tests to load data from HDFS into HBase in a MR job.
>> I am pretty new to HBase and I have some questions regarding bulk load
>> performance: I have a small cluster with 4 nodes, I set up one node to
>> run Namenode/JobTracker/ZK, and the other three nodes all run
>> TaskTracker/DataNode/HRegion. During my test I am seeing about 1300
>> inserts per second total and it feels kind of slow.
>
> I don't know what your hardware is like but yeah, it sounds kinda slow.
>
>> My rows are pretty
>> small ~250 bytes. I am wondering if it is a good idea to be running MR
>> on all nodes. Would it be better if I run MR load job on separate
>> nodes?
>
> Well, where do you think the time is being spent? What is holding up
> the job do you think? Is your MR job doing any massaging of the data?
> Do you have many concurrent mappers running at the same time on each node?
> Does your MR job do a map and reduce or just a map? Is it the insert
> into hbase that is slow? What do the hbase logs say? Are they
> blocking because they are flushing memory?
>
>> Also I observe that one task tracker's CPU usage was twice as
>> high as the other two.
>
> Maybe it's the one that is doing the inserting? How many regions in
> your hbase cluster? When you look at hbase UI, is load being spread
> across the hbase cluster or are you just hitting one node?
>
> St.Ack
>
>> I can't figure out why that is, does that
>> indicate some hot spots in the cluster? I'd really appreciate some
>> ideas, and please let me know if my description is not specific or
>> detailed enough and what other information I can provide to help
>> diagnose the problem. Thanks!
>>
>