From: Bradford Stephens
Date: Wed, 1 Sep 2010 18:58:23 -0700
Subject: Re: Slow Inserts on EC2 Cluster
To: user@hbase.apache.org

On the full data set (10 reducers), speeds are about 100k/minute (WAL
Disabled). Still much slower than I'd like, but I'll take it over the
former :)

On Wed, Sep 1, 2010 at 5:59 PM, Ryan Rawson wrote:
> Yes exactly, column families have the same performance profile as
> tables. 12 CF = 12 tables.
>
> -ryan
>
> On Wed, Sep 1, 2010 at 5:56 PM, Bradford Stephens wrote:
>> Good call JD! We've gone from 20k inserts/minute to 200k. Much
>> better! I still think it's slower than I'd want by about one OOM, but
>> it's progress.
>>
>> Since we're populating 12 families, I guess we're seeking for 12 files
>> on each write. Not pretty. I'll look at the customer and see if they
>> really have any sparse data that would benefit from its own
>> ColumnFamily. Probably not.
>>
>> Cheers,
>> B
>>
>> On Wed, Sep 1, 2010 at 5:37 PM, Bradford Stephens wrote:
>>> Yeah, those families are all needed -- but I didn't realize the files
>>> were so small. That's odd -- and you're right, that'd certainly throw
>>> it off. I'll merge them all and see if that helps.
>>>
>>> On Wed, Sep 1, 2010 at 5:24 PM, Jean-Daniel Cryans wrote:
>>>> Took a quick look at your RS log, it looks like you are using a lot of
>>>> families and loading them pretty much at the same rate.
>>>> Look at lines that start with:
>>>>
>>>> INFO org.apache.hadoop.hbase.regionserver.Store: Added ...
>>>>
>>>> And you will see that you are dumping very small files on the
>>>> filesystem, on average 5MB, that together account for ~64MB which is
>>>> the default flush size (and then it generates tons of compactions
>>>> which makes it even worse). Do you really need all those families? Try
>>>> merging them and see the difference.
>>>>
>>>> J-D
>>>>
>>>> On Wed, Sep 1, 2010 at 5:03 PM, Bradford Stephens wrote:
>>>>> 'allo,
>>>>>
>>>>> I changed the cluster from m1.large to c1.xlarge -- we're getting
>>>>> about 4k inserts/node/minute instead of 2k. A small improvement,
>>>>> but nowhere near what I'm used to, even from vague memories of old
>>>>> clusters on EC2.
>>>>>
>>>>> I also stripped all the Cascading from my code and have a very basic
>>>>> raw MR job -- we're basically reading raw text, splitting it into
>>>>> fields, and adding those rows to HBase. About the simplest task you
>>>>> could do.
>>>>>
>>>>> Ideas for next steps? What other info could I share?
>>>>>
>>>>> Cheers,
>>>>> B
>>>>>
>>>>> On Wed, Sep 1, 2010 at 10:55 AM, Andrew Purtell wrote:
>>>>>>> From: Gary Helmling
>>>>>>>
>>>>>>> If you're using AMIs based on the latest Ubuntu (10.04),
>>>>>>> there's a known kernel issue that seems to be causing
>>>>>>> high loads while idle. More info here:
>>>>>>>
>>>>>>> https://bugs.launchpad.net/ubuntu/+source/linux-ec2/+bug/574910
>>>>>>
>>>>>> Seems best to avoid using Lucid on EC2 for now, then.
>>>>>>
>>>>>> FYI, the EC2 scripts that I use build AMIs based on Amazon's old
>>>>>> FC8 AMI (with updates). See http://github.com/apurtell/hbase-ec2
>>>>>>
>>>>>> - Andy
>>>>>>
>>>>>
>>>>> --
>>>>> Bradford Stephens,
>>>>> Founder, Drawn to Scale
>>>>> drawntoscalehq.com
>>>>> 727.697.7528
>>>>>
>>>>> http://www.drawntoscalehq.com -- The intuitive, cloud-scale data
>>>>> solution.
>>>>> Process, store, query, search, and serve all your data.
>>>>>
>>>>> http://www.roadtofailure.com -- The Fringes of Scalability, Social
>>>>> Media, and Computer Science

--
Bradford Stephens,
Founder, Drawn to Scale
drawntoscalehq.com
727.697.7528

http://www.drawntoscalehq.com -- The intuitive, cloud-scale data
solution. Process, store, query, search, and serve all your data.

http://www.roadtofailure.com -- The Fringes of Scalability, Social
Media, and Computer Science
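
A rough back-of-the-envelope sketch of J-D's point in the quoted thread. It assumes the flush is triggered by the combined size of all families in a region and that the 12 families fill at about the same rate (both figures come from the thread). The function name and the even-split model are illustrative assumptions, not HBase API.

```python
def flush_file_size_mb(flush_size_mb: float, num_families: int) -> float:
    """Approximate size of each store file written per flush.

    Assumption: every column family receives data at the same rate, so
    the region-wide flush threshold is split evenly across families.
    """
    if num_families < 1:
        raise ValueError("num_families must be at least 1")
    return flush_size_mb / num_families


# Figures from the thread: ~64 MB default flush size, 12 column families.
print(f"12 families: ~{flush_file_size_mb(64, 12):.1f} MB per flushed file")
print(f" 1 family:   ~{flush_file_size_mb(64, 1):.1f} MB per flushed file")
```

Under these assumptions, 12 families give files of roughly 5 MB each, in line with the sizes J-D saw in the region server log, while a single merged family would flush one file near the full threshold.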