Date: Thu, 11 Jun 2009 22:29:30 -0700
Subject: Re: HBase Failing on Large Loads
From: Ryan Rawson
To: hbase-user@hadoop.apache.org

Since you are on a 2-4 CPU system, you need to use:

"-XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode"

What does your verbose GC log say? Are you getting huge pauses?

You can raise the ZK timeout. Try setting this in zoo.cfg on both server and client:

tickTime=20000
initLimit=5
syncLimit=2

and in hbase-site.xml:

<property>
  <name>zookeeper.session.timeout</name>
  <value>60000</value>
</property>

This will give you a much higher ZooKeeper timeout. Let us know!

On Thu, Jun 11, 2009 at 10:25 PM, Bradford Stephens <bradfordstephens@gmail.com> wrote:

> Thanks for helping me, o people of awesomeness.
>
> VM settings are 1000 for HBase, and I used the GC laid out in the
> Wiki. Also, " -server " ... basically, I did everything here:
> http://wiki.apache.org/hadoop/PerformanceTuning , and on
>
> http://ryantwopointoh.blogspot.com/2009/01/performance-of-hbase-importing.html
>
> On Thu, Jun 11, 2009 at 8:02 PM, Ryan Rawson wrote:
> > What are your vm/gc settings? Let's tune that!
> >
> > On Jun 11, 2009 7:08 PM, "Bradford Stephens" wrote:
> >
> > OK, so I discovered the ulimit wasn't changed like I thought it was,
> > had to fool with PAM in Ubuntu.
> >
> > Everything's running a little better, and I cut the data size by 66%.
> >
> > It took a while, but one of the machines with only 2 cores failed, and
> > I caught it in the moment. Then 2 other machines failed a few minutes
> > later in a cascade. I'm thinking that HBase + Hadoop takes up so much
> > proc time that the machine gradually stops responding to heartbeat....
> > does that seem rational?
> >
> > Here's the first regionserver log: http://pastebin.com/m96e06fe
> > I wish I could attach the log of one of the regionservers that failed
> > a few minutes later, but it's 708MB! Here are some examples from the tail:
> >
> > 2009-06-11 19:00:18,418 WARN
> > org.apache.hadoop.hbase.regionserver.HRegionServer: unable to report
> > to master for 906196 milliseconds - retrying
> > 2009-06-11 19:00:18,419 WARN
> > org.apache.hadoop.hbase.regionserver.HRegionServer: error getting
> > store file index size for 944890031/url:
> > java.io.FileNotFoundException: File does not exist:
> > hdfs://dttest01:54310/hbase-0.19/joinedcontent/944890031/url/mapfiles/2512503149715575970/index
> >
> > The HBase Master log is surprisingly quiet...
> >
> > Overall, I think HBase just isn't happy on a machine with two
> > single-core procs, and when they start dropping like flies, everything
> > goes to hell. Do my log files support this?
> >
> > Cheers,
> > Bradford
> >
> > On Wed, Jun 10, 2009 at 4:01 PM, Ryan Rawson wrote:
> >
> > Hey,
> > Looks like you h...
> >
>
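
For what it's worth, the ZooKeeper numbers in the reply can be sanity-checked with a few lines of Python. This is only a sketch, and it rests on one assumption taken from the ZooKeeper documentation rather than from this thread: that the server clamps any requested session timeout to the range 2 x tickTime to 20 x tickTime. The function names are made up for illustration, and the 2000 ms figure is the tickTime commonly seen in sample configs, not something reported by anyone here.

```python
def session_timeout_bounds(tick_time_ms):
    """Return the (min, max) session timeout in ms for a given tickTime.

    Assumes the documented ZooKeeper default of 2x and 20x tickTime.
    """
    return 2 * tick_time_ms, 20 * tick_time_ms


def negotiated_timeout(requested_ms, tick_time_ms):
    """Clamp the timeout a client asks for into the server's allowed range."""
    lowest, highest = session_timeout_bounds(tick_time_ms)
    return max(lowest, min(requested_ms, highest))


if __name__ == "__main__":
    requested = 60000  # zookeeper.session.timeout from hbase-site.xml
    for tick in (2000, 20000):  # sample-config tickTime vs. the suggested one
        print(tick, session_timeout_bounds(tick), negotiated_timeout(requested, tick))
```

Under that assumption, a 60000 ms request would be cut back to 40000 ms with a 2000 ms tick, while tickTime=20000 leaves room for the full 60000 ms. That would explain why the reply changes tickTime together with the session timeout.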