Date: Sat, 22 May 2010 09:25:51 -0700
Subject: Re: RowCounter example run time
From: Jean-Daniel Cryans
To: user@hbase.apache.org

My first question would be: what exactly do you expect? Would 5 min be enough? Or are you expecting something more like 1-2 secs (which is impossible, since this is MapReduce)?

Then there are also Jon's questions.

Finally, did you set a higher scanner caching on that job? hbase.client.scanner.caching is the name of the config, which defaults to 1. When mapping an HBase table, if you don't set it higher you're basically benchmarking the RPC layer, since it does 1 call per next() invocation. Setting the right value depends on the size of your rows, e.g., are you storing 60 bytes or something high like 100KB? On our 13B rows table (each row is a few bytes), we set it to 10k.

J-D

On Sat, May 22, 2010 at 8:40 AM, Andrew Nguyen wrote:
> Hello,
>
> I finally got some decent hardware to put together a 1 master, 4 slave Hadoop/HBase cluster. However, I'm still waiting for space in the datacenter to clear out and only have 3 of the nodes deployed (master + 2 slaves). Each node is a quad-core AMD with 8G of RAM, running on a GigE network.
> HDFS is configured to run on a separate (from the OS drive) U320 drive. The master has RAID1 mirrored drives only.
>
> I've installed HBase with slave1 and slave2 as regionservers and master, slave1, slave2 as the ZK quorum. The master serves as the NN and JT and the slaves as DN and TT.
>
> Now my question:
>
> I've imported 22.5M rows into HBase, into a single table. Each row has 8 or so columns. I just ran the RowCounter MR example and it takes about 25 minutes to complete. Is a 3 node setup too underpowered to combat the overhead of Hadoop and HBase? Or, could it be something with my configuration? I've been playing around with Hadoop some but this is my first attempt at anything HBase.
>
> Thanks!
>
> --Andrew
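
To put the scanner caching point from the reply into rough numbers, here is a small back-of-the-envelope sketch in Python. It is not HBase code: `scan_rpc_calls` is a hypothetical helper, and it assumes every next() call returns a full batch of `caching` rows while ignoring scanner open/close calls and region boundaries. The row count is the 22.5M figure from the question.

```python
def scan_rpc_calls(total_rows: int, caching: int) -> int:
    """Approximate number of next() RPCs needed to scan total_rows rows.

    Hypothetical helper for illustration only. Assumes each call returns
    a full batch of `caching` rows and ignores scanner open/close calls
    and region boundaries.
    """
    if total_rows < 0:
        raise ValueError("total_rows must be non-negative")
    if caching < 1:
        raise ValueError("caching must be at least 1")
    # Integer ceiling division: a partial final batch still costs one call.
    return (total_rows + caching - 1) // caching


ROWS = 22_500_000  # table size from the original question

for caching in (1, 100, 10_000):
    calls = scan_rpc_calls(ROWS, caching)
    print(f"caching={caching:>6}: ~{calls:,} next() calls")
```

Under these assumptions, the default of 1 means one round trip per row, while a value of 10k brings the same scan down to a few thousand round trips. The trade-off, as the reply notes, is row size: larger batches of large rows mean more data held in memory per call.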