From: Dalia Sobhy
To: "user@hbase.apache.org"
Reply-To: user@hbase.apache.org
Subject: RE: Hbase scalability performance
Date: Sun, 23 Dec 2012 15:42:58 +0200
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary="_d526371a-2fd6-442b-9d70-4de53eeb0163_"

--_d526371a-2fd6-442b-9d70-4de53eeb0163_
Content-Type: text/plain; charset="windows-1256"
Content-Transfer-Encoding: 8bit

Dear all,

Thanks for your help.

I am already using coprocessors for this table.

I had already tried a similar program using the Thrift server on a 23-node cluster on Rackspace cloud, but I didn't see any improvement in performance there either.

I was then advised to use physical machines (not virtual ones) and more than 100 Mbps of bandwidth, because I was told those two issues caused the poor performance. But when I tried that, I got the same result.

> From: dontariq@gmail.com
> Date: Sat, 22 Dec 2012 23:09:54 +0530
> Subject: Re: Hbase scalability performance
> To: user@hbase.apache.org
>
> I totally agree with Michael. I was about to point out the same thing.
> Probability of RS hotspotting is high when we have sequential keys. Even if
> everything is balanced and your cluster is very well configured you might
> end up with this issue.
>
> Best Regards,
> Tariq
> +91-9741563634
> https://mtariq.jux.com/
>
>
> On Sat, Dec 22, 2012 at 10:24 PM, Mohit Anchlia wrote:
>
> > Also, check how balanced your region servers are across all the nodes
> >
> > On Sat, Dec 22, 2012 at 8:50 AM, Varun Sharma wrote:
> >
> > > Note that adding nodes will improve throughput and not latency. So, if your
> > > client application for benchmarking is single threaded, do not expect an
> > > improvement in number of reads per second by just adding nodes.
> > >
> > > On Sat, Dec 22, 2012 at 8:23 AM, Michael Segel <michael_segel@hotmail.com> wrote:
> > >
> > > > I thought it was Doug Miel who said that HBase doesn't start to shine
> > > > until you had at least 5 nodes.
> > > > (Apologies if I misspelled Doug's name.)
> > > >
> > > > I happen to concur and if you want to start testing scalability, you will
> > > > want to build a bigger test rig.
> > > >
> > > > Just saying!
> > > >
> > > >
> > > > Oh and you're going to have a hot spot on that row key.
> > > > Maybe do a hashed UUID ?
> > > >
> > > > I would suggest that you consider the following:
> > > >
> > > > Create N number of rows... where N is a very large number of rows.
> > > > Then to generate your random access, do a full table scan to get the N row
> > > > keys in to memory.
> > > > Using a random number generator, generate a random number and pop that
> > > > row off the stack so that the next iteration is between 1 and (N-1).
> > > > Do this 200K times.
> > > >
> > > > Now time your 200K random fetches.
> > > >
> > > > It would be interesting to see how it performs getting an average of a
> > > > 'couple' of runs... then increase the key space by an order of magnitude.
> > > > (Start w 1 million rows, 10 million rows, 100 million rows.... )
> > > >
> > > > In theory... if properly tuned. One should expect near linear results.
> > > > That is to say the time it takes to get() a row across the data space
> > > > should be consistent. Although I wonder if you would have to somehow clear
> > > > the cache?
> > > >
> > > >
> > > > Sorry, just a random thought...
> > > >
> > > > -Mike
> > > >
> > > > On Dec 22, 2012, at 10:06 AM, Ted Yu wrote:
> > > >
> > > > > By '3 datanodes', did you mean that you also increased the number of region
> > > > > servers to 3 ?
> > > > >
> > > > > When your test was running, did you look at Web UI to see whether load was
> > > > > balanced ? You can also use Ganglia for such purpose.
> > > > >
> > > > > What version of HBase are you using ?
> > > > >
> > > > > Thanks
> > > > >
> > > > > On Sat, Dec 22, 2012 at 7:43 AM, Dalia Sobhy <dalia.mohsobhy@hotmail.com> wrote:
> > > > >
> > > > >> Dear all,
> > > > >>
> > > > >> I am testing a simple hbase application on a cluster of multiple nodes.
> > > > >>
> > > > >> I am especially testing the scalability performance, by measuring the time
> > > > >> taken for random reads
> > > > >>
> > > > >> Data size: 200,000 row
> > > > >> Row key : 0,1,2 very simple row key incremental
> > > > >>
> > > > >> But i don't know why by increasing the cluster size, I see the same time.
> > > > >>
> > > > >> For ex:
> > > > >> 2 Datanodes: 1000 random read: 1.757 sec
> > > > >> 3 datanodes: 1000 random read: 1.7 sec
> > > > >>
> > > > >> So any help plzzz ??
> > > > >>

--_d526371a-2fd6-442b-9d70-4de53eeb0163_--
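
The benchmark Mike outlines in the quoted thread (hash the row key, load the row keys into memory, fetch them in random order without repeats, and time the fetches) could be sketched roughly as follows. This is a minimal Python sketch under stated assumptions, not HBase client code: a plain dict stands in for the table, `table[key]` stands in for get(), and the names `hashed_key`, `draw_without_replacement`, and `time_random_fetches` are made up for illustration. The MD5 prefix is only one possible reading of "hashed UUID". The row and fetch counts in the demo are the ones from the original question.

```python
import hashlib
import random
import time


def hashed_key(seq_id):
    """Prefix a sequential id with 8 hex chars of its MD5 digest.

    Assumption: this is one way to keep incremental ids (0, 1, 2, ...)
    from sorting next to each other; the thread does not prescribe a scheme.
    """
    digest = hashlib.md5(str(seq_id).encode("utf-8")).hexdigest()
    return f"{digest[:8]}-{seq_id}"


def draw_without_replacement(keys, count, rng):
    """Yield `count` distinct keys in random order.

    Each pick is swapped to the end of the pool and popped, so the next
    pick is made from one fewer key.
    """
    pool = list(keys)
    if count > len(pool):
        raise ValueError("cannot draw more keys than the pool holds")
    for _ in range(count):
        i = rng.randrange(len(pool))
        pool[i], pool[-1] = pool[-1], pool[i]
        yield pool.pop()


def time_random_fetches(table, fetches, seed=0):
    """Return the seconds spent on `fetches` random lookups in `table`."""
    rng = random.Random(seed)
    # Stand-in for the full table scan that loads the row keys into memory.
    picks = list(draw_without_replacement(table.keys(), fetches, rng))
    start = time.perf_counter()
    for key in picks:
        _ = table[key]  # stand-in for a get() against the real table
    return time.perf_counter() - start


if __name__ == "__main__":
    rows = 200_000   # data size from the original question
    fetches = 1_000  # number of random reads from the original question
    table = {hashed_key(i): i for i in range(rows)}
    elapsed = time_random_fetches(table, fetches)
    print(f"{fetches} random fetches over {rows} rows: {elapsed:.6f} s")
```

The keys are drawn before the timer starts so that only the lookups are timed. Against a real cluster the lookup line would be replaced by a client call, and running the fetch loop from several client threads would be needed to observe the throughput effect Varun describes.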