From: Dalia Sobhy <dalia.mohsobhy@hotmail.com>
To: "user@hbase.apache.org" <user@hbase.apache.org>
Subject: RE: Hbase scalability performance
Date: Sun, 23 Dec 2012 15:42:58 +0200


Dear all,

Thanks for your help.

I am already using coprocessors for this table.

I had already tried a similar program using the Thrift server on a 23-node cluster on the Rackspace cloud, and there too I saw no performance improvement. I was then advised to use physical machines (not virtual ones) and more bandwidth than 100 Mbps, because those two issues were said to be the cause of the poor performance. But after trying that, I saw the same result.
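For reference, here is a minimal Java sketch of the hashed-key idea raised in the replies quoted below (Tariq's hotspotting point and Michael's "hashed UUID" suggestion), applied to the sequential keys 0, 1, 2, ... from the original test. The helper name `hashedRowKey`, the use of MD5, and the 4-hex-character prefix are illustrative assumptions, not something prescribed in the thread. Hashing only spreads the load if the table actually has several regions (for example, because it was pre-split); with a single region, every read still lands on one region server.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class HashedRowKey {

    // Hypothetical helper: prefixes a sequential id with the first two bytes
    // (four hex characters) of its MD5 digest, so consecutive ids such as
    // 0, 1, 2 land far apart in the sorted key space instead of side by side.
    static String hashedRowKey(long id) {
        String original = Long.toString(id);
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            byte[] digest = md5.digest(original.getBytes(StandardCharsets.UTF_8));
            String prefix = String.format("%02x%02x", digest[0] & 0xff, digest[1] & 0xff);
            // The original id is kept as a suffix, so the key stays unique and a
            // reader can rebuild it from the id alone to issue a point get().
            return prefix + "-" + original;
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to provide MD5, so this should not happen.
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    public static void main(String[] args) {
        for (long id = 0; id < 5; id++) {
            System.out.println(id + " -> " + hashedRowKey(id));
        }
    }
}
```

The trade-off is that rows are no longer stored in id order, so a range scan over consecutive ids turns into a scan over the whole table; point gets, which is what the random-read test uses, are unaffected.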

   
> From: dontariq@gmail.com
> Date: Sat, 22 Dec 2012 23:09:54 +0530
> Subject: Re: Hbase scalability performance
> To: user@hbase.apache.org
> 
> I totally agree with Michael. I was about to point out the same thing.
> The probability of RegionServer (RS) hotspotting is high when you have
> sequential keys. Even if everything is balanced and your cluster is very
> well configured, you might end up with this issue.
> 
> Best Regards,
> Tariq
> +91-9741563634
> https://mtariq.jux.com/
> 
> 
> On Sat, Dec 22, 2012 at 10:24 PM, Mohit Anchlia <mohitanchlia@gmail.com> wrote:
> 
> > Also, check how balanced your region servers are across all the nodes.
> >
> > On Sat, Dec 22, 2012 at 8:50 AM, Varun Sharma <varun@pinterest.com> wrote:
> >
> > > Note that adding nodes will improve throughput, not latency. So if your
> > > client application for benchmarking is single-threaded, do not expect an
> > > improvement in the number of reads per second just from adding nodes.
> > >
> > > On Sat, Dec 22, 2012 at 8:23 AM, Michael Segel <michael_segel@hotmail.com> wrote:
> > >
> > > > I thought it was Doug Miel who said that HBase doesn't start to shine
> > > > until you have at least 5 nodes.
> > > > (Apologies if I misspelled Doug's name.)
> > > >
> > > > I happen to concur, and if you want to start testing scalability, you
> > > > will want to build a bigger test rig.
> > > >
> > > > Just saying!
> > > >
> > > >
> > > > Oh, and you're going to have a hot spot on that row key.
> > > > Maybe use a hashed UUID?
> > > >
> > > > I would suggest that you consider the following:
> > > >
> > > > Create N rows, where N is a very large number.
> > > > Then, to generate your random access, do a full table scan to get the N
> > > > row keys into memory.
> > > > Using a random number generator, generate a random number and pop that
> > > > row key off the list, so that the next iteration draws from 1 to (N-1).
> > > > Do this 200K times.
> > > >
> > > > Now time your 200K random fetches.
> > > >
> > > > It would be interesting to see how it performs, averaging over a
> > > > 'couple' of runs... then increase the key space by an order of magnitude.
> > > > (Start with 1 million rows, then 10 million rows, 100 million rows....)
> > > >
> > > > In theory... if properly tuned, one should expect near-linear results.
> > > > That is to say, the time it takes to get() a row across the data space
> > > > should be consistent. Although I wonder if you would have to somehow
> > > > clear the cache?
> > > >
> > > >
> > > > Sorry, just a random thought...
> > > >
> > > > -Mike
> > > >
> > > > On Dec 22, 2012, at 10:06 AM, Ted Yu <yuzhihong@gmail.com> wrote:
> > > >
> > > > > By '3 datanodes', did you mean that you also increased the number of
> > > > > region servers to 3?
> > > > >
> > > > > When your test was running, did you look at the Web UI to see whether
> > > > > the load was balanced? You can also use Ganglia for this purpose.
> > > > >
> > > > > What version of HBase are you using ?
> > > > >
> > > > > Thanks
> > > > >
> > > > > On Sat, Dec 22, 2012 at 7:43 AM, Dalia Sobhy <dalia.mohsobhy@hotmail.com> wrote:
> > > > >
> > > > >> Dear all,
> > > > >>
> > > > >> I am testing a simple HBase application on a cluster of multiple nodes.
> > > > >>
> > > > >> I am specifically testing scalability by measuring the time taken
> > > > >> for random reads.
> > > > >>
> > > > >> Data size: 200,000 rows
> > > > >> Row key: 0, 1, 2, ... (a very simple incrementing row key)
> > > > >>
> > > > >> But I don't know why, when I increase the cluster size, I see the same time.
> > > > >>
> > > > >> For example:
> > > > >> 2 DataNodes: 1000 random reads: 1.757 sec
> > > > >> 3 DataNodes: 1000 random reads: 1.7 sec
> > > > >>
> > > > >> So, any help, please?
> > > > >>
> > > > >>
> > > >
> > > >
> > >
> >
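The suggestions in the thread above can be combined into a small benchmark harness. Below is a minimal Java sketch under these assumptions: `fetch` is a hypothetical stand-in for the real HBase get() call (an in-memory map here, so the sketch runs without a cluster and its timings say nothing about HBase itself); the key list is built directly rather than filled from a full table scan; and N, the 200K sample size, the seed, and the thread counts are illustrative. The random draw follows Michael's procedure of popping a random key so that the next draw comes from the N-1 keys that remain, implemented as swap-with-last. Timing at more than one thread count reflects Varun's point: with one thread, 1000 reads in 1.757 sec is about 1.76 ms per read, roughly 570 reads/sec, and extra nodes cannot raise that ceiling because each read waits for the previous one to finish.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Function;

public class RandomFetchBenchmark {

    static final class Result {
        final long nanos;  // wall-clock time for the whole batch
        final long hits;   // fetches that returned a value
        Result(long nanos, long hits) { this.nanos = nanos; this.hits = hits; }
    }

    // Picks `count` distinct keys: draw an index in [0, remaining), take that key,
    // move the last live key into its slot, and shrink the live range by one.
    static List<String> drawWithoutReplacement(List<String> keys, int count, long seed) {
        if (count > keys.size()) {
            throw new IllegalArgumentException("count exceeds the number of keys");
        }
        List<String> pool = new ArrayList<>(keys);  // work on a copy; the input is untouched
        Random rng = new Random(seed);
        List<String> picked = new ArrayList<>(count);
        for (int remaining = pool.size(); picked.size() < count; remaining--) {
            int i = rng.nextInt(remaining);
            picked.add(pool.get(i));
            pool.set(i, pool.get(remaining - 1));
        }
        return picked;
    }

    // Runs every fetch on a fixed pool of `threads` workers and times the batch.
    // `fetch` is a hypothetical stand-in for an HBase get().
    static Result timeFetches(List<String> sample, int threads,
                              Function<String, String> fetch) throws Exception {
        ExecutorService workers = Executors.newFixedThreadPool(threads);
        AtomicLong hits = new AtomicLong();
        List<Future<?>> pending = new ArrayList<>(sample.size());
        long start = System.nanoTime();
        try {
            for (String key : sample) {
                pending.add(workers.submit(() -> {
                    if (fetch.apply(key) != null) {
                        hits.incrementAndGet();
                    }
                }));
            }
            for (Future<?> f : pending) {
                f.get();  // waits for completion and rethrows any failure
            }
        } finally {
            workers.shutdown();
        }
        return new Result(System.nanoTime() - start, hits.get());
    }

    public static void main(String[] args) throws Exception {
        int n = 300_000;           // illustrative N; the thread suggests going to 1M, 10M, 100M rows
        int sampleSize = 200_000;  // the 200K random fetches
        Map<String, String> fakeTable = new ConcurrentHashMap<>();
        List<String> keys = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            String key = Integer.toString(i);
            keys.add(key);
            fakeTable.put(key, "value-" + i);
        }
        List<String> sample = drawWithoutReplacement(keys, sampleSize, 42L);
        for (int threads : new int[] {1, 8}) {
            Result r = timeFetches(sample, threads, fakeTable::get);
            double secs = r.nanos / 1e9;
            System.out.printf("%d thread(s): %d hits in %.3f sec (%.0f reads/sec)%n",
                    threads, r.hits, secs, r.hits / secs);
        }
    }
}
```

Against a real cluster, the per-thread-count reads/sec figure is the number to compare as nodes are added; a single-threaded run mostly measures per-read latency.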