hbase-user mailing list archives

From "Jim Kellerman (POWERSET)" <Jim.Keller...@microsoft.com>
Subject RE: Hbase / Hadoop Tuning
Date Thu, 02 Oct 2008 19:58:08 GMT
Responses inline below.
> -----Original Message-----
> From: Slava Gorelik [mailto:slava.gorelik@gmail.com]
> Sent: Thursday, October 02, 2008 12:39 PM
> To: hbase-user@hadoop.apache.org
> Subject: Re: Hbase / Hadoop Tuning
>
> Thank You Jim for a quick answer.
> 1) If I understand correctly, using 2 clients should roughly
> double the performance?

I don't know if you will get 2x performance, but it will be greater than 1x.

> 2) Currently, our webapp is an HBase client using HTable - is that what
> you meant when you said "(HBase, not web) clients"?

If multiple requests come into your webapp, and your webapp handles them
with multiple threads in a single process, you will not see a performance
increase, because those threads share one RPC connection.

If your webapp runs a different process for each request, you will see
a performance increase because the RPC connection will not be shared
and consequently will not block on the giant lock. That is why I
recommended splitting up your job using Map/Reduce.
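
A minimal sketch of how the 100000 rows could be divided among separate
client processes, so that each process has its own RPC connection. The
class and method names are hypothetical, and the actual HBase write calls
are left out because they depend on the client API in use; this only shows
the partitioning arithmetic.

```java
// Hypothetical helper: splits a row count into contiguous, non-overlapping
// ranges, one per client process. Not part of HBase.
public class ClientRowRanges {

    // Returns {start, end} (end exclusive) for the client at the given index.
    // Any remainder rows are spread over the first few clients.
    public static long[] rangeFor(long totalRows, int clients, int index) {
        if (clients <= 0 || index < 0 || index >= clients) {
            throw new IllegalArgumentException("bad clients/index");
        }
        long base = totalRows / clients;
        long remainder = totalRows % clients;
        long start = index * base + Math.min(index, remainder);
        long size = base + (index < remainder ? 1 : 0);
        return new long[] { start, start + size };
    }

    public static void main(String[] args) {
        long totalRows = 100_000L; // row count from this thread
        int clients = 4;           // assumed number of client processes
        for (int i = 0; i < clients; i++) {
            long[] r = rangeFor(totalRows, clients, i);
            // In a real run, process i would write only rows r[0]..r[1]-1.
            System.out.println("client " + i + ": [" + r[0] + ", " + r[1] + ")");
        }
    }
}
```

Each process would be started with its own index and would write only its
own range. A Map/Reduce job gives the same effect, with each map task acting
as one client.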

> 3) Is 64MB for a single region server a minimum size, or could it be less?

It could be less, but that is the default block size for the Hadoop DFS.
If you make it smaller, you might want to change the default block size
for Hadoop as well.
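
As a rough illustration of why the smaller setting matters for this data
set, the sketch below compares the 140,000,000 bytes being stored against
the 256MB default and the suggested 64MB. It is only back-of-the-envelope
arithmetic: the class and method names are hypothetical, and it ignores
per-cell storage overhead and the details of how splits actually happen.

```java
// Hypothetical sizing helper, not part of HBase.
public class RegionSizing {
    public static final long MB = 1024L * 1024L;

    // Ceiling division: the fewest regions needed if no region may hold
    // more than maxFileSize bytes.
    public static long minRegions(long totalBytes, long maxFileSize) {
        return (totalBytes + maxFileSize - 1) / maxFileSize;
    }

    public static void main(String[] args) {
        long totalBytes = 100_000L * 1_400L; // 100000 rows of 1400 bytes
        long defaultMax = 256 * MB;          // 268435456 bytes
        long reducedMax = 64 * MB;           // 67108864 bytes

        System.out.println("total bytes: " + totalBytes);
        System.out.println("regions at 256MB: " + minRegions(totalBytes, defaultMax));
        System.out.println("regions at 64MB:  " + minRegions(totalBytes, reducedMax));
        // Value to use for hbase.hregion.max.filesize, which is given in bytes.
        System.out.println("hbase.hregion.max.filesize = " + reducedMax);
    }
}
```

At the default, all of the data fits in one region, so one region server
takes every write. At 64MB the same data needs at least three regions,
which can then be served by different region servers.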

> 4) When is it planned to fix the RPC lock for concurrent operations
> in a single client?

This change is targeted for somewhere in the next 6 months according
to the roadmap.


> Thank You Again and Best Regards.
>
>
> On Thu, Oct 2, 2008 at 10:30 PM, Jim Kellerman (POWERSET) <
> Jim.Kellerman@microsoft.com> wrote:
>
> > What you are storing is 140,000,000 bytes, so having multiple
> > region servers will not help you as a single region is only
> > served by a single region server. By default, regions split
> > when they reach 256MB. So until the region splits, all traffic
> > will go to a single region server. You might try reducing the
> > maximum file size to encourage region splitting by changing the
> > value of hbase.hregion.max.filesize to 64MB.
> >
> > Using a single client will also limit write performance.
> > Even if the client is multi-threaded, there is a big giant lock
> > in the RPC mechanism which prevents concurrent requests (This
> > is something we plan to fix in the future).
> >
> > Multiple clients do not block against one another the way multi-
> > threaded clients do currently. So another way to increase
> > write performance would be to run multiple (HBase, not web) clients,
> > by either running multiple processes directly, or by utilizing
> > a Map/Reduce job to do the writes.
> >
> > ---
> > Jim Kellerman, Powerset (Live Search, Microsoft Corporation)
> >
> >
> > > -----Original Message-----
> > > From: Slava Gorelik [mailto:slava.gorelik@gmail.com]
> > > Sent: Thursday, October 02, 2008 12:07 PM
> > > To: hbase-user@hadoop.apache.org
> > > Subject: Re: Hbase / Hadoop Tuning
> > >
> > > Hi. Thank you for the quick response.
> > > We are using 7 machines (6 RedHat 5 and 1 SUSE Enterprise 10).
> > > Each machine has 4 CPUs, 4 GB RAM and a 200 GB HD, connected with
> > > a 1 Gb network interface.
> > > All machines are in the same rack. On one machine (master) we are
> > > running Tomcat with one webapp that is adding 100000 rows. Nothing
> > > else is running. When no webapp is running, the CPU load is less
> > > than 1%.
> > >
> > > We are using Hbase 0.18.0 and Hadoop 0.18.0.
> > > Hbase cluster is one master and 6 region servers.
> > >
> > > Rows are added with BatchUpdate and committed into a single column
> > > family.
> > > The data is a simple byte array (1400 bytes per row).
> > >
> > >
> > > Thank You and Best Regards.
> > >
> > >
> > >
> > >
> > > On Thu, Oct 2, 2008 at 9:39 PM, stack <stack@duboce.net> wrote:
> > >
> > > > Tell us more Slava.  HBase versions and how many regions you have
> > > > in your cluster?
> > > >
> > > > If small rows, your best boost will likely come when we support
> > > > batching of updates: HBASE-748.
> > > >
> > > > St.Ack
> > > >
> > > >
> > > >
> > > > Slava Gorelik wrote:
> > > >
> > > >> Hi All.
> > > > >> Our environment - 8 Datanodes (1 is also Namenode),
> > > > >> 7 of them are also region servers and 1 is Master,
> > > > >> default replication - 3.
> > > > >> We have an application that writes heavily with relatively
> > > > >> small rows - about 10Kb.
> > > > >> Current performance is 100000 rows in 580000 millisec -
> > > > >> 5.8 millisec / row.
> > > > >> Is there any way to improve this performance by tuning /
> > > > >> tweaking HBase or Hadoop?
> > > >>
> > > >> Thank You and Best Regards.
> > > >>
> > > >>
> > > >>
> > > >
> > > >
> >
