hbase-dev mailing list archives

From Doug Meil <doug.m...@explorysmedical.com>
Subject RE: HDFS-1599 status? (HDFS tickets to improve HBase)
Date Fri, 03 Jun 2011 19:50:14 GMT
Thanks everybody for commenting on this thread.  

We'd certainly like to lobby for movement on these two tickets, and although we don't have
anybody who is familiar with the source code, we'd be happy to perform some tests and get some
performance numbers.

Per Kihwal's comments, it sounds like HDFS-941 needs to be reworked because the patch is
stale.

The patch for HDFS-347 sounds like it's still usable.

So what else is needed to push this effort forward?  Would it help to get more numbers
on HDFS-347 and keep lobbying on the ticket, and/or is there another path we should take
(plying with beer, free Cleveland Indians tickets, harassing phone calls, etc.)?



-----Original Message-----
From: Dhruba Borthakur [mailto:dhruba@gmail.com] 
Sent: Friday, June 03, 2011 3:00 PM
To: dev@hbase.apache.org
Subject: Re: HDFS-1599 status? (HDFS tickets to improve HBase)

I completely agree with Ryan. Most of the measurements in HDFS-347 are point comparisons:
data rate over socket, single-threaded sequential read from datanode, single-threaded random
read from datanode, etc. These measurements are good, but when you run the entire HBase system
at load, you definitely see a 3X performance improvement when reading data locally (instead
of going through the datanode).

-dhruba

On Fri, Jun 3, 2011 at 11:08 AM, Ryan Rawson <ryanobjc@gmail.com> wrote:

> Could you explain your HDFS-347 comment more?  I don't think people
> suggested that the socket itself was the primary issue, but dealing
> with the datanode and the socket and everything was really slow.  It's
> hard to separate concerns and test only 1 thing at a time - for
> example you said 'local socket comm isn't the problem', but there is no
> way to build a test that uses a local socket but not the datanode.
>
> The basic fact is that datanode adds a lot of overhead, and under high 
> concurrency that overhead grows.
>
>
>
> On Fri, Jun 3, 2011 at 7:07 AM, Kihwal Lee <kihwal@yahoo-inc.com> wrote:
> > HDFS-941
> > The trunk has moved on so the patch won't apply.  There have been
> > significant changes in HDFS lately, so it will require more than a
> > simple rebase/merge.  If the original assignee is busy, I am willing to help.
> >
> > HDFS-347
> > The analysis is pointing out that local socket communication is
> > actually not the problem. The initial assumption of local socket being
> > slow should be ignored and the design should be revisited.
> >
> > I agree that improving local pread performance is critical.  Based on
> > my experiments, HDFS-941 helps a lot, and the communication channel was
> > no longer the bottleneck.
> >
> > Kihwal
> >
> >
> > On 6/2/11 4:00 PM, "Doug Meil" <doug.meil@explorysmedical.com> wrote:
> >
> > Hi folks, I was wondering if there was any movement on any of these
> > HDFS tickets for HBase.  The umbrella ticket is HDFS-1599, but the last
> > comment from stack back in Feb highlighted interest in several tickets:
> >
> >
> > 1) HDFS-918 (use single selector)
> >    a. Last comment Jan 2011
> >
> > 2) HDFS-941 (reuse of connection)
> >    a. Patch available as of April 2011
> >    b. But the ticket is still unresolved.
> >
> > 3) HDFS-347 (local reads)
> >    a. Discussion seemed to end in March 2011 with a huge comment
> >       saying that there was no performance benefit.
> >    b. I'm working my way through this comment/report, but intuitively
> >       it seems like it would be a good idea since, as other comments in
> >       the ticket stated, the RS reads locally just about every time.
> >
> >
> > Doug Meil
> > Chief Software Architect, Explorys
> > doug.meil@explorys.com
> >
> >
> >
>



--
Connect to me at http://www.facebook.com/dhruba
