hbase-dev mailing list archives

From Vladimir Rodionov <vrodio...@carrieriq.com>
Subject RE: Local sockets
Date Mon, 06 Dec 2010 17:59:29 GMT
Todd,

The major HDFS problem is inefficient processing of multiple streams in parallel:
multiple readers/writers per physical drive cause a significant drop in overall
I/O throughput on Linux (tested with ext3 and ext4). There should be only one reader
thread and one writer thread per physical drive (until we get AIO support in Java).
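A minimal sketch of the "one I/O thread per drive" idea, assuming a hypothetical scheduler class (this is illustrative only, not HDFS or HBase code): all I/O for a given physical drive is funneled through one dedicated single-threaded executor, so each disk sees a single reader/writer at a time instead of competing seeks from many threads.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: serialize all I/O per physical drive through a
// dedicated single-threaded executor.
public class DriveIoScheduler {
    private final ConcurrentHashMap<String, ExecutorService> perDrive =
            new ConcurrentHashMap<>();

    // Submit an I/O task for the given drive; tasks for the same drive
    // run one at a time, in submission order.
    public <T> Future<T> submit(String drive, Callable<T> ioTask) {
        ExecutorService ex = perDrive.computeIfAbsent(
                drive, d -> Executors.newSingleThreadExecutor());
        return ex.submit(ioTask);
    }

    public void shutdown() {
        perDrive.values().forEach(ExecutorService::shutdown);
    }
}
```

The drive key here is just a string label; a real implementation would have to map file paths to physical devices, which is exactly the kind of platform detail that makes this hard in pure Java.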

Multiple data buffer copies in the pipeline do not help matters either.

CRC32 can be fast, by the way, and some other hashing algorithms can be even faster
(e.g. MurmurHash2 at roughly 1.5 GB/s).
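As a rough illustration, the JDK's own java.util.zip.CRC32 can be timed with a micro-benchmark like the one below. MurmurHash2 is not in the JDK, so only CRC32 is shown; the throughput printed varies widely by JVM, CPU, and whether a hardware CRC path is used, and the ~1.5 GB/s MurmurHash2 figure above is from the author's own measurement, not reproduced here.

```java
import java.util.zip.CRC32;

// Rough micro-benchmark of the JDK's CRC32 implementation.
public class CrcThroughput {
    static long crc32(byte[] buf) {
        CRC32 c = new CRC32();
        c.update(buf, 0, buf.length);
        return c.getValue();
    }

    public static void main(String[] args) {
        byte[] buf = new byte[64 * 1024 * 1024]; // 64 MB
        crc32(buf); // warm-up pass so JIT compiles the hot path
        long start = System.nanoTime();
        long v = crc32(buf);
        double secs = (System.nanoTime() - start) / 1e9;
        System.out.printf("crc=%08x throughput=%.0f MB/s%n", v, 64 / secs);
    }
}
```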

Best regards,
Vladimir Rodionov
Principal Platform Engineer
Carrier IQ, www.carrieriq.com
e-mail: vrodionov@carrieriq.com

________________________________________
From: Todd Lipcon [todd@cloudera.com]
Sent: Saturday, December 04, 2010 3:04 PM
To: dev@hbase.apache.org
Subject: Re: Local sockets

On Sat, Dec 4, 2010 at 2:57 PM, Vladimir Rodionov
<vrodionov@carrieriq.com>wrote:

> From my own experiments, the performance difference is huge even for
> sequential R/W operations (up to 300%) when you compare local file I/O
> with HDFS file I/O.
>
> The overhead of HDFS I/O is substantial, to say the least.
>
>
Much of this is from checksumming, though - turn off checksums and you
should see at least a ~2x improvement.

-Todd


> Best regards,
> Vladimir Rodionov
> Principal Platform Engineer
> Carrier IQ, www.carrieriq.com
> e-mail: vrodionov@carrieriq.com
>
> ________________________________________
> From: Todd Lipcon [todd@cloudera.com]
> Sent: Saturday, December 04, 2010 12:30 PM
> To: dev@hbase.apache.org
> Subject: Re: Local sockets
>
> Hi Leen,
>
> Check out HDFS-347 for more info on this. I hope to pick this back up in
> 2011 - in 2010 we mostly focused on stability above performance in HBase's
> interactions with HDFS.
>
> Thanks
> -Todd
>
> On Sat, Dec 4, 2010 at 12:28 PM, Leen Toelen <toelen@gmail.com> wrote:
>
> > Hi,
> >
> > has anyone tested the performance impact (when there is a hdfs
> > datanode and a hbase node on the same machine) of using unix domain
> > sockets communication or shared memory ipc using nio? I guess this
> > should make a difference on reads?
> >
> > Regards,
> > Leen
> >
>
>
>
> --
> Todd Lipcon
> Software Engineer, Cloudera
>



--
Todd Lipcon
Software Engineer, Cloudera
