hbase-dev mailing list archives

From: Ryan Rawson <ryano...@gmail.com>
Subject: Re: Local sockets
Date: Mon, 06 Dec 2010 20:15:00 GMT
So are we talking about re-implementing I/O scheduling in Hadoop at the
application level?



On Mon, Dec 6, 2010 at 12:13 PM, Rajappa Iyer <rsi@panix.com> wrote:
> Jay Booth <jaybooth@gmail.com> writes:
>
>> I don't get what they're talking about with hiding I/O limitations... If the
>> OS is doing a poor job of handling sequential readers, that's on the OS and
>> not Hadoop, no? In other words, I didn't see anything specific to Hadoop in
>> their "multiple readers slow down sequential access" statement; it may or
>> may not be true for a given I/O subsystem. The operating system still sees
>> "open file, read, read, read, close", whether you're accessing that file
>> locally or via a datanode. Datanodes don't close files between read calls,
>> except at block boundaries.
>
> The root cause of the problem is the way map tasks are scheduled. Since
> job executions overlap, the reads from different jobs also overlap and
> hence increase seeks. Realistically, there's not much the OS can do
> about it.
>
> What Vladimir is talking about is reducing seek times by essentially
> serializing the reads through a single thread per disk. You could
> either cleverly reorder the reads so that seeking is minimized and/or
> read the entire block in one call, as sketched below.
>
> -rsi
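
A minimal sketch of the approach Rajappa describes above - hypothetical
code, not anything from Hadoop itself. One single-threaded executor per
physical disk serializes that disk's reads, and each request is served by
a single large read call; disk IDs and paths are made up:

    import java.io.RandomAccessFile;
    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.Future;

    class PerDiskReadScheduler {
        // One single-threaded executor per physical disk: reads for a
        // spindle queue behind each other instead of interleaving seeks.
        private final Map<String, ExecutorService> diskExecutors =
                new ConcurrentHashMap<>();

        Future<byte[]> read(String diskId, String path, long offset, int length) {
            ExecutorService exec = diskExecutors.computeIfAbsent(
                    diskId, d -> Executors.newSingleThreadExecutor());
            return exec.submit(() -> {
                // Read the whole requested range in one call so the disk
                // does a single sequential pass per request.
                try (RandomAccessFile raf = new RandomAccessFile(path, "r")) {
                    byte[] buf = new byte[length];
                    raf.seek(offset);
                    raf.readFully(buf);
                    return buf;
                }
            });
        }
    }

A caller would map each block file to its mount point and funnel every
read for that mount through the scheduler.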
>
>>
>> On Mon, Dec 6, 2010 at 2:39 PM, Vladimir Rodionov
>> <vrodionov@carrieriq.com> wrote:
>>
>>> Todd,
>>>
>>> There are some curious people who have spent time (and taxpayers' money :)
>>> and have come to the same conclusion as me:
>>>
>>> http://www.jeffshafer.com/publications/papers/shafer_ispass10.pdf
>>>
>>>
>>> Best regards,
>>> Vladimir Rodionov
>>> Principal Platform Engineer
>>> Carrier IQ, www.carrieriq.com
>>> e-mail: vrodionov@carrieriq.com
>>>
>>> ________________________________________
>>> From: Todd Lipcon [todd@cloudera.com]
>>> Sent: Monday, December 06, 2010 10:04 AM
>>> To: dev@hbase.apache.org
>>> Subject: Re: Local sockets
>>>
>>> On Mon, Dec 6, 2010 at 9:59 AM, Vladimir Rodionov
>>> <vrodionov@carrieriq.com> wrote:
>>>
>>> > Todd,
>>> >
>>> > The major HDFS problem is inefficient processing of multiple streams
>>> > in parallel: multiple readers/writers per physical drive result in a
>>> > significant drop in overall I/O throughput on Linux (tested with ext3
>>> > and ext4). There should be only one reader thread and one writer
>>> > thread per physical drive (until we get AIO support in Java).
>>> >
>>> > The multiple data buffer copies in the pipeline do not improve the
>>> > situation either.
>>> >
>>>
>>> In my benchmarks, the copies account for only a small fraction of the
>>> overhead. Benchmark ChecksumFileSystem vs RawLocalFileSystem and you
>>> should see the 2x difference I mentioned for data that's in the buffer
>>> cache.
>>>
>>> As for parallel reader streams, I disagree with your assessment. After
>>> tuning readahead, and with a decent elevator algorithm (anticipatory seems
>>> best in my benchmarks), it's better to have multiple threads reading from
>>> a drive than one, unless we have AIO. Otherwise we can't keep multiple
>>> outstanding requests at the block device, and the elevator is powerless
>>> to do any reordering of reads.
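
For illustration, what "multiple outstanding requests" looks like from
Java NIO - a sketch, with an invented file path and arbitrary stripe and
buffer sizes. Positioned reads on a FileChannel are thread-safe, so each
thread keeps its own request in flight and the kernel elevator is free to
reorder them:

    import java.nio.ByteBuffer;
    import java.nio.channels.FileChannel;
    import java.nio.file.Paths;
    import java.nio.file.StandardOpenOption;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.TimeUnit;

    class ParallelPread {
        public static void main(String[] args) throws Exception {
            final int threads = 4;
            ExecutorService pool = Executors.newFixedThreadPool(threads);
            try (FileChannel ch = FileChannel.open(Paths.get("/data/blk_0001"),
                    StandardOpenOption.READ)) {
                long stripe = ch.size() / threads;
                for (int i = 0; i < threads; i++) {
                    final long start = i * stripe;
                    final long end = start + stripe;
                    pool.submit(() -> {
                        ByteBuffer buf = ByteBuffer.allocate(1 << 20); // 1 MB
                        long pos = start;
                        while (pos < end) {
                            buf.clear();
                            int n = ch.read(buf, pos); // pread(2)-style read
                            if (n < 0) break;
                            pos += n;
                        }
                        return null;
                    });
                }
                pool.shutdown();
                pool.awaitTermination(1, TimeUnit.HOURS);
            }
        }
    }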
>>>
>>>
>>> > CRC32 can be fast, btw, and some other hashing algorithms can be even
>>> > faster (like murmur2 at ~1.5 GB/sec).
>>> >
>>>
>>> Our CRC32 implementation goes around 750 MB/sec on raw data, but for
>>> whatever undiscovered reason it adds a lot more overhead when you mix it
>>> into the data pipeline. HDFS-347 has some interesting benchmarks there.
>>>
>>> -Todd
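
Numbers like these are easy to sanity-check with a trivial loop over
java.util.zip.CRC32; the buffer and total sizes below are arbitrary:

    import java.util.zip.CRC32;

    class Crc32Bench {
        public static void main(String[] args) {
            byte[] buf = new byte[64 * 1024];   // 64 KB chunks of dummy data
            long total = 1L << 30;              // checksum 1 GB in total
            CRC32 crc = new CRC32();
            long start = System.nanoTime();
            for (long done = 0; done < total; done += buf.length) {
                crc.update(buf, 0, buf.length);
            }
            double secs = (System.nanoTime() - start) / 1e9;
            System.out.printf("CRC32: %.0f MB/sec (value=%d)%n",
                    (total / 1e6) / secs, crc.getValue());
        }
    }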
>>>
>>> >
>>> > ________________________________________
>>> > From: Todd Lipcon [todd@cloudera.com]
>>> > Sent: Saturday, December 04, 2010 3:04 PM
>>> > To: dev@hbase.apache.org
>>> > Subject: Re: Local sockets
>>> >
>>> > On Sat, Dec 4, 2010 at 2:57 PM, Vladimir Rodionov
>>> > <vrodionov@carrieriq.com> wrote:
>>> >
>>> > > From my own experiments, the performance difference is huge even for
>>> > > sequential R/W operations (up to 300%) when you compare local file I/O
>>> > > vs. HDFS file I/O.
>>> > >
>>> > > The overhead of HDFS I/O is substantial, to say the least.
>>> > >
>>> > >
>>> > Much of this is from checksumming, though - turn off checksums and you
>>> > should see at least a ~2x improvement.
>>> >
>>> > -Todd
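
One way to try that comparison yourself, using the public FileSystem API
(the path is just an example; don't disable verification outside a
benchmark):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    class NoChecksumRead {
        public static void main(String[] args) throws Exception {
            FileSystem fs = FileSystem.get(new Configuration());
            // Skip CRC verification on read - benchmarking only.
            fs.setVerifyChecksum(false);
            byte[] buf = new byte[64 * 1024];
            try (FSDataInputStream in = fs.open(new Path("/data/sample"))) {
                while (in.read(buf) > 0) { /* drain */ }
            }
        }
    }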
>>> >
>>> >
>>> > > Best regards,
>>> > > Vladimir Rodionov
>>> > > Principal Platform Engineer
>>> > > Carrier IQ, www.carrieriq.com
>>> > > e-mail: vrodionov@carrieriq.com
>>> > >
>>> > > ________________________________________
>>> > > From: Todd Lipcon [todd@cloudera.com]
>>> > > Sent: Saturday, December 04, 2010 12:30 PM
>>> > > To: dev@hbase.apache.org
>>> > > Subject: Re: Local sockets
>>> > >
>>> > > Hi Leen,
>>> > >
>>> > > Check out HDFS-347 for more info on this. I hope to pick this back up
>>> > > in 2011 - in 2010 we mostly focused on stability over performance in
>>> > > HBase's interactions with HDFS.
>>> > >
>>> > > Thanks
>>> > > -Todd
>>> > >
>>> > > On Sat, Dec 4, 2010 at 12:28 PM, Leen Toelen <toelen@gmail.com> wrote:
>>> > >
>>> > > > Hi,
>>> > > >
>>> > > > Has anyone tested the performance impact (when an HDFS datanode and
>>> > > > an HBase node are on the same machine) of using Unix domain socket
>>> > > > communication or shared-memory IPC via NIO? I guess this should make
>>> > > > a difference on reads?
>>> > > >
>>> > > > Regards,
>>> > > > Leen
>>> > > >
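
For reference, here is roughly what Leen's Unix-domain-socket idea looks
like in Java NIO today. Nothing like this existed in the 2010 JDK (it
would have needed JNI or a library such as junixsocket); the channel API
below requires JDK 16+, and the socket path and message are invented:

    import java.net.StandardProtocolFamily;
    import java.net.UnixDomainSocketAddress;
    import java.nio.ByteBuffer;
    import java.nio.channels.SocketChannel;
    import java.nio.charset.StandardCharsets;

    class UdsClient {
        public static void main(String[] args) throws Exception {
            UnixDomainSocketAddress addr =
                    UnixDomainSocketAddress.of("/var/run/datanode.sock");
            try (SocketChannel ch = SocketChannel.open(StandardProtocolFamily.UNIX)) {
                ch.connect(addr);
                ch.write(ByteBuffer.wrap(
                        "read blk_0001\n".getBytes(StandardCharsets.UTF_8)));
            }
        }
    }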
>>> > >
>>> > >
>>> > >
>>> > > --
>>> > > Todd Lipcon
>>> > > Software Engineer, Cloudera
>>> > >
>>> >
>>> >
>>> >
>>> > --
>>> > Todd Lipcon
>>> > Software Engineer, Cloudera
>>> >
>>>
>>>
>>>
>>> --
>>> Todd Lipcon
>>> Software Engineer, Cloudera
>>>
>
