hbase-user mailing list archives

From Jonathan Hsieh <...@cloudera.com>
Subject Re: 0.92 and Read/writes not scaling
Date Tue, 03 Apr 2012 16:56:45 GMT
The hypothesis was that, since I was seeing TCP ack delays in ganglia, the
problem may have to do with the TCP_NODELAY setting on the write side.  The
HDFS client sets this on the read side, in DFSInputStream (linked here), but
not on the write side, in DFSOutputStream:

https://github.com/apache/hadoop-common/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java#L836


// TCP_NODELAY is crucial here because of bad interactions between
// Nagle's Algorithm and Delayed ACKs. With connection keepalive
// between the client and DN, the conversation looks like:
// 1. Client -> DN: Read block X
// 2. DN -> Client: data for block X
// 3. Client -> DN: Status OK (successful read)
// 4. Client -> DN: Read block Y
// The fact that step #3 and #4 are both in the client->DN direction
// triggers Nagling. If the DN is using delayed ACKs, this results
// in a delay of 40ms or more.
//

The fact that I am getting ackDelays on a write test may indicate that we
need to set TCP_NODELAY on the HBase HLog write side as well
(HDFS's DFSClient.DFSOutputStream in hadoop 0.20.x and DFSOutputStream in
0.23).  I did a quick hack adding socket.setTcpNoDelay(true) on the
write side of hadoop 0.20.x and reran the PE tests; unfortunately,
we still seem to have the SocketTimeoutException problems.  Needs more
digging.
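
For reference, here is a minimal standalone sketch of what the setting looks
like at the java.net.Socket level. It is not the actual HDFS change: the class
name NoDelayDemo and the helper connectNoDelay are made up for illustration,
and it connects to a loopback listener rather than a DataNode.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;

public class NoDelayDemo {

    // Hypothetical helper (not part of HDFS): open a client socket with
    // Nagle's algorithm disabled, so small writes are sent immediately
    // instead of waiting for the peer's (possibly delayed) ACK.
    static Socket connectNoDelay(InetSocketAddress addr, int timeoutMs)
            throws IOException {
        Socket s = new Socket();
        s.setTcpNoDelay(true);      // TCP_NODELAY on this connection
        s.connect(addr, timeoutMs);
        return s;
    }

    public static void main(String[] args) throws IOException {
        // Loopback listener on an ephemeral port stands in for the DataNode.
        try (ServerSocket server =
                 new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket client = connectNoDelay(
                 new InetSocketAddress(server.getInetAddress(),
                                       server.getLocalPort()), 1000)) {
            System.out.println("TCP_NODELAY enabled: " + client.getTcpNoDelay());
        }
    }
}
```

Note that the option only affects the side that sets it, so the DN end of the
pipeline would need the same treatment for its own small writes.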

 Jon

On Mon, Apr 2, 2012 at 8:50 PM, Stack <stack@duboce.net> wrote:

> On Mon, Apr 2, 2012 at 8:19 PM, Jonathan Hsieh <jon@cloudera.com> wrote:
> >  I'm in the process of testing a hypothesis Todd suggested
> > and will share results after test is done.
> >
>
> What is the hypothesis?
> St.Ack
>



-- 
// Jonathan Hsieh (shay)
// Software Engineer, Cloudera
// jon@cloudera.com
