hadoop-hdfs-user mailing list archives

From Chris Nauroth <cnaur...@hortonworks.com>
Subject Re: Yarn application reading from Data node using short-circuit.
Date Thu, 19 Nov 2015 17:22:12 GMT
Hello Sandeep,

As long as you have enabled short-circuit read as per the documentation [1], I expect any
Hadoop process will take advantage of it while reading a local replica.  However, short-circuit
read will not completely eliminate TCP connection activity to the DataNode.  There will still
be a TCP connection from the client to the DataNode to perform a handshake and establish the
Unix domain socket.  This is a very small payload, though, compared to the transfer of block
data over the Unix domain socket.
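The guide in [1] enables short-circuit reads with two hdfs-site.xml properties, dfs.client.read.shortcircuit and dfs.domain.socket.path, which must be visible to both the DataNode and the clients (including the MR2 task JVMs). The guide also requires the native libhadoop library. The Python sketch below generates those entries and checks a config file's contents. It assumes the example socket path /var/lib/hadoop-hdfs/dn_socket, and the helper names are hypothetical. It only inspects the XML. It does not prove that a running client actually read block data over the domain socket.

```python
import xml.etree.ElementTree as ET

# Property names from the Hadoop 2.7.1 ShortCircuitLocalReads guide.
# The socket path is an example value (assumption); it must exist on the DataNodes.
SHORT_CIRCUIT_PROPS = {
    "dfs.client.read.shortcircuit": "true",
    "dfs.domain.socket.path": "/var/lib/hadoop-hdfs/dn_socket",
}


def build_hdfs_site(props):
    """Return an hdfs-site.xml <configuration> document as a string (hypothetical helper)."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


def short_circuit_enabled(xml_text):
    """Return True if the XML sets both short-circuit properties (hypothetical helper).

    This checks the file contents only; it does not observe live reads.
    """
    root = ET.fromstring(xml_text)
    values = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None:
            values[name.strip()] = (value or "").strip()
    # The flag defaults to false and the socket path defaults to empty,
    # so both must be set explicitly.
    return (
        values.get("dfs.client.read.shortcircuit", "").lower() == "true"
        and values.get("dfs.domain.socket.path", "") != ""
    )


if __name__ == "__main__":
    xml_text = build_hdfs_site(SHORT_CIRCUIT_PROPS)
    print(xml_text)
    print(short_circuit_enabled(xml_text))
```

Even with this configuration in place, each client still opens the brief TCP handshake connection to the DataNode described above. The check therefore cannot explain away those connections. It can only rule out a missing setting.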

[1] http://hadoop.apache.org/docs/r2.7.1/hadoop-project-dist/hadoop-hdfs/ShortCircuitLocalReads.html

--Chris Nauroth

From: sandeep das <yarnhadoop@gmail.com>
Reply-To: "user@hadoop.apache.org" <user@hadoop.apache.org>
Date: Wednesday, November 18, 2015 at 10:44 PM
To: "user@hadoop.apache.org" <user@hadoop.apache.org>
Subject: Yarn application reading from Data node using short-circuit.


I was running some benchmarks and noticed that many TCP connections are initiated while
running my PIG jobs over YARN (MR2). These TCP connections go to the DataNodes. Although
short-circuit reads are enabled on my DataNodes, a lot of TCP connections are still being
created.

I wanted to check how we can enable the YARN ApplicationMaster to read data from the DataNode
using short-circuit reads, i.e. Unix domain sockets. I believe that will improve the performance
of our jobs.

Can someone please help me understand how I can make sure that MR2 jobs created by PIG scripts
are reading data from the DataNode using short-circuit reads instead of TCP connections?

