From: Chris Nauroth
To: "user@hadoop.apache.org"
Subject: Re: Yarn application reading from Data node using short-circuit.
Date: Thu, 19 Nov 2015 17:22:12 +0000

Hello Sandeep,

As long as you have enabled short-circuit read as per the documentation [1], I expect any Hadoop process will take advantage of it while reading a local replica. However, short-circuit read will not completely eliminate TCP connection activity to the DataNode. There will still be a TCP connection from the client to the DataNode to perform a handshake and establish the Unix domain socket. This is a very small payload, though, compared to the transfer of block data over the Unix domain socket.

[1] http://hadoop.apache.org/docs/r2.7.1/hadoop-project-dist/hadoop-hdfs/ShortCircuitLocalReads.html

--Chris Nauroth

From: sandeep das <yarnhadoop@gmail.com>
Reply-To: "user@hadoop.apache.org" <user@hadoop.apache.org>
Date: Wednesday, November 18, 2015 at 10:44 PM
To: "user@hadoop.apache.org" <user@hadoop.apache.org>
Subject: Yarn application reading from Data node using short-circuit.

Hi,

I was going through some benchmarking and realized that lots of TCP connections are initiated while running my PIG jobs over YARN (MR2). These TCP connections are related to the data node. Although short-circuit is enabled on my data nodes, a lot of TCP connections are still being created.

I wanted to check how we can enable the YARN ApplicationMaster to read data from the data node using short-circuit reads, i.e. Unix domain sockets. I believe that will improve the performance of our jobs.

Can someone please help me understand how I can make sure that MR2 jobs created by PIG scripts are reading data from the data node using short-circuit instead of TCP connections?

Regards,
Sandeep
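
A quick way to sanity-check the "enabled as per the documentation [1]" condition from the reply is to inspect the hdfs-site.xml that the job's client side actually loads. The following is a minimal sketch in Python, not part of the original thread. It assumes the two property names given in the short-circuit documentation (dfs.client.read.shortcircuit and dfs.domain.socket.path); the helper names and the sample socket path are hypothetical, and an absent dfs.client.read.shortcircuit is treated as false.

```python
import xml.etree.ElementTree as ET

# Property names as given in the HDFS short-circuit local reads documentation.
SHORT_CIRCUIT_KEY = "dfs.client.read.shortcircuit"
SOCKET_PATH_KEY = "dfs.domain.socket.path"


def read_properties(xml_text):
    """Return {name: value} for every <property> in a Hadoop-style config."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None:
            props[name.strip()] = (value or "").strip()
    return props


def short_circuit_problems(xml_text):
    """List reasons short-circuit reads look unconfigured (empty if none)."""
    props = read_properties(xml_text)
    problems = []
    # Assumption: a missing key is treated the same as "false".
    if props.get(SHORT_CIRCUIT_KEY, "false").lower() != "true":
        problems.append(SHORT_CIRCUIT_KEY + " is not set to true")
    if not props.get(SOCKET_PATH_KEY):
        problems.append(SOCKET_PATH_KEY + " is not set")
    return problems


# Hypothetical file contents; the socket path is only an example value.
SAMPLE = """<configuration>
  <property>
    <name>dfs.client.read.shortcircuit</name>
    <value>true</value>
  </property>
  <property>
    <name>dfs.domain.socket.path</name>
    <value>/var/lib/hadoop-hdfs/dn_socket</value>
  </property>
</configuration>"""

if __name__ == "__main__":
    print(short_circuit_problems(SAMPLE) or "short-circuit settings look present")
```

An empty list only means the settings are present in that one file. It does not show that a given task read a local replica, and, as the reply notes, some TCP connections to the DataNode are expected even when short-circuit reads are working.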