From: John Lilley
To: "user@hadoop.apache.org"
Subject: RE: temporary file locations for YARN applications
Date: Sun, 20 Oct 2013 20:35:15 +0000

Harsh, thanks for the quick response. These files don't need to be on the DFS (although we use that too). These are local files used during sorting, joining, transitive closure.

The task-relative folder might be good enough, but our app *can* make use of multiple temp folders if they are available. Our YARN app can be fairly I/O intensive; is it possible to allocate more than one temp folder on different physical devices?

Or perhaps YARN might help us. Will YARN assign tasks to CWD folders on different disks so that they do not compete with each other on I/O?

For that matter, where does MR allocate the temporary files generated by Mapper output? Presumably MR has the same I/O parallelism requirements that we do.

Thanks
John

-----Original Message-----
From: Harsh J [mailto:harsh@cloudera.com]
Sent: Sunday, October 20, 2013 10:49 AM
To:
Subject: Re: temporary file locations for YARN applications

Every container gets its own local work directory (you can use the relative ./) that's auto-cleaned up at the end of the container's life.

This is the best place to store the temporary files. This is not something you need custom configuration for.

Do the files need to be on a distributed FS or a local one?

On Sun, Oct 20, 2013 at 8:54 PM, John Lilley wrote:
> We have a pure YARN application (no MapReduce) that has need to store
> a significant amount of temporary data. How can we know the best
> location for these files? How can we ensure that our YARN tasks have
> write access to these locations? Is this something that must be configured outside of YARN?
> Thanks,
> John

--
Harsh J
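
On the question above about using more than one temp folder: a minimal sketch, assuming the NodeManager exports a LOCAL_DIRS environment variable to each container as a comma-separated list of container-local directories (worth verifying on your Hadoop version). Whether those directories sit on different physical devices depends on how the cluster's local directories are configured. The class and method names here (TempDirPicker, parseLocalDirs, nextDir, createTempFile, fromEnvironment) are hypothetical, not part of YARN. If the variable is unset, the sketch falls back to the container's working directory (./) mentioned in the reply.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: spreads temp files across several local directories. */
class TempDirPicker {
    private final List<Path> dirs;
    private int next = 0;

    TempDirPicker(List<Path> dirs) {
        if (dirs.isEmpty()) {
            throw new IllegalArgumentException("no temp directories");
        }
        this.dirs = new ArrayList<>(dirs);
    }

    /** Splits a comma-separated list; falls back to "." when nothing usable is given. */
    static List<Path> parseLocalDirs(String value) {
        List<Path> result = new ArrayList<>();
        if (value != null) {
            for (String part : value.split(",")) {
                String trimmed = part.trim();
                if (!trimmed.isEmpty()) {
                    result.add(Paths.get(trimmed));
                }
            }
        }
        if (result.isEmpty()) {
            result.add(Paths.get("."));
        }
        return result;
    }

    /** Assumption: LOCAL_DIRS is set by the NodeManager inside the container. */
    static TempDirPicker fromEnvironment() {
        return new TempDirPicker(parseLocalDirs(System.getenv("LOCAL_DIRS")));
    }

    /** Hands out the directories in round-robin order. */
    synchronized Path nextDir() {
        Path dir = dirs.get(next);
        next = (next + 1) % dirs.size();
        return dir;
    }

    /** Creates a temp file in the next directory, creating the directory if needed. */
    Path createTempFile(String prefix) throws IOException {
        Path dir = nextDir();
        Files.createDirectories(dir);
        return Files.createTempFile(dir, prefix, ".tmp");
    }
}
```

A sort or join phase could call TempDirPicker.fromEnvironment() once and then createTempFile("spill") for each spill file, so consecutive files land in different directories. Round-robin is just the simplest policy; picking the directory with the most free space would be a reasonable alternative.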