From: Harsh J
Date: Fri, 24 May 2013 12:13:17 +0530
Subject: Re: HTTP file server, map output, and other files

YARN has a ShuffleHandler plugin used for MR purposes, but the APIs used
there aren't "general"/public, so you'd have to build your own utilities to
do that. It's not too difficult to achieve, but a general API would
certainly be nice.

Tez (Incubating) aims to solve some of this for users writing YARN apps in
a general way, but it isn't consumable yet. You can follow Tez on the Apache
Incubator at http://incubator.apache.org/projects/tez.html.

P.S. As mentioned, YARN-based MR2 does not use HTTP (Jetty) anymore. It uses
Netty.

On Fri, May 24, 2013 at 3:14 AM, John Lilley wrote:
> Thanks to previous kind answers and more reading in the elephant book, I now
> understand that mapper tasks place partitioned results into local files that
> are served up to reducers via HTTP:
>
> “The output file’s partitions are made available to the reducers over HTTP.
> The maximum number of worker threads used to serve the file partitions is
> controlled by the tasktracker.http.threads property; this setting is per
> tasktracker, not per map task slot. The default of 40 may need to be
> increased for large clusters running large jobs.
> In MapReduce 2, this property is not applicable because the maximum number
> of threads used is set automatically based on the number of processors on
> the machine. (MapReduce 2 uses Netty, which by default allows up to twice as
> many threads as there are processors.)”
>
> My question is, for a custom (non-MR) application under YARN, how would I
> set up my application tasks’ output data to be served over HTTP? Is there
> an API to control this, or are there predefined local folders that will be
> served up? Once I am finished with the temporary data, how do I request
> that the files are removed?
>
> Thanks
>
> John

--
Harsh J
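
As a rough illustration of the "build your own utilities" route mentioned in the reply, here is a minimal sketch of a task that serves its own output directory over HTTP and removes the files once it is done. It uses only the Python standard library as a stand-in: it is not a YARN API and not how ShuffleHandler works. The names serve_directory, stop_and_clean and fetch, and the file name part-0 in the usage note, are hypothetical and chosen only for this example.

```python
import functools
import http.server
import os
import shutil
import tempfile
import threading
import urllib.request


class QuietHandler(http.server.SimpleHTTPRequestHandler):
    """Static file handler that suppresses per-request logging."""

    def log_message(self, format, *args):
        pass


def serve_directory(root, host="127.0.0.1", port=0):
    """Serve the files under `root` over HTTP from a background thread.

    port=0 lets the OS pick a free port; the chosen port is available
    afterwards as server.server_address[1]. (Hypothetical helper.)
    """
    handler = functools.partial(QuietHandler, directory=root)
    server = http.server.ThreadingHTTPServer((host, port), handler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    return server


def stop_and_clean(server, root):
    """Stop the server, then delete the served directory and its files."""
    server.shutdown()       # ask the serve_forever() loop to exit
    server.server_close()   # release the listening socket
    shutil.rmtree(root, ignore_errors=True)


def fetch(host, port, name):
    """Download one served file, as a consuming task might."""
    # An empty ProxyHandler keeps local requests away from any system proxy.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(f"http://{host}:{port}/{name}") as response:
        return response.read()
```

A producing task would write its partition files into a directory (for example one created with tempfile.mkdtemp()), call serve_directory(root), and advertise server.server_address to its consumers through whatever channel the application already uses. Consumers call something like fetch(host, port, "part-0"), and the producer calls stop_and_clean(server, root) once every consumer has finished. How the address is advertised and how "finished" is detected are left open here, since the thread does not specify them.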