Subject: Re: knowing the nodes on which reduce tasks will run
From: Steve Loughran
To: user@hadoop.apache.org
Date: Tue, 4 Sep 2012 12:33:31 +0100

On 3 September 2012 15:19, Abhay Ratnaparkhi wrote:

> Hello,
>
> How can one get to know the nodes on which reduce tasks will run?
>
> One of my jobs is running and it is completing all its map tasks.
> My map tasks write lots of intermediate data, and the intermediate
> directory is filling up on all the nodes. If a reduce task is scheduled
> on any node in the cluster, it will try to copy the map output to the
> same disk and will eventually fail with disk-space-related exceptions.

You could always set up dedicated partitions for intermediate data, though
you get better bandwidth by striping the data across all disks, and better
flexibility by sharing the same partition.

There's also a property to set how much space per volume the datanodes
leave free for non-DFS use: increase dfs.datanode.du.reserved and the
datanodes will claim less space for DFS block storage, leaving more room
free for intermediate data.
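As a rough sketch, assuming a Hadoop 1.x cluster (the /disk* mount points
and the 10 GB figure below are made-up examples, not recommendations), the
two settings look like this:

  <!-- mapred-site.xml: stripe intermediate map output across several
       local disks; ideally each entry is a separate physical spindle -->
  <property>
    <name>mapred.local.dir</name>
    <value>/disk1/mapred/local,/disk2/mapred/local,/disk3/mapred/local</value>
  </property>

  <!-- hdfs-site.xml: bytes per volume the datanode leaves free for
       non-DFS use (here 10 GB); raising this shrinks DFS's share -->
  <property>
    <name>dfs.datanode.du.reserved</name>
    <value>10737418240</value>
  </property>

The changed files need to be pushed out to every node, and the daemons
restarted, before the new values take effect.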
see: http://wiki.apache.org/hadoop/DiskSetup