From: Allen Wittenauer
To: common-user@hadoop.apache.org
Subject: Re: Hadoop partitions Problem
Date: Tue, 9 Nov 2010 20:19:24 +0000
In-Reply-To: <4CD9606E.4070700@apache.org>

On Nov 9, 2010, at 6:53 AM, Steve Loughran wrote:

> You can get unbalanced disks even without swapping if you are using the same set of disks for mapred temp/overspill storage.
> This gives you good bandwidth, but can lead to unbalanced systems, as can deletion of large files.

This is actually the reason I recommend people create separate file systems for mapred. It is the only way to keep MR 'contained' to the point it doesn't destroy a grid. [Plus it makes it dirt simple to clean up the MR directories every so often since Hadoop is pretty bad at it.]
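
As a rough illustration of the periodic cleanup mentioned above, here is a minimal sketch in Python. It assumes the MR scratch space lives on its own file system mounted at a path such as /mapred/local (a hypothetical mount point; in practice it would be whatever directories mapred.local.dir lists), and that anything untouched for several days is safe to drop. The function names, the age threshold, and the dry-run default are illustrative assumptions, not part of Hadoop.

```python
import os
import time


def find_stale_files(root, max_age_days, now=None):
    """Return sorted paths of files under root not modified in max_age_days."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # seconds per day
    stale = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if os.lstat(path).st_mtime < cutoff:
                    stale.append(path)
            except OSError:
                # The file vanished between listing and stat; skip it.
                continue
    return sorted(stale)


def remove_stale_files(root, max_age_days, dry_run=True, now=None):
    """Delete stale files under root; with dry_run=True only report them."""
    stale = find_stale_files(root, max_age_days, now=now)
    if not dry_run:
        for path in stale:
            try:
                os.remove(path)
            except OSError:
                # Already gone or not removable; leave it for the next pass.
                pass
    return stale


if __name__ == "__main__":
    # Hypothetical mount point for the dedicated MR file system.
    for path in remove_stale_files("/mapred/local", max_age_days=7):
        print("would remove", path)
```

Because the scratch space sits on its own file system in this setup, a sweep like this cannot touch HDFS block data. It would still be prudent to run it with the dry-run default first, and to pick an age threshold longer than the longest-running job.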