From: Harsh J
Date: Mon, 1 Oct 2012 23:42:58 +0530
Subject: Re: Reduce Copy Speed
To: user@hadoop.apache.org

Hi Brandon,

On Mon, Oct 1, 2012 at 11:23 PM, Brandon wrote:
> What speed do people typically see for the copy during a reduce?

It varies due to a few factors. But there are highly improved
netty-based transfers in Hadoop 2.x that you can use for even faster
and more reliable transfers.

> From tasktracker here is an average on:
> reduce > copy (500 of 504 at 1.52 MB/s) >
>
> We have seen it range from .5 to 4 MB/s.
> That seems a bit slow.

Slow compared to what exactly? How many concurrent reducers fetch at
the same time from a single machine? And what is your slowstart
threshold? Setting it higher makes the reducers wait for many more
maps than 5% to finish before they begin pulling data from other
tasktrackers. That leads to continuous large transfers rather than
small, frequent transfers of a few tasks at a time, which saves
resources and improves speed.

> Does anyone else have other benchmark numbers to share?
>

-- 
Harsh J
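
To put a number on the slowstart threshold for the 504-map job quoted
above, here is a minimal sketch. It is not Hadoop code: the function
name is hypothetical, and the rounding-up is an assumption about how
the JobTracker turns the fraction into a map count. The property it
models is mapred.reduce.slowstart.completed.maps in Hadoop 1.x
(mapreduce.job.reduce.slowstart.completedmaps in 2.x).

```python
import math


def maps_before_reducers_start(total_maps, slowstart_fraction):
    """Estimate how many maps must finish before reducers begin copying.

    Hypothetical helper for illustration only. Assumes the fraction is
    multiplied by the total map count and rounded up.
    """
    if not 0.0 <= slowstart_fraction <= 1.0:
        raise ValueError("slowstart_fraction must be between 0 and 1")
    return math.ceil(slowstart_fraction * total_maps)


if __name__ == "__main__":
    total_maps = 504  # map count from the tasktracker status quoted above
    for fraction in (0.05, 0.50, 0.80, 1.00):
        needed = maps_before_reducers_start(total_maps, fraction)
        print(f"slowstart={fraction:.2f}: reducers start after {needed} maps")
```

Under that assumption, a threshold of 0.05 lets reducers start after 26
of the 504 maps, while 0.80 holds them back until 404 maps are done, so
each reducer has far more map output available to fetch in one go.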