Return-Path: X-Original-To: apmail-hadoop-common-user-archive@www.apache.org Delivered-To: apmail-hadoop-common-user-archive@www.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id DCB7BCA4B for ; Wed, 6 Jun 2012 16:25:38 +0000 (UTC) Received: (qmail 38173 invoked by uid 500); 6 Jun 2012 16:25:33 -0000 Delivered-To: apmail-hadoop-common-user-archive@hadoop.apache.org Received: (qmail 38081 invoked by uid 500); 6 Jun 2012 16:25:33 -0000 Mailing-List: contact common-user-help@hadoop.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: common-user@hadoop.apache.org Delivered-To: mailing list common-user@hadoop.apache.org Received: (qmail 38025 invoked by uid 99); 6 Jun 2012 16:25:33 -0000 Received: from athena.apache.org (HELO athena.apache.org) (140.211.11.136) by apache.org (qpsmtpd/0.29) with ESMTP; Wed, 06 Jun 2012 16:25:33 +0000 X-ASF-Spam-Status: No, hits=-5.0 required=5.0 tests=RCVD_IN_DNSWL_HI,SPF_PASS X-Spam-Check-By: apache.org Received-SPF: pass (athena.apache.org: local policy) Received: from [143.182.124.21] (HELO mga03.intel.com) (143.182.124.21) by apache.org (qpsmtpd/0.29) with ESMTP; Wed, 06 Jun 2012 16:25:29 +0000 Received: from azsmga001.ch.intel.com ([10.2.17.19]) by azsmga101.ch.intel.com with ESMTP; 06 Jun 2012 09:25:06 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="4.71,315,1320652800"; d="scan'208";a="152416142" Received: from orsmsx606.amr.corp.intel.com ([10.22.226.128]) by azsmga001.ch.intel.com with ESMTP; 06 Jun 2012 09:24:41 -0700 Received: from orsmsx153.amr.corp.intel.com (10.22.226.247) by orsmsx606.amr.corp.intel.com (10.22.226.128) with Microsoft SMTP Server (TLS) id 8.2.255.0; Wed, 6 Jun 2012 09:24:40 -0700 Received: from orsmsx103.amr.corp.intel.com ([169.254.2.63]) by ORSMSX153.amr.corp.intel.com ([169.254.13.98]) with mapi id 14.01.0355.002; Wed, 6 Jun 2012 09:24:40 -0700 From: "Barry, Sean F" To: "common-user@hadoop.apache.org" Subject: RE: Shuffle/sort Thread-Topic: Shuffle/sort Thread-Index: Ac1DbQBTjDGM6kR2RcC27bYOG9lz2wAW9LoAAA37vAA= Date: Wed, 6 Jun 2012 16:24:39 +0000 Message-ID: References: In-Reply-To: Accept-Language: en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: x-originating-ip: [10.22.254.138] Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 X-Virus-Checked: Checked by ClamAV on apache.org Thanks Harsh! And is this the right source code for the shuffling that is done in the red= uce task? http://search-hadoop.com/c/Hadoop:/hadoop-mapreduce-project/hadoop-mapreduc= e-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapre= duce/task/reduce/Shuffle.java%7C%7Cshuffle+sort -sb -----Original Message----- From: Harsh J [mailto:harsh@cloudera.com]=20 Sent: Tuesday, June 05, 2012 7:43 PM To: common-user@hadoop.apache.org Subject: Re: Shuffle/sort Hey Sean, Check out http://www.slideshare.net/jhammerb/hadoop-map-reduce-arch-106883, a slightly dated and MR1-oriented presentation from Owen O'Malley that goes= a good level in-depth to get an overview of how things work (including how= reduces pull data). After that, check out Chris Douglas' http://www.slideshare.net/hadoopusergroup/ordered-record-collection that goes in-depth into the evolution of the implementations of that layer.= This is pretty much the state of 0.20/1.0 today too, and in 2.0 we have ha= d Netty replacing Jetty among other improvements but I haven't a public doc= ument link to share on this yet. Others may share the changes docs on 2.0 i= f they have a link to one (or I'll respond back as soon as I have one). I hope this helps! On Wed, Jun 6, 2012 at 4:16 AM, Barry, Sean F wrot= e: > "I was always wondering after mapping, how each reduce task get its=20 > input. It is said in google's paper and hadoop's documentation that a=20 > sort is done to aggregate the same key of the map output. But there is=20 > no detailed explanation of how it is implemented and my intuition is=20 > that perhaps a global hashing will work better than sorting. So I=20 > really want to know the details and see whether my intuition is right. If= I can find out that in the source code, where should I start with?" > > I saw this question online and no one replied to it. does anyone know whe= re I go to study the source code for the shuffle and sort. > > -sean -- Harsh J