From: "Ravi Gummadi"
To: "mapreduce-user@hadoop.apache.org"
Date: Thu, 23 Dec 2010 02:37:52 +0530
Subject: Re: Spill and Map Output

Each map task generates a single intermediate file (i.e. the map output file). It is obtained by merging multiple spills, if spills needed to happen.
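
The merge of spills into one map output file can be sketched in Python as follows. This is only a model, not Hadoop's implementation: it assumes each spill is a list of (partition, key, value) records already sorted by partition and then key, and the name `merge_spills` and the record layout are hypothetical.

```python
import heapq


def merge_spills(spills):
    """Merge sorted spills into one sorted list of records.

    Each spill is a list of (partition, key, value) tuples that is already
    sorted by (partition, key). This layout is an assumption made for
    illustration only.
    """
    # heapq.merge performs a k-way merge over inputs that are already sorted.
    return list(heapq.merge(*spills, key=lambda rec: (rec[0], rec[1])))


# Two spills, each holding records for partitions 0 and 1.
spill_0 = [(0, "apple", 1), (1, "pear", 1)]
spill_1 = [(0, "banana", 1), (1, "plum", 1)]

map_output = merge_spills([spill_0, spill_1])
```

In this model a spill and a partition are different things: a spill can hold records for several partitions, and after the merge each partition's records sit together in one contiguous run of the single output.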

The index file gives the offset and length for each reducer. The offset is the position in the map output file where the input data for that reducer starts, and the length is the size of the data starting from that offset.
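
As a rough illustration of the offset and length lookup, here is a minimal Python sketch. It is not Hadoop's code or on-disk format: `IndexEntry`, `build_output`, and `read_partition` are hypothetical names, and the map output is modelled as a plain byte string with one segment per reducer.

```python
from typing import List, NamedTuple, Tuple


class IndexEntry(NamedTuple):
    offset: int  # where this reducer's data starts in the map output
    length: int  # number of bytes that belong to this reducer


def build_output(segments: List[bytes]) -> Tuple[bytes, List[IndexEntry]]:
    """Concatenate per-reducer segments and record (offset, length) for each."""
    output = bytearray()
    index = []
    for segment in segments:
        # The next segment starts where the output currently ends.
        index.append(IndexEntry(offset=len(output), length=len(segment)))
        output.extend(segment)
    return bytes(output), index


def read_partition(output: bytes, index: List[IndexEntry], reducer: int) -> bytes:
    """Return the slice of the map output that belongs to one reducer."""
    entry = index[reducer]
    return output[entry.offset:entry.offset + entry.length]


# Three reducers; reducer 1 receives no data from this map task.
output, index = build_output([b"aaaa", b"", b"cc"])
data_for_reducer_2 = read_partition(output, index, 2)
```

Under these assumptions, the length in each entry describes one reducer's partition inside the merged file, not a spill.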

-Ravi


On 12/23/10 2:17 AM, "Pedro Costa" <psdc1978@gmail.com> wrote:

Hi,

1 - I would like to understand how a partition works in MapReduce.
I know that MapReduce contains the IndexRecord class, which
indicates the length of something. Is it the length of a partition or
of a spill?

2 - In a large map output, can a partition be a set of spills, or is a
spill simply the same thing as a partition?

Thanks,
--
Pedro
