From: Le Zhao
To: "common-user@hadoop.apache.org"
Date: Thu, 07 Jan 2010 22:21:16 -0500
Subject: Will already sorted Mapper output improve speed of Sort in reducer?
Message-ID: <4B46A4AC.2050207@cs.cmu.edu>

Hi,

Does anybody know whether already sorted Mapper output will reduce the time spent on the Sort in the reduce phase?
I'm teaching a class, and am curious to know how much of a difference sorted vs. unsorted mapper output makes. If the merge sort is implemented to take advantage of already sorted input, then I guess it will be fast. Am I right?

Thanks,
Le
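
To make the question concrete, here is a minimal sketch of what "a merge sort implemented to deal with already sorted input" could look like. It is an illustration only, not Hadoop's actual sort or merge code: the function names `merge_sort` and `count_comparisons` are hypothetical, and the shortcut (skipping the merge when the two halves are already in order) is an assumed optimization chosen for the example.

```python
def merge_sort(items, counter):
    """Return a sorted copy of items; counter[0] accumulates key comparisons."""
    if len(items) <= 1:
        return list(items)
    mid = len(items) // 2
    left = merge_sort(items[:mid], counter)
    right = merge_sort(items[mid:], counter)

    # Assumed optimization: if the halves are already in order,
    # concatenate them and skip the element-by-element merge.
    counter[0] += 1
    if left[-1] <= right[0]:
        return left + right

    # Ordinary two-way merge, counting one comparison per step.
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        counter[0] += 1
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged


def count_comparisons(items):
    """Sort items and return (sorted_list, number_of_key_comparisons)."""
    counter = [0]
    result = merge_sort(items, counter)
    return result, counter[0]
```

With this shortcut, an input of n keys that is already sorted costs one comparison per split, i.e. n - 1 in total, whereas a shuffled input costs on the order of n log n. Calling `count_comparisons` on a sorted list and on a shuffled copy of it shows the gap directly. Whether a given framework's sort has such a shortcut is a separate matter that would need to be checked against its source.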