From: Harsh J
Date: Wed, 14 Aug 2013 10:58:39 +0530
Subject: Re: Reduce Task Clarification
Reply-To: user@hadoop.apache.org

Are you looking to do a secondary sort under a grouped key? A reduce() is
called once for each globally unique map() emitted key, along with all
values grouped for it. To sort the grouped data, you need to use a separate
sort comparator and perform the 'secondary sort'.

On Wed, Aug 14, 2013 at 1:21 AM, Sam Garrett wrote:
> I am working on a MapReduce job where I would like to have the output sorted
> by a LongWritable value. I read the Anatomy of a MapReduce Run in the
> Definitive Guide and it didn't say explicitly whether reduce() gets called
> only once per map output key. If it does get called only once I was thinking
> that I could use this:
> http://hadoop.apache.org/docs/current/api/org/apache/hadoop/mapreduce/Job.html#setSortComparatorClass(java.lang.Class)
> to do the sorting.
>
> Thank you for your time.
>
> --
> Sam Garrett
> ActionX, NYC

--
Harsh J
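
For reference, the secondary sort suggested in the reply can be sketched in plain Java, without any Hadoop dependency. This is only a simulation of the idea, not Hadoop's implementation: the class SecondarySortSketch, the CompositeKey type, and the sortAndGroup() method are hypothetical names made up for illustration. The sort order compares the natural key and then the value, while the grouping order compares the natural key only, so each group corresponds to one reduce() call that sees its values already sorted. In a real job the two comparators would presumably be registered through Job.setSortComparatorClass() and Job.setGroupingComparatorClass(), with a partitioner that looks only at the natural key.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SecondarySortSketch {

    // Hypothetical composite key: the natural key plus the value to sort by.
    public static final class CompositeKey {
        final String naturalKey;
        final long value;

        public CompositeKey(String naturalKey, long value) {
            this.naturalKey = naturalKey;
            this.value = value;
        }

        String naturalKey() { return naturalKey; }
        long value() { return value; }
    }

    // Stands in for the sort comparator: natural key first, then value.
    static final Comparator<CompositeKey> SORT_ORDER =
            Comparator.comparing(CompositeKey::naturalKey)
                      .thenComparingLong(CompositeKey::value);

    // Stands in for the grouping comparator: natural key only.
    static final Comparator<CompositeKey> GROUPING =
            Comparator.comparing(CompositeKey::naturalKey);

    // Sorts the simulated map output, then starts a new group (one simulated
    // reduce() call) whenever the grouping comparator sees a different key.
    public static Map<String, List<Long>> sortAndGroup(List<CompositeKey> mapOutput) {
        List<CompositeKey> sorted = new ArrayList<>(mapOutput);
        sorted.sort(SORT_ORDER);

        Map<String, List<Long>> groups = new LinkedHashMap<>();
        CompositeKey previous = null;
        for (CompositeKey key : sorted) {
            if (previous == null || GROUPING.compare(previous, key) != 0) {
                groups.put(key.naturalKey(), new ArrayList<>());
            }
            groups.get(key.naturalKey()).add(key.value());
            previous = key;
        }
        return groups;
    }

    public static void main(String[] args) {
        List<CompositeKey> mapOutput = List.of(
                new CompositeKey("b", 7L),
                new CompositeKey("a", 42L),
                new CompositeKey("b", 3L),
                new CompositeKey("a", 5L));
        // Each printed line corresponds to one simulated reduce() call.
        sortAndGroup(mapOutput).forEach((key, values) ->
                System.out.println(key + " -> " + values));
    }
}
```

Running main() prints one line per natural key, with the values inside each group in ascending order. Note that this orders values within a key; it does not by itself give a single globally sorted output across reducers.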