Return-Path: X-Original-To: apmail-hadoop-hdfs-user-archive@minotaur.apache.org Delivered-To: apmail-hadoop-hdfs-user-archive@minotaur.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id CDFDA18748 for ; Sun, 4 Oct 2015 08:39:26 +0000 (UTC) Received: (qmail 1975 invoked by uid 500); 4 Oct 2015 08:39:23 -0000 Delivered-To: apmail-hadoop-hdfs-user-archive@hadoop.apache.org Received: (qmail 1832 invoked by uid 500); 4 Oct 2015 08:39:23 -0000 Mailing-List: contact user-help@hadoop.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: user@hadoop.apache.org Delivered-To: mailing list user@hadoop.apache.org Received: (qmail 1813 invoked by uid 99); 4 Oct 2015 08:39:22 -0000 Received: from Unknown (HELO spamd1-us-west.apache.org) (209.188.14.142) by apache.org (qpsmtpd/0.29) with ESMTP; Sun, 04 Oct 2015 08:39:22 +0000 Received: from localhost (localhost [127.0.0.1]) by spamd1-us-west.apache.org (ASF Mail Server at spamd1-us-west.apache.org) with ESMTP id D4449C22D3 for ; Sun, 4 Oct 2015 08:39:21 +0000 (UTC) X-Virus-Scanned: Debian amavisd-new at spamd1-us-west.apache.org X-Spam-Flag: NO X-Spam-Score: 2.9 X-Spam-Level: ** X-Spam-Status: No, score=2.9 tagged_above=-999 required=6.31 tests=[DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, HTML_MESSAGE=3, SPF_PASS=-0.001, URIBL_BLOCKED=0.001] autolearn=disabled Authentication-Results: spamd1-us-west.apache.org (amavisd-new); dkim=pass (2048-bit key) header.d=gmail.com Received: from mx1-us-east.apache.org ([10.40.0.8]) by localhost (spamd1-us-west.apache.org [10.40.0.7]) (amavisd-new, port 10024) with ESMTP id kBE6QtGk2Tgf for ; Sun, 4 Oct 2015 08:39:13 +0000 (UTC) Received: from mail-wi0-f181.google.com (mail-wi0-f181.google.com [209.85.212.181]) by mx1-us-east.apache.org (ASF Mail Server at mx1-us-east.apache.org) with ESMTPS id 80DFB439D6 for ; Sun, 4 Oct 2015 08:39:13 +0000 (UTC) Received: by wicgb1 with SMTP id gb1so81384747wic.1 for ; Sun, 04 Oct 2015 01:39:12 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=message-id:date:from:user-agent:mime-version:to:subject :content-type; bh=gW/Tpn+NZsOYaFnIVIK9cX3N3TO6vjO7PGdsnZ+Tstw=; b=q4OHSG/AeqF0AJll5BfkBEbbt+xUdItsOIdJe7hFqxWdJvyq3I/LG9mSSDeNU87lbC oGiRqDQWsR289Xy+Pg+slwti25i/HHQ5PXWGxWGUJiXXUsoO46P963Gtk5anN0UvHqRb rlQO1b2nXfIzCBSg9rGwr8bWhP4dQjlQuZKz/EV5Zv3rvkKAofpck1B50Y/4W3ZYhwj0 z1o/CzXasCDC/s8MfhG9wrOa36jKwdQm9othH+fwgaFGSotIgjsXX4rxoRYoDcQiFjoW 9CiJDTVeytGAxrncqJnzbdJhxPl1RULH7K/KtsEbeRsAAEjMw/OUhuwd3SxBiqsxZWzO e6hQ== X-Received: by 10.194.6.106 with SMTP id z10mr5568635wjz.104.1443947952681; Sun, 04 Oct 2015 01:39:12 -0700 (PDT) Received: from [192.168.132.198] ([150.214.118.80]) by smtp.gmail.com with ESMTPSA id fz1sm8184260wic.8.2015.10.04.01.39.11 for (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Sun, 04 Oct 2015 01:39:11 -0700 (PDT) Message-ID: <5610E593.9090502@gmail.com> Date: Sun, 04 Oct 2015 10:38:43 +0200 From: paco User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Icedove/31.8.0 MIME-Version: 1.0 To: user@hadoop.apache.org Subject: Combiner and KeyComposite Content-Type: multipart/alternative; boundary="------------000504070302080505030706" This is a multi-part message in MIME format. --------------000504070302080505030706 Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 7bit I am doing a secondary sort in Hadoop 2.6.0, I am following this tutorial: https://vangjee.wordpress.com/2012/03/20/secondary-sorting-aka-sorting-values-in-hadoops-mapreduce-programming-paradigm/ I have the exact same code, but now I am trying to improve performance so I have decided to add a combiner. I have added two modifications: Main file: |job.setCombinerClass(CombinerK.class); | Combiner file: |public class CombinerK extends Reducer { public void reduce(KeyWritable key, Iterator values, Context context) throws IOException, InterruptedException { Iterator it = values; System.err.println("combiner " + key); KeyWritable first_value = it.next(); System.err.println("va: " + first_value); while (it.hasNext()) { sum += it.next().getSs(); } first_value.setS(sum); context.write(key, first_value); } } | But it seems that it is not run because I can't find any logs file which have the word "combiner". When I saw counters after running, I could see: | Combine input records=4040000 Combine output records=4040000 | The combiner seems like it is being executed but it seems as it has been receiving a call for each key and by this reason it has the same number in input as output. --------------000504070302080505030706 Content-Type: text/html; charset=utf-8 Content-Transfer-Encoding: 7bit

I am doing a secondary sort in Hadoop 2.6.0, I am following this tutorial: https://vangjee.wordpress.com/2012/03/20/secondary-sorting-aka-sorting-values-in-hadoops-mapreduce-programming-paradigm/

I have the exact same code, but now I am trying to improve performance so I have decided to add a combiner. I have added two modifications:

Main file:

job.setCombinerClass(CombinerK.class);

Combiner file:

public class CombinerK extends Reducer<KeyWritable, KeyWritable, KeyWritable, KeyWritable> {

    public void reduce(KeyWritable key, Iterator<KeyWritable> values, Context context) throws IOException, InterruptedException {


        Iterator<KeyWritable> it = values;

        System.err.println("combiner " + key);

        KeyWritable first_value = it.next();
        System.err.println("va: " + first_value);

        while (it.hasNext()) {

            sum += it.next().getSs();

        }
        first_value.setS(sum);
        context.write(key, first_value);


    }
}

But it seems that it is not run because I can't find any logs file which have the word "combiner". When I saw counters after running, I could see:

    Combine input records=4040000
    Combine output records=4040000

The combiner seems like it is being executed but it seems as it has been receiving a call for each key and by this reason it has the same number in input as output.

--------------000504070302080505030706--