From: Björn-Elmar Macek <macek@cs.uni-kassel.de>
To: user@hadoop.apache.org
Subject: Re: Join-package combiner number of input and output records the same
Date: Tue, 25 Sep 2012 15:40:44 +0200

Oops, sorry. You are using the standard implementations? Then I don't know what is happening. But the fact that your input size equals your output size in a "join" process reminded me strongly of my own problems. Sorry for any confusion I may have caused.

Best,

On 25.09.2012 at 15:32, Björn-Elmar Macek <macek@cs.uni-kassel.de> wrote:

> Hi,
>
> I had this problem once too. Did you properly override the reduce method with the @Override annotation?
> Does your reduce method use OutputCollector or Context for gathering outputs? If you are using the current API, it has to be Context.
>
> The thing is: if you do NOT override it, the standard reduce function (the identity) is used, and this of course results in the same number of output tuples as you read as input.
>
> Good luck!
> Elmar
>
> On 25.09.2012 at 11:57, Sigurd Spieckermann <sigurd.spieckermann@gmail.com> wrote:
>
>> I think I have tracked down the problem to the point that each split only contains one big key-value pair and a combiner is connected to a map task. Please correct me if I'm wrong, but I assume each map task takes one split and the combiner operates only on the key-value pairs within one split. That's why the combiner has no effect in my case. Is there a way to combine the mapper outputs of multiple splits before they are sent off to the reducer?
>>
>> 2012/9/25 Sigurd Spieckermann <sigurd.spieckermann@gmail.com>:
>> Maybe one more note: the combiner and the reducer class are the same, and in the reduce phase the values do get aggregated correctly. Why is this not happening in the combiner phase?
>>
>>
>> 2012/9/25 Sigurd Spieckermann <sigurd.spieckermann@gmail.com>:
>> Hi guys,
>>
>> I'm experiencing strange behavior when I use the Hadoop join package. After running a job, the result statistics show that my combiner has an input of 100 records and an output of 100 records. From the task I'm running and the way it's implemented, I know that each key appears multiple times and the values should be combinable before getting passed to the reducer. I'm running my tests in pseudo-distributed mode with one or two map tasks. Using the debugger, I noticed that each key-value pair is processed by the combiner individually, so there's actually no list passed into the combiner that it could aggregate. Can anyone think of a reason for this undesired behavior?
>>
>> Thanks
>> Sigurd
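[The identity-fallback pitfall Elmar describes above can be sketched without a Hadoop installation. The classes below are illustrative stand-ins, not the real org.apache.hadoop.mapreduce API: they only model how a reduce method that fails to override the base class leaves the inherited identity behavior in place, producing as many output records as input records.]

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

// Illustrative stand-in for a Reducer base class: the inherited reduce()
// is the identity, emitting one output record per input value -- exactly
// the input-count == output-count symptom discussed in the thread.
class Reducer<K, V> {
    protected void reduce(K key, Iterable<V> values, List<Map.Entry<K, V>> out) {
        for (V v : values) {
            out.add(new AbstractMap.SimpleEntry<>(key, v));
        }
    }
}

// With @Override the compiler rejects a mistyped signature; without it,
// a wrong signature (e.g. an old OutputCollector-based one) silently
// becomes an overload, and the identity reduce above runs instead.
class SumCombiner extends Reducer<String, Long> {
    @Override
    protected void reduce(String key, Iterable<Long> values, List<Map.Entry<String, Long>> out) {
        long sum = 0;
        for (Long v : values) sum += v;
        out.add(new AbstractMap.SimpleEntry<>(key, sum));
    }
}

public class OverrideDemo {
    public static void main(String[] args) {
        List<Map.Entry<String, Long>> out = new ArrayList<>();
        new SumCombiner().reduce("k", Arrays.asList(1L, 2L, 3L), out);
        // Properly overridden: three input values collapse into one record.
        System.out.println(out.size() + " record(s), sum = " + out.get(0).getValue());
    }
}
```

[If the @Override annotation is present, a mismatched signature becomes a compile error instead of a silent identity pass-through, which is why Elmar's first question is whether it was used.]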
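[Sigurd's diagnosis — that the combiner only ever sees the records buffered inside a single map task, so single-record-per-key splits leave it nothing to aggregate — can be modeled in a few lines of plain Java. This is an illustrative toy under that assumption, not Hadoop code; in real Hadoop the values for a key from all map tasks only meet during the shuffle/reduce phase, which is where the thread reports correct aggregation.]

```java
import java.util.List;
import java.util.Map;

public class CombinerScopeDemo {
    // Toy combiner: sums the values for a key *within one map task's
    // output only*, mirroring how the combiner runs per map task.
    static long combine(List<Long> valuesForKey) {
        long sum = 0;
        for (Long v : valuesForKey) sum += v;
        return sum;
    }

    public static void main(String[] args) {
        // Each composed split yields one big key-value pair, so each map
        // task holds exactly one record per key (as observed in the thread).
        Map<String, List<Long>> mapTask1 = Map.of("k", List.of(1L));
        Map<String, List<Long>> mapTask2 = Map.of("k", List.of(2L));

        // Per-task combining cannot shrink a single-record list:
        long c1 = combine(mapTask1.get("k")); // still one record per task
        long c2 = combine(mapTask2.get("k"));

        // Only the reduce phase sees both map outputs for the key.
        System.out.println("reduced value = " + (c1 + c2));
    }
}
```

[Under this model, the combiner is a per-map-task optimization only: if each map task holds one value per key, combining is a no-op, and the answer to Sigurd's question is that there is no pre-reduce stage that merges the outputs of different map tasks.]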