Subject: Re: Join-package combiner number of input and output records the same
From: Sigurd Spieckermann
To: user@hadoop.apache.org
Date: Tue, 25 Sep 2012 11:57:55 +0200

I think I have tracked the problem down: each split contains only one big
key-value pair, and a combiner is tied to a map task. Please correct me if
I'm wrong, but I assume each map task takes one split and the combiner
operates only on the key-value pairs within that split. That would explain
why the combiner has no effect in my case. Is there a way to combine the
mapper outputs of multiple splits before they are sent off to the reducer?

2012/9/25 Sigurd Spieckermann

> Maybe one more note: the combiner and the reducer class are the same and
> in the reduce-phase the values get aggregated correctly. Why is this not
> happening in the combiner-phase?
>
>
> 2012/9/25 Sigurd Spieckermann
>
>> Hi guys,
>>
>> I'm experiencing a strange behavior when I use the Hadoop join-package.
>> After running a job the result statistics show that my combiner has an
>> input of 100 records and an output of 100 records. From the task I'm
>> running and the way it's implemented, I know that each key appears multiple
>> times and the values should be combinable before getting passed to the
>> reducer. I'm running my tests in pseudo-distributed mode with one or two
>> map tasks. From using the debugger, I noticed that each key-value pair is
>> processed by a combiner individually so there's actually no list passed
>> into the combiner that it could aggregate. Can anyone think of a reason
>> that causes this undesired behavior?
>>
>> Thanks
>> Sigurd
>>
>
>

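To illustrate the assumption above (a combiner only sees the output of the map task it is attached to), here is a minimal plain-Java sketch. It does not use the Hadoop API; the class `CombinerScopeSketch`, its methods, and the sample keys are all made up for illustration. It only models the counting: if every map task emits each key once, the combiners output as many records as they receive, whereas the same records within a single map task would be merged.

```java
import java.util.AbstractMap;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model of combiner scope; not Hadoop code.
public class CombinerScopeSketch {

    // Helper to build one (key, value) record.
    public static Map.Entry<String, Integer> rec(String key, int value) {
        return new AbstractMap.SimpleImmutableEntry<>(key, value);
    }

    // Stand-in for a combiner: sums values per key, but only over the
    // records emitted by ONE map task.
    public static Map<String, Integer> combine(List<Map.Entry<String, Integer>> mapOutput) {
        Map<String, Integer> combined = new TreeMap<>();
        for (Map.Entry<String, Integer> record : mapOutput) {
            combined.merge(record.getKey(), record.getValue(), Integer::sum);
        }
        return combined;
    }

    // Runs the combiner once per map task and counts the records that
    // leave all combiners together.
    public static int combinerOutputRecords(List<List<Map.Entry<String, Integer>>> mapTasks) {
        int outputRecords = 0;
        for (List<Map.Entry<String, Integer>> taskOutput : mapTasks) {
            outputRecords += combine(taskOutput).size();
        }
        return outputRecords;
    }

    public static void main(String[] args) {
        // Two map tasks, each emitting every key exactly once.
        List<Map.Entry<String, Integer>> taskA = Arrays.asList(rec("a", 1), rec("b", 1));
        List<Map.Entry<String, Integer>> taskB = Arrays.asList(rec("a", 1), rec("b", 1));

        // 4 records in, 4 records out: nothing to combine within a task.
        System.out.println(combinerOutputRecords(Arrays.asList(taskA, taskB)));

        // The same 4 records in a single map task: 4 in, 2 out.
        List<Map.Entry<String, Integer>> single =
                Arrays.asList(rec("a", 1), rec("b", 1), rec("a", 1), rec("b", 1));
        System.out.println(combinerOutputRecords(Arrays.asList(single)));
    }
}
```

If this model matches what happens in the job, it would be consistent with the counters showing 100 combiner input records and 100 output records while the reducer, which sees the records of all map tasks, still aggregates correctly.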