Subject: Re: Job config before read fields
From: Adrian CAPDEFIER <chivas314159@gmail.com>
To: user@hadoop.apache.org
Reply-To: user@hadoop.apache.org
Date: Sat, 31 Aug 2013 03:31:13 +0100

But how would the comparator have access to the job config?


On Sat, Aug 31, 2013 at 2:38 AM, Shahab Yunus <shahab.yunus@gmail.com> wrote:

> I think you have to override/extend the Comparator to achieve that,
> something like what is done in Secondary Sort?
>
> Regards,
> Shahab
>
>
> On Fri, Aug 30, 2013 at 9:01 PM, Adrian CAPDEFIER <chivas314159@gmail.com> wrote:
>
>> Howdy,
>>
>> I apologise for the lack of code in this message, but the code is fairly
>> convoluted and it would obscure my problem. That being said, I can put
>> together some sample code if it is really needed.
>>
>> I am trying to pass some metadata between the map and reduce steps. This
>> metadata is read and generated in the map step and stored in the job
>> config. It also needs to be recreated on the reduce node before the
>> key/value fields can be read in the readFields function.
>>
>> I had assumed that I would be able to override the Reducer.setup()
>> function and that would be it, but apparently the readFields function is
>> called before the Reducer.setup() function.
>>
>> My question is: what is the best place on the reduce node where I can
>> access the job configuration/context before the readFields function is
>> called?
>>
>> This is the stack trace:
>>
>>         at org.apache.hadoop.io.WritableComparator.compare(WritableComparator.java:103)
>>         at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.compare(MapTask.java:1111)
>>         at org.apache.hadoop.util.QuickSort.sortInternal(QuickSort.java:70)
>>         at org.apache.hadoop.util.QuickSort.sort(QuickSort.java:59)
>>         at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1399)
>>         at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1298)
>>         at org.apache.hadoop.mapred.MapTask$NewOutputCollector.close(MapTask.java:699)
>>         at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:766)
>>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
>>         at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
>>         at java.security.AccessController.doPrivileged(Native Method)
>>         at javax.security.auth.Subject.doAs(Subject.java:415)
>>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149)
>>         at org.apache.hadoop.mapred.Child.main(Child.java:249)
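A minimal sketch of what Shahab's suggestion might look like, appended for reference: if the sort comparator is set explicitly on the job and implements Configurable, Hadoop creates it with ReflectionUtils.newInstance(theClass, conf), which calls setConf() before any compare() call, so the job configuration should be available to the comparator on both the map-side spill and the reduce-side merge, well before Reducer.setup() runs. The class name, key type, and config property below are illustrative, not taken from the thread.

    import org.apache.hadoop.conf.Configurable;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.io.WritableComparable;
    import org.apache.hadoop.io.WritableComparator;

    public class MetadataAwareComparator extends WritableComparator implements Configurable {

        private Configuration conf;
        private String metadata;        // hypothetical value rebuilt from the job config

        public MetadataAwareComparator() {
            super(Text.class, true);    // Text stands in for the real key class
        }

        // ReflectionUtils.newInstance() calls setConf() on Configurable instances,
        // so the job configuration lands here before compare() is ever invoked.
        @Override
        public void setConf(Configuration conf) {
            this.conf = conf;
            this.metadata = conf.get("example.metadata.property");  // hypothetical key
        }

        @Override
        public Configuration getConf() {
            return conf;
        }

        @Override
        @SuppressWarnings("rawtypes")
        public int compare(WritableComparable a, WritableComparable b) {
            // metadata is usable here; this sketch just keeps the default ordering
            return super.compare(a, b);
        }
    }

The comparator would be registered in the driver, e.g. job.setSortComparatorClass(MetadataAwareComparator.class) with the new API, or JobConf.setOutputKeyComparatorClass() with the old one.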