Subject: Re: Job config before read fields
From: Shahab Yunus <shahab.yunus@gmail.com>
To: user@hadoop.apache.org
Date: Fri, 30 Aug 2013 21:38:18 -0400

I think you have to override/extend the Comparator to achieve that,
something like what is done in Secondary Sort?

Regards,
Shahab

On Fri, Aug 30, 2013 at 9:01 PM, Adrian CAPDEFIER wrote:

> Howdy,
>
> I apologise for the lack of code in this message, but the code is fairly
> convoluted and it would obscure my problem. That being said, I can put
> together some sample code if really needed.
>
> I am trying to pass some metadata between the map and reduce steps. This
> metadata is generated in the map step and stored in the job config. It
> also needs to be recreated on the reduce node before the key/value fields
> can be read in the readFields function.
>
> I had assumed that I would be able to override the Reducer.setup()
> function and that would be it, but apparently the readFields function is
> called before Reducer.setup().
>
> My question is: where on the reduce node can I access the job
> configuration/context before the readFields function is called?
>
> This is the stack trace:
>
>         at org.apache.hadoop.io.WritableComparator.compare(WritableComparator.java:103)
>         at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.compare(MapTask.java:1111)
>         at org.apache.hadoop.util.QuickSort.sortInternal(QuickSort.java:70)
>         at org.apache.hadoop.util.QuickSort.sort(QuickSort.java:59)
>         at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1399)
>         at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1298)
>         at org.apache.hadoop.mapred.MapTask$NewOutputCollector.close(MapTask.java:699)
>         at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:766)
>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
>         at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:415)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149)
>         at org.apache.hadoop.mapred.Child.main(Child.java:249)
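The comparator suggestion above hinges on a detail of how Hadoop instantiates comparators: the framework creates them through ReflectionUtils.newInstance(clazz, conf), which calls setConf() on any object implementing org.apache.hadoop.conf.Configurable right after construction, so a comparator registered with the job sees the configuration before its compare() method (and therefore before any readFields()) runs. The sketch below mimics that injection mechanism with stand-in classes -- Configuration, Configurable, ReflectionUtilsSketch, and the "example.metadata" key are illustrative stand-ins, NOT the real Hadoop types -- just to show the pattern a real comparator subclass would follow.

```java
import java.util.HashMap;
import java.util.Map;

// A comparator in the spirit of the "extend the Comparator" suggestion: it
// picks up metadata from the job config in setConf(), which the framework
// calls at construction time, before the sort ever calls compare().
class MetadataAwareComparator implements Configurable {
    private Configuration conf;
    private String metadata;  // hypothetical metadata stored by the map step

    @Override public void setConf(Configuration conf) {
        this.conf = conf;
        // "example.metadata" is a made-up key for illustration.
        this.metadata = conf.get("example.metadata");
    }
    @Override public Configuration getConf() { return conf; }

    String metadata() { return metadata; }

    int compare(String a, String b) {
        // By the time the sort reaches compare(), setConf() has already run,
        // so this.metadata is usable here.
        return a.compareTo(b);
    }

    public static void main(String[] args) {
        Configuration conf = new Configuration();
        conf.set("example.metadata", "schema-v1");
        MetadataAwareComparator cmp =
            ReflectionUtilsSketch.newInstance(MetadataAwareComparator.class, conf);
        System.out.println(cmp.metadata());  // prints schema-v1
    }
}

// Minimal stand-in for org.apache.hadoop.conf.Configuration.
class Configuration {
    private final Map<String, String> kv = new HashMap<>();
    void set(String k, String v) { kv.put(k, v); }
    String get(String k) { return kv.get(k); }
}

// Minimal stand-in for org.apache.hadoop.conf.Configurable.
interface Configurable {
    void setConf(Configuration conf);
    Configuration getConf();
}

// Mimics ReflectionUtils.newInstance(clazz, conf): construct, then inject conf.
class ReflectionUtilsSketch {
    static <T> T newInstance(Class<T> clazz, Configuration conf) {
        try {
            T obj = clazz.getDeclaredConstructor().newInstance();
            if (obj instanceof Configurable) {
                ((Configurable) obj).setConf(conf);  // conf injected before first use
            }
            return obj;
        } catch (ReflectiveOperationException e) {
            throw new RuntimeException(e);
        }
    }
}
```

In a real job the equivalent would be a WritableComparator subclass that also implements Configurable, registered with JobConf.setOutputKeyComparatorClass (old mapred API, matching the stack trace). Note also that the trace shows compare() firing during the map-side sortAndSpill, not on the reduce node, so the configuration has to reach the comparator on both sides; whether the conf is also propagated into the key instances WritableComparator deserializes varies by Hadoop version and is worth verifying against yours.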