From: ey-chih chow <eychih@hotmail.com>
To: user@avro.apache.org
Subject: RE: is this a bug?
Date: Wed, 2 Mar 2011 16:12:20 -0800

Sorry, my previous message appears in the archive entirely in black, so let me re-explain the problem. The following code for an AvroReducer causes the problem:

    public void reduce(Utf8 key, Iterable<GenericRecord> values,
                       AvroCollector<GenericRecord> collector, Reporter reporter)
            throws IOException {
        GenericRecord record = null;
        for (GenericRecord value : values) {
            -- code omitted here --
            record = value;
            record.put("rowkey", key);   <=== this statement causes the problem
            collector.collect(record);
        }
    }

As explained in my previous message, if I remove the statement record.put("rowkey", key), the code works fine: the key/values pairs passed to reduce() are correct. But if I add this statement, the key/values pairs passed to reduce() are mismatched, something like (key1, values1), (key2, values3) rather than (key2, values2). Some details are explained in my previous message. Is this problem related to the Hadoop binary iterators or to the Avro deserialization code? Thanks.

Ey-Chih Chow

From: eychih@hotmail.com
To: user@avro.apache.org
Subject: is this a bug?
Date: Wed, 2 Mar 2011 13:05:55 -0800

Hi,

I am working on an Avro MR job and encountering an issue with AvroReducer<Utf8, GenericRecord, GenericRecord>. The corresponding reduce() routine is implemented in the following way:

    public void reduce(Utf8 key, Iterable<GenericRecord> values,
                       AvroCollector<GenericRecord> collector, Reporter reporter)
            throws IOException {
        .
        .
        .
        GenericRecord record = null;
        for (GenericRecord value : values) {
            .
            .
            .
            record = value;
            record.put("rowkey", key);
            .
            .
            .
            collector.collect(record);
        }
    }

If I comment out the record.put("rowkey", key) statement (shown in red in the original message), the reduce function gets called properly, with CORRECT key/values pairs passed to reduce(). However, if I add that statement to the routine, the reduce function is called with WRONG key/values pairs, in the sense that key2 is paired with values3, instead of values2, when passed to reduce(). I traced this problem through the Hadoop source code, such as ReduceTask.java and Task.java, and the Avro source code, such as HadoopReducer.java, HadoopReducerBase.java, and all the serialization code. The problem shows up on the second call of reduce(), but I cannot locate the exact place that causes it. My intuition is that it arises either in the Hadoop iterators after the merge sort or in the Avro deserialization. Can anybody help me with this? Thanks.

Ey-Chih Chow

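One experiment that might help narrow this down, sketched under assumptions that the messages above do not confirm: if the values iterator hands out the same record object on every step, then record.put("rowkey", key) changes an object the framework still owns, whereas mutating a copy leaves that object untouched. The stand-alone Java model below uses only java.util. ReuseSketch, ReusingValues, collectInPlace and collectCopies are hypothetical names, a HashMap stands in for GenericRecord, and nothing here calls Hadoop or Avro.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

class ReuseSketch {

    // Hypothetical stand-in for a values iterator that refills ONE shared
    // record on every next() call instead of allocating a new one.
    // Assumption: the real iterator may or may not behave this way.
    static class ReusingValues implements Iterable<Map<String, Object>> {
        final Map<String, Object> shared = new HashMap<>();
        private final List<String> payloads;

        ReusingValues(List<String> payloads) {
            this.payloads = payloads;
        }

        @Override
        public Iterator<Map<String, Object>> iterator() {
            final Iterator<String> source = payloads.iterator();
            return new Iterator<Map<String, Object>>() {
                @Override
                public boolean hasNext() {
                    return source.hasNext();
                }

                @Override
                public Map<String, Object> next() {
                    shared.put("payload", source.next()); // overwrite in place
                    return shared;
                }

                @Override
                public void remove() {
                    throw new UnsupportedOperationException();
                }
            };
        }
    }

    // Same shape as the loop in the message: mutate the iterator's own object.
    static List<String> collectInPlace(String key, Iterable<Map<String, Object>> values) {
        List<String> out = new ArrayList<>();
        for (Map<String, Object> value : values) {
            value.put("rowkey", key);
            // Stands in for collector.collect(record).
            out.add(value.get("rowkey") + ":" + value.get("payload"));
        }
        return out;
    }

    // Variant: copy first, then mutate and emit only the copy.
    static List<String> collectCopies(String key, Iterable<Map<String, Object>> values) {
        List<String> out = new ArrayList<>();
        for (Map<String, Object> value : values) {
            Map<String, Object> copy = new HashMap<>(value);
            copy.put("rowkey", key);
            out.add(copy.get("rowkey") + ":" + copy.get("payload"));
        }
        return out;
    }

    public static void main(String[] args) {
        ReusingValues first = new ReusingValues(Arrays.asList("v1", "v2"));
        collectInPlace("key1", first);
        System.out.println("in place, shared record has rowkey: "
                + first.shared.containsKey("rowkey"));

        ReusingValues second = new ReusingValues(Arrays.asList("v1", "v2"));
        collectCopies("key1", second);
        System.out.println("copies, shared record has rowkey: "
                + second.shared.containsKey("rowkey"));
    }
}
```

Both variants emit the same output. The only difference is whether the iterator's shared object carries the "rowkey" entry afterwards. If the real job pairs keys and values correctly once the reducer mutates a copy of each value, that would point toward object reuse rather than the merge sort. If the mismatch persists, this hypothesis can be dropped.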