avro-user mailing list archives

From ey-chih chow <eyc...@hotmail.com>
Subject RE: is this a bug?
Date Thu, 03 Mar 2011 00:12:20 GMT

Sorry, I found that my previous message appears entirely in black in the archive, so the
highlighted statement is not visible.  Let me re-explain the problem.  The following piece
of code in my AvroReducer causes the problem:

    public void reduce(Utf8 key, Iterable<GenericRecord> values,
                       AvroCollector<GenericRecord> collector,
                       Reporter reporter) throws IOException {
        GenericRecord record = null;
        for (GenericRecord value : values) {
            // -- code omitted here --
            record = value;
            record.put("rowkey", key);   // <=== this statement causes the problem
            collector.collect(record);
        }
    }

As explained in my previous message, if I remove the statement record.put("rowkey", key),
the code works fine, in that the key/values pairs passed to reduce() are correct.  But if I
add this statement, the key/values pairs passed to reduce() are mismatched, something like
(key1, values1), (key2, values3) rather than (key2, values2).  Some details are explained in
my previous message.  Is this problem related to the Hadoop binary iterators or to the Avro
deserialization code?  Thanks.
Ey-Chih Chow
From: eychih@hotmail.com
To: user@avro.apache.org
Subject: is this a bug?
Date: Wed, 2 Mar 2011 13:05:55 -0800

Hi,
I am working on an Avro MR job and encountering an issue with AvroReducer<Utf8, GenericRecord,
GenericRecord>. The corresponding reduce() routine is implemented in the following way:

    public void reduce(Utf8 key, Iterable<GenericRecord> values,
                       AvroCollector<GenericRecord> collector,
                       Reporter reporter) throws IOException {
        ...
        GenericRecord record = null;
        for (GenericRecord value : values) {
            ...
            record = value;
            record.put("rowkey", key);
            ...
            collector.collect(record);
        }
    }

If I comment out the statement record.put("rowkey", key) (shown in red in my original
message), the reduce function gets called properly, with the CORRECT key/values pairs passed
to reduce().  However, if I add that statement to the routine, the reduce function is called
with the WRONG key/values pairs, in the sense that key2 is paired with values3, instead of
values2, when passed to reduce().  I traced this problem through the Hadoop source code,
such as ReduceTask.java and Task.java, and the Avro source code, such as HadoopReducer.java,
HadoopReducerBase.java, and all the serialization code.  The problem shows up on the second
call of reduce(), but I cannot locate the exact place that causes it.  My intuition is that
it arises either in the Hadoop iterators after the merge sort or in Avro deserialization.
Can anybody help me with this?  Thanks.
Ey-Chih Chow