hadoop-common-issues mailing list archives

From "Mikhail Bernadsky (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HADOOP-10669) Avro serialization does not flush buffered serialized values causing data lost
Date Sun, 08 Jun 2014 05:37:01 GMT
Mikhail Bernadsky created HADOOP-10669:
------------------------------------------

             Summary: Avro serialization does not flush buffered serialized values causing data lost
                 Key: HADOOP-10669
                 URL: https://issues.apache.org/jira/browse/HADOOP-10669
             Project: Hadoop Common
          Issue Type: Bug
          Components: io
    Affects Versions: 2.4.0
            Reporter: Mikhail Bernadsky


Found this while debugging Nutch.

MapTask serializes keys and values to the same stream, in pairs: 

keySerializer.serialize(key);
.....
valSerializer.serialize(value);
.....
bb.write(b0, 0, 0);

AvroSerializer does not flush its buffer after each serialize() call. If it is used as valSerializer,
the value may be only partially written, or not written at all, to the output stream by the time the
record is marked as complete (the last line above).
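
The effect can be reproduced in isolation. The following is a minimal sketch, not Hadoop or Avro code: it assumes a hypothetical BufferingSerializer that wraps the shared output in a java.io.BufferedOutputStream, standing in for a serializer with an internal buffer. The class and method names (FlushDemo, BufferingSerializer, bytesVisibleAfterSerialize) are invented for illustration.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class FlushDemo {

    /** Hypothetical stand-in for a serializer that buffers internally. */
    static final class BufferingSerializer {
        private final OutputStream buffered;
        private final boolean flushEachRecord;

        BufferingSerializer(OutputStream sink, boolean flushEachRecord) {
            // 64-byte internal buffer, larger than the test value below.
            this.buffered = new BufferedOutputStream(sink, 64);
            this.flushEachRecord = flushEachRecord;
        }

        void serialize(String value) throws IOException {
            buffered.write(value.getBytes(StandardCharsets.UTF_8));
            if (flushEachRecord) {
                // Push buffered bytes to the shared stream before returning.
                buffered.flush();
            }
        }
    }

    /** Returns how many bytes the shared stream holds right after serialize(). */
    static int bytesVisibleAfterSerialize(boolean flushEachRecord) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        BufferingSerializer serializer = new BufferingSerializer(sink, flushEachRecord);
        try {
            serializer.serialize("value");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // A caller that marks the record complete at this point sees only these bytes.
        return sink.size();
    }

    public static void main(String[] args) {
        System.out.println("without flush: " + bytesVisibleAfterSerialize(false)); // without flush: 0
        System.out.println("with flush: " + bytesVisibleAfterSerialize(true));     // with flush: 5
    }
}
```

Without the flush, the caller sees zero bytes for the value at the point where it would mark the record complete. Flushing at the end of each serialize() call is one possible remedy; whether that is the right fix for AvroSerializer is for the issue discussion to decide.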



--
This message was sent by Atlassian JIRA
(v6.2#6252)