[ https://issues.apache.org/jira/browse/HADOOP-6685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12934030#action_12934030 ]

Doug Cutting commented on HADOOP-6685:
--------------------------------------

> There is petabytes of data in SequenceFile format in Hadoop clusters everywhere. We cannot drop it, we need to maintain it and keep it up to date. We also need to improve to continue to support existing users.

I have never proposed dropping SequenceFile. I have proposed that we not extend it. I have proposed that if we introduce a new concrete binary object data file format (container+serialization) then we should only introduce a single such second-generation format. If we cannot agree on such a format, then we will be stuck adding no new formats to the kernel but rather creating new formats in external projects.

> Change the generic serialization framework API to use serialization-specific bytes instead of Map<String,String> for configuration
> ----------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-6685
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6685
>             Project: Hadoop Common
>          Issue Type: Improvement
>            Reporter: Owen O'Malley
>            Assignee: Owen O'Malley
>             Fix For: 0.22.0
>
>         Attachments: libthrift.jar, serial.patch, serial4.patch, serial6.patch, serial7.patch, SerializationAtSummit.pdf
>
>
> Currently, the generic serialization framework uses Map<String,String> for the serialization specific configuration. Since this data is really internal to the specific serialization, I think we should change it to be an opaque binary blob. This will simplify the interface for defining specific serializations for different contexts (MAPREDUCE-1462). It will also move us toward having serialized objects for Mappers, Reducers, etc (MAPREDUCE-1183).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.