hadoop-common-dev mailing list archives

From "Chris Douglas (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-3460) SequenceFileAsBinaryOutputFormat
Date Wed, 04 Jun 2008 20:24:45 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-3460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12602446#action_12602446
] 

Chris Douglas commented on HADOOP-3460:
---------------------------------------

This looks great; just a few points:

* Different properties for the output key/value classes aren't necessary; you can use the
existing methods, like JobConf::getOutputKeyClass.
* The generic signature on the RecordWriter could be <BytesWritable,BytesWritable> if
SequenceFileOutputFormat (SeqFileOF) were itself declared with type parameters:
{noformat}
-public class SequenceFileOutputFormat
-extends FileOutputFormat<WritableComparable, Writable> {
+public class SequenceFileOutputFormat<K extends WritableComparable,
+                                      V extends Writable>
+    extends FileOutputFormat<K,V> {
{noformat}
That would permit SequenceFileAsBinaryOutputFormat (SeqFABOF) to be declared as:
{noformat}
public class SequenceFileAsBinaryOutputFormat
    extends SequenceFileOutputFormat<BytesWritable,BytesWritable> {
{noformat}
This generates a warning in MultipleSequenceFileOutputFormat, but it's spurious and can be
suppressed.
* Since record compression is not supported, it might be worthwhile to override OutputFormat::checkOutputSpecs
and throw if record compression is requested.
* This should be in o.a.h.mapred.lib rather than o.a.h.mapred.
* Keeping a WritableValueBytes instance around (and adding a reset method) might be useful,
so a new one isn't created for each write.
* The IllegalArgumentException in WritableValueBytes should probably be an UnsupportedOperationException.
* WritableValueBytes should be a _static_ inner class.
* The indentation on the anonymous RecordWriter::close should be consistent with the coding standards.
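
To illustrate the WritableValueBytes points above (reuse through a reset method, a static nested class, and UnsupportedOperationException), here is a rough, self-contained sketch. It is not taken from the patch: the ValueBytes interface is a local stand-in modelled on SequenceFile.ValueBytes, and RawWriter, writeAll, and the reset signature are hypothetical names used only for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

public class ReusableValueBytesSketch {

    // Local stand-in modelled on SequenceFile.ValueBytes (assumption:
    // reduced to the three methods this sketch needs).
    interface ValueBytes {
        void writeUncompressedBytes(DataOutputStream out) throws IOException;
        void writeCompressedBytes(DataOutputStream out) throws IOException;
        int getSize();
    }

    // Static nested class: carries no hidden reference to an enclosing instance.
    static class WritableValueBytes implements ValueBytes {
        private byte[] data = new byte[0];
        private int length;

        // Hypothetical reset method: point this instance at the next value
        // so the writer does not allocate a new object per record.
        void reset(byte[] data, int length) {
            this.data = data;
            this.length = length;
        }

        @Override
        public void writeUncompressedBytes(DataOutputStream out) throws IOException {
            out.write(data, 0, length);
        }

        @Override
        public void writeCompressedBytes(DataOutputStream out) {
            // Nothing is wrong with the argument; the operation itself
            // is unsupported, hence UnsupportedOperationException.
            throw new UnsupportedOperationException(
                "WritableValueBytes does not support record compression");
        }

        @Override
        public int getSize() {
            return length;
        }
    }

    // Hypothetical writer that keeps a single WritableValueBytes around.
    static class RawWriter {
        private final WritableValueBytes valueBytes = new WritableValueBytes();
        private final DataOutputStream out;

        RawWriter(DataOutputStream out) {
            this.out = out;
        }

        void write(byte[] value, int length) throws IOException {
            valueBytes.reset(value, length);       // reuse, no allocation
            valueBytes.writeUncompressedBytes(out);
        }
    }

    // Writes every value through one RawWriter and returns the raw bytes.
    static byte[] writeAll(byte[][] values) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        RawWriter writer = new RawWriter(new DataOutputStream(buffer));
        try {
            for (byte[] value : values) {
                writer.write(value, value.length);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) {
        byte[] written = writeAll(new byte[][] { {1, 2}, {3} });
        System.out.println(written.length); // prints 3
    }
}
```

In the real patch the reset method would presumably take the BytesWritable value rather than a raw array; the array form is used here only to keep the sketch free of Hadoop dependencies.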

> SequenceFileAsBinaryOutputFormat
> --------------------------------
>
>                 Key: HADOOP-3460
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3460
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: mapred
>            Reporter: Koji Noguchi
>            Priority: Minor
>         Attachments: HADOOP-3460-part1.patch
>
>
> Add an OutputFormat to write raw bytes as keys and values to a SequenceFile.
> In C++-Pipes, we're using SequenceFileAsBinaryInputFormat to read SequenceFiles.
> However, we currently don't have a way to *write* a SequenceFile efficiently without going
> through extra (de)serializations.
> I'd like to store the correct classnames for key/values but use BytesWritable to write
> (in order for the next Java or Pig code to be able to read this SequenceFile).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

