hadoop-common-dev mailing list archives

From "Koji Noguchi (JIRA)" <j...@apache.org>
Subject [jira] Updated: (HADOOP-3460) SequenceFileAsBinaryOutputFormat
Date Fri, 06 Jun 2008 17:21:45 GMT

     [ https://issues.apache.org/jira/browse/HADOOP-3460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Koji Noguchi updated HADOOP-3460:
---------------------------------

    Attachment: HADOOP-3460-part3.patch

 bq.  1.  The testcase: doesn't need a main method, you might want to break up the check for
forbidding record compression into a separate test, 

Separated the test into three: testbinary, testSequenceOutputClassDefaultsToMapRedOutputClass,
and testcheckOutputSpecsForbidRecordCompression.

Also, I had a bug in the test: checkOutputSpecs was throwing an exception because the
output path was not set, not because RECORD compression was being set.
Fixed it.
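
A simplified sketch of that check after the fix (not the exact code in the patch; the
output path here is a placeholder):

{noformat}
// Sketch only: set an output path first so checkOutputSpecs gets far enough
// to reject RECORD compression, which is what the test is really after.
public void testcheckOutputSpecsForbidRecordCompression() throws IOException {
  JobConf job = new JobConf();
  FileOutputFormat.setOutputPath(job, new Path("/tmp/sketch-output"));
  SequenceFileOutputFormat.setOutputCompressionType(job,
      SequenceFile.CompressionType.RECORD);
  try {
    new SequenceFileAsBinaryOutputFormat().checkOutputSpecs(null, job);
    fail("checkOutputSpecs should reject RECORD compression");
  } catch (Exception expected) {
    // expected: this output format only supports NONE and BLOCK compression
  }
}
{noformat}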

bq. and the call to JobConf::setInputPath is generating a warning (replace with FileInputFormat::addInputPath)

Ah. I should have compiled with -Djavac.args="-Xlint -Xmaxwarns 1000".
Done.
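
The change is along these lines (sketch of the replacement, not a copy of the diff):

{noformat}
// Before (deprecated, triggers the javac warning):
//   job.setInputPath(inDir);
// After: use the static helper on FileInputFormat instead.
FileInputFormat.addInputPath(job, inDir);
{noformat}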


bq.   2. WritableValueBytes::writeCompressedBytes no longer throws IllegalArgumentException,
so that can be removed from its signature

I left it in since the original SequenceFile.ValueBytes has the signature:
{noformat} 
    public void writeCompressedBytes(DataOutputStream outStream) 
      throws IllegalArgumentException, IOException;
{noformat} 
Should I still take it out?
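
If I do take it out, the narrower signature is still a legal override, since an implementation
may declare fewer exceptions than the interface and IllegalArgumentException is unchecked
anyway. Roughly (sketch only, with an assumed body):

{noformat}
// Sketch only: the implementing class may narrow the throws clause; the
// unchecked IllegalArgumentException can still be thrown at runtime.
public void writeCompressedBytes(DataOutputStream outStream)
    throws IOException {
  throw new UnsupportedOperationException(
      "WritableValueBytes doesn't support RECORD compression");  // assumed body
}
{noformat}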

bq.   3. SeqFABOF::checkOutputSpecs doesn't need to list InvalidJobConfException

Done.

> SequenceFileAsBinaryOutputFormat
> --------------------------------
>
>                 Key: HADOOP-3460
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3460
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: mapred
>            Reporter: Koji Noguchi
>            Assignee: Koji Noguchi
>            Priority: Minor
>         Attachments: HADOOP-3460-part1.patch, HADOOP-3460-part2.patch, HADOOP-3460-part3.patch
>
>
> Add an OutputFormat to write raw bytes as keys and values to a SequenceFile.
> In C++-Pipes, we're using SequenceFileAsBinaryInputFormat to read SequenceFiles.
> However, we currently don't have a way to *write* a SequenceFile efficiently without going
> through extra (de)serializations.
> I'd like to store the correct classnames for key/values but use BytesWritable to write
> (in order for the next Java or Pig code to be able to read this SequenceFile).
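
As a rough illustration of the usage described above (the two setter names are assumptions,
not necessarily the final API in the patch):

{noformat}
// Rough sketch: the reduce emits already-serialized bytes as BytesWritable
// pairs, while the SequenceFile header records the "real" key/value classes
// so the next Java or Pig job reads them back with the correct types.
JobConf job = new JobConf();
job.setOutputFormat(SequenceFileAsBinaryOutputFormat.class);
SequenceFileAsBinaryOutputFormat.setSequenceFileOutputKeyClass(job, Text.class);
SequenceFileAsBinaryOutputFormat.setSequenceFileOutputValueClass(job, Text.class);
FileOutputFormat.setOutputPath(job, new Path("/tmp/binary-out"));
{noformat}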

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

