hadoop-common-user mailing list archives

From Todd Lipcon <t...@cloudera.com>
Subject Re: LZO Compression Libraries don't appear to work properly with MultipleOutputs
Date Thu, 21 Oct 2010 22:31:40 GMT
Hi Ed,

Sounds like this might be a bug, either in MultipleOutputs or in LZO.

Does it work properly with gzip compression? Which LZO implementation
are you using? The one from Google Code or the more up-to-date one
from GitHub (either kevinweil's or mine)?

Any chance you could write a unit test that shows the issue?
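
As a starting point for such a test, here is a minimal sketch that does not
involve Hadoop or LZO at all. It uses only java.util.zip (gzip) to show one
way a compressed file can end up reporting an unexpected end of file: the
compressed stream was never closed, so its final block and trailer were never
written. Whether that is what happens in this case is an assumption, not
something established in this thread, and the class and method names below
are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper for illustration only; not part of Hadoop or any LZO library.
class TruncatedStreamCheck {

    // Compresses data with gzip. When close is false the stream is only
    // flushed, so the final deflate block and the gzip trailer are missing.
    static byte[] compress(byte[] data, boolean close) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        GZIPOutputStream gzip = new GZIPOutputStream(sink);
        gzip.write(data);
        if (close) {
            gzip.close(); // finishes compression and writes the trailer
        } else {
            gzip.flush(); // pushes out buffered bytes but does not finish the stream
        }
        return sink.toByteArray();
    }

    // Returns true only if the whole stream can be read to its end
    // without an I/O error (a truncated stream raises an EOFException).
    static boolean decompressesCleanly(byte[] compressed) {
        try (GZIPInputStream in =
                 new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            byte[] buffer = new byte[4096];
            while (in.read(buffer) != -1) {
                // discard the data; only the end-of-stream behaviour matters here
            }
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "some reducer output\n".getBytes(StandardCharsets.UTF_8);
        System.out.println("closed:   " + decompressesCleanly(compress(data, true)));
        System.out.println("unclosed: " + decompressesCleanly(compress(data, false)));
    }
}
```

A Hadoop-level test could apply the same kind of check to the files that the
job writes, once with gzip and once with LZO, to narrow down whether the
problem is in the codec or in how the output is written.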


On Thu, Oct 21, 2010 at 2:52 PM, ed <hadoopnode@gmail.com> wrote:
> Hello everyone,
> I am having problems using MultipleOutputs with LZO compression (could be a
> bug or something wrong in my own code).
> In my driver I set:
>     MultipleOutputs.addNamedOutput(job, "test", TextOutputFormat.class, NullWritable.class, Text.class);
> In my reducer I have:
>     MultipleOutputs<NullWritable, Text> mOutput = new MultipleOutputs<NullWritable, Text>(context);
>     public String generateFileName(Key key){
>        return "custom_file_name";
>     }
> Then in the reduce() method I have:
>     mOutput.write(mNullWritable, mValue, generateFileName(key));
> This results in LZO files that do not decompress properly (lzop -d
> throws the error "lzop: unexpected end of file: outputFile.lzo").
> If I switch back to the regular context.write(mNullWritable, mValue);
> everything works fine.
> Am I forgetting a step needed when using MultipleOutputs, or is this a
> bug/non-feature of using LZO compression in Hadoop?
> Thank you!
> ~Ed

Todd Lipcon
Software Engineer, Cloudera
