hadoop-mapreduce-user mailing list archives

From Harsh J <ha...@cloudera.com>
Subject Re: MultipleOutputs is not working properly when dfs.block.size is changed
Date Thu, 18 Aug 2011 10:09:21 GMT

Need some more information:
- Version of Hadoop?
- Do you have a runnable sample test case to reproduce this? Or can
you describe roughly the steps you are performing to create an output?

FWIW, I ran trunk's MultipleOutputs tests and they pass for both the old
and new APIs. They do not change dfs.block.size, though I fail to see how
the two would be related.

On Thu, Aug 18, 2011 at 2:00 PM, Dino Kečo <dino.keco@gmail.com> wrote:
> Hi all,
> I have been working on Hadoop jobs which write output into multiple
> files. In the Hadoop API I found the class MultipleOutputs, which
> implements this functionality.
> My use case is to change the HDFS block size in one job to increase
> parallelism, and I am doing that using the dfs.block.size configuration
> property. Part of the output file is missing when I change this property
> (the last couple of lines, and in some cases half of a line).
> I debugged this and everything looks fine before calling
> outputs.write("sucessfull", KEY, VALUE);
> For output format I am using TextOutputFormat.
> When I remove MultipleOutputs from my code everything works OK.
> Is there something I am doing wrong, or is there an issue with
> MultipleOutputs?
> regards,
> dino
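
For illustration only, and not a diagnosis confirmed anywhere in this
thread: output that loses its last few lines, or ends mid-line, is a common
signature of a buffered writer that was never closed. The usage pattern in
the MultipleOutputs Javadoc calls close() once writing is finished
(typically from the task's cleanup() or close() method), so that may be
worth checking. The sketch below uses only java.io, not Hadoop; the class
UnclosedWriterDemo and its method names are hypothetical and exist only for
this example.

```java
import java.io.BufferedWriter;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: not part of Hadoop, only mimics a buffered record writer.
class UnclosedWriterDemo {

    // Writes `lines` tab-separated records through a buffered writer and
    // returns how many bytes actually reached the underlying sink.
    static int bytesReachingSink(int lines, boolean close) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try {
            Writer out = new BufferedWriter(
                    new OutputStreamWriter(sink, StandardCharsets.UTF_8));
            for (int i = 0; i < lines; i++) {
                out.write("key" + i + "\tvalue" + i + "\n");
            }
            if (close) {
                // close() flushes whatever is still sitting in the buffers.
                out.close();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.size();
    }

    public static void main(String[] args) {
        int closed = bytesReachingSink(1000, true);
        int unclosed = bytesReachingSink(1000, false);
        System.out.println("bytes with close():    " + closed);
        System.out.println("bytes without close(): " + unclosed);
        System.out.println("tail lost without close(): " + (closed - unclosed));
    }
}
```

Without close(), the tail of the data stays in the in-memory buffers and
never reaches the sink, and the cut can fall in the middle of a record. If
the same thing happens in a job, changing a buffer-related or block-related
setting could plausibly change how much of the tail goes missing, but that
link is an assumption here, not something established in this thread.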

Harsh J
