hadoop-common-user mailing list archives

From Shahab Yunus <shahab.yu...@gmail.com>
Subject Re: How to append the contents to a output file
Date Thu, 02 Apr 2015 11:37:49 GMT
I hope I understood your requirement correctly.


Your requirement is to write into multiple folders from the reducers AND,
in each folder, append the data to the file in that folder, right?

Reducer-output=
folder1/file1
folder2/file2
....

This can be done with the standard MultipleOutputFormat; the framework will
write data into each folder and make sure it is appended to that file. You
don't need to write your own.
Have you seen this?
https://hadoop.apache.org/docs/current/api/org/apache/hadoop/mapreduce/lib/output/MultipleOutputs.html

If the issue is that you want ONE file per folder for all the reducers,
then you have to do that yourself with a post-job merge. One option is to
use the FileUtil.copyMerge method (
https://hadoop.apache.org/docs/current/api/org/apache/hadoop/fs/FileUtil.html)
to achieve this once the job is finished.
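
To make the post-job merge idea concrete, here is a minimal sketch on the
local filesystem. It uses plain java.nio, not the Hadoop FileUtil API, and
the class name PartFileMerge, the method mergeParts, and the "part-" file
name prefix are assumptions made only for this illustration. It concatenates
every part file in a folder, in name order, onto the end of one target file,
keeping whatever the target already contains.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper: a local-filesystem analogue of a post-job merge.
final class PartFileMerge {

    // Appends every file in `folder` whose name starts with "part-"
    // (assumed reducer output naming) to `target`, in sorted name order.
    // Existing content in `target` is preserved, not overwritten.
    static void mergeParts(Path folder, Path target) throws IOException {
        List<Path> parts;
        try (Stream<Path> entries = Files.list(folder)) {
            parts = entries
                    .filter(p -> p.getFileName().toString().startsWith("part-"))
                    .sorted()
                    .collect(Collectors.toList());
        }
        // CREATE makes the target if it is missing; APPEND writes at its end.
        try (OutputStream out = Files.newOutputStream(
                target, StandardOpenOption.CREATE, StandardOpenOption.APPEND)) {
            for (Path part : parts) {
                Files.copy(part, out);
            }
        }
    }
}
```

Calling PartFileMerge.mergeParts(folder1, folder1.resolve("file1")) once per
output folder would give one file per folder. On HDFS the same loop would go
through the Hadoop FileSystem API instead, and whether append is available
there depends on the cluster configuration.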

Regards,
Shahab

On Wed, Apr 1, 2015 at 11:59 PM, Raghavendra Chandra <
raghavchandra.learning@gmail.com> wrote:

> Dear Team,
>
> I am trying to append the contents to a reducer output file using multiple
> output.
>
> My requirement is to write the reducer output to multiple folders and the
> data must be appended to the existing content.
>
> I have used a custom output format that extends the Text output format
> class, and I am able to write the data into multiple folders. The issue
> I am facing is that it overwrites the data in the files, whereas I want
> it to append the data to the output files.
>
> Please let me know how to handle this situation.
>
> Thanks and regards,
>
> Raghav Chandra
>
