hadoop-user mailing list archives

From Jean-Marc Spaggiari <jean-m...@spaggiari.org>
Subject Re: Is there a way to keep all intermediate files there after the MapReduce Job run?
Date Fri, 01 Mar 2013 13:49:50 GMT
Ling, do you have Hadoop: The Definitive Guide close by?

I think I remember it saying something about keeping the intermediate files.

Take a look at keep.task.files.pattern... It might help you to keep
some of the files you are looking for? Maybe not all... Or even maybe
not any.
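
As far as I recall from the 1.x code, the value of keep.task.files.pattern
is a regular expression that is matched against the whole task attempt ID,
and the files of matching attempts are left in place. Below is a minimal
sketch of that matching rule in plain Java, with no Hadoop classes involved.
The class name, the attempt IDs and the pattern are made up for illustration,
and the full-match behaviour is my assumption about how Hadoop applies the
pattern, so please check it against your version.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of Hadoop: it only mimics the assumed rule
// "keep the files of an attempt if the pattern matches its full attempt ID".
class KeepPatternDemo {

    // Returns true when the (assumed) keep pattern matches the entire attempt ID.
    static boolean keeps(String keepPattern, String attemptId) {
        return keepPattern != null && Pattern.matches(keepPattern, attemptId);
    }

    public static void main(String[] args) {
        // Illustrative value: keep only the first attempt of map task 0.
        String keepPattern = ".*_m_000000_0";

        // Made-up attempt IDs in the usual attempt_<jobid>_<m|r>_<task>_<attempt> shape.
        String[] attempts = {
            "attempt_201303010414_0001_m_000000_0",
            "attempt_201303010414_0001_m_000001_0",
            "attempt_201303010414_0001_r_000000_0"
        };

        for (String id : attempts) {
            String outcome = keeps(keepPattern, id) ? "kept" : "cleaned up";
            System.out.println(id + " -> " + outcome);
        }
    }
}
```

If that assumption holds, a pattern of .* would match every attempt, which is
the closest thing to "keep everything". On a real job the value would go into
the job configuration; I believe JobConf has a setKeepTaskFilesPattern(String)
for this in the old API.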


2013/3/1 Michael Segel <michael_segel@hotmail.com>:
> Your job.xml file is kept for a set period of time.
> I believe the others are automatically removed.
> You can easily access the job.xml file from the JT webpage.
>
> On Mar 1, 2013, at 4:14 AM, Ling Kun <lkun.erlv@gmail.com> wrote:
>
> Dear all,
>     To learn more about which files are created, and how large they are,
> while a job is running, I want to keep all the intermediate files (job.xml,
> spillN.out, file.out, file.index, map.out-N, etc).
> My questions are:
> 1. Is there any configuration that can make this happen? Or could I modify
> some Hadoop MapReduce code for this?
> 2. Since each job, each task, and each attempt of a task uses a different
> directory to store its intermediate files, keeping the files there
> without deleting them will not hurt the MapReduce cluster, apart from taking
> up some storage. Am I right?
> Thanks
> yours,
> Ling Kun
> --
> http://www.lingcc.com
