hadoop-common-user mailing list archives

From Harsh J <ha...@cloudera.com>
Subject Re: Separating mapper intermediate files
Date Tue, 27 Mar 2012 05:15:04 GMT
Hello Aayush,

Two things that should help clear up your confusion:
1. dfs.data.dir controls where HDFS blocks are stored. Set this
to a path on the partition you want HDFS to use.
2. mapred.local.dir controls where intermediate task data goes. Set
this to a path on the other partition, the one meant for the mapper
intermediate files.
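
As a rough sketch of the two settings above, assuming a Hadoop 0.20/1.x-style
layout where dfs.data.dir lives in conf/hdfs-site.xml and mapred.local.dir in
conf/mapred-site.xml. The mount points /mnt/hdfs_part and /mnt/mapred_part are
hypothetical placeholders; substitute the paths of your own two partitions.

```xml
<!-- conf/hdfs-site.xml: where the DataNode stores HDFS blocks -->
<!-- /mnt/hdfs_part is a hypothetical mount point for the HDFS partition -->
<configuration>
  <property>
    <name>dfs.data.dir</name>
    <value>/mnt/hdfs_part/dfs/data</value>
  </property>
</configuration>

<!-- conf/mapred-site.xml: where map/reduce tasks write intermediate data -->
<!-- /mnt/mapred_part is a hypothetical mount point for the other partition -->
<configuration>
  <property>
    <name>mapred.local.dir</name>
    <value>/mnt/mapred_part/mapred/local</value>
  </property>
</configuration>
```

These are two separate files shown together for brevity. The daemons most
likely need a restart before the new directories take effect.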

> Furthermore, can someone also tell me how to save intermediate mapper
> files(spill outputs) and where are they saved.

Intermediate outputs are handled by the framework itself (no manual
work is involved on your part) and are saved inside per-attempt
directories under mapred.local.dir.
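
If the aim is to look at the spill files after a job has finished, keep in
mind that the framework normally cleans up attempt directories once they are
no longer needed. A possible sketch, assuming a Hadoop 1.x-era release in which
the keep.task.files.pattern and keep.failed.task.files properties are
available (please check the mapred-default.xml of your version); the regex
value is only an illustration and is meant to match map task names:

```xml
<!-- conf/mapred-site.xml (or per-job configuration) -->
<!-- Assumption: these properties exist in your Hadoop version -->
<configuration>
  <property>
    <!-- Keep files of tasks whose names match this regex (illustrative value) -->
    <name>keep.task.files.pattern</name>
    <value>.*_m_.*</value>
  </property>
  <property>
    <!-- Also keep the files of failed task attempts -->
    <name>keep.failed.task.files</name>
    <value>true</value>
  </property>
</configuration>
```

Retained files take up space under mapred.local.dir until removed by hand, so
this is probably best limited to debugging runs.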

On Tue, Mar 27, 2012 at 4:46 AM, aayush <aayushgupta.84@gmail.com> wrote:
> I am a newbie to Hadoop and map reduce. I am running a single node hadoop
> setup. I have created 2 partitions on my HDD. I want the mapper intermediate
> files (i.e. the spill files and the mapper output) to be sent to a file
> system on Partition1 whereas everything else including HDFS should be run on
> partition2. I am struggling to find the appropriate parameters in the conf
> files. I understand that there is hadoop.tmp.dir and mapred.local.dir but am
> not sure how to use what. I would really appreciate if someone could tell me
> exactly which parameters to modify to achieve the goal.

Harsh J
