hadoop-mapreduce-user mailing list archives

From psdc1978 <psdc1...@gmail.com>
Subject HDFS and MapReduce and /tmp directory
Date Mon, 05 Apr 2010 10:54:39 GMT

When I run a MapReduce example, I notice that some temporary
directories are created in the /tmp directory.

In my case, the following directory tree was created under /tmp/hadoop
during the execution of the wordcount example:


|-- attempt_201004041803_0002_m_000000_0_0_m_0
|   |-- job.xml
|   |-- output
|   |   |-- file.out
|   |   `-- file.out.index
|   |-- pid
|   `-- split.dta

1 - The map attempt directory contains both a file.out and a split.dta
file. Is split.dta the output produced by the map task, which will later
be fetched by the reducer?

2 - What are file.out and file.out.index?

3 - Is the data written here by MR related to HDFS in any way?

4 - I'm having trouble distinguishing between the files that are written
to the /tmp directory during the execution of my example and the place
where files are written by the command
"bin/hadoop dfs -copyFromLocal".

a) When I execute the "bin/hadoop dfs -copyFromLocal <from> <to>" command,
where is the destination folder located?

b) Is it in memory, or is it physically on my hard disk?

c) If the files are written to the hard disk, in which directory are they?

d) What is the difference between the data written with the
-copyFromLocal command and the data written in the /tmp directory?

5 - The output of a reducer example comes in the form part_0000, which is
written in gutenberg-output. Where is this file? Is it on my hard disk?

Thank you,

