hadoop-common-user mailing list archives

From Kevin Tse <kevintse.on...@gmail.com>
Subject Questions about mapred.local.dir
Date Tue, 08 Jun 2010 10:51:20 GMT
Hi, everyone.
I am running a Hadoop 0.19.2 cluster of 4 Linux boxes. Unfortunately, at the
moment we don't have much free disk space on these 4 nodes: 1.5 TB in total,
spread over 8 disks (2 per node), and the free space on these disks is not
equal.

I have the following configuration.
I tried to run a job that would write an estimated 1.2 TB of intermediate
data. I thought that with the following configuration no disk's free space
would drop below 2 GB, but that was not true: the free space on two of the
disks dropped to 0 before the job finished, so I had to kill the job and free
up the space for other applications running on those machines.
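A rough back-of-the-envelope check of the numbers above, assuming (hypothetically) that the intermediate data ends up spread evenly across all 8 local directories, which Hadoop does not guarantee: 1.2 TB over 8 disks is about 150 GB per disk, so any disk with less free space than that share plus the 2 GB reserve would run out. The per-disk figures and the helper name `disks_at_risk` below are made up for illustration; they are not measurements from this cluster.

```python
# Hypothetical sketch: under an even-split assumption, which local dirs
# would drop below the desired reserve? Per-disk figures are invented.

def disks_at_risk(free_gb, intermediate_gb, reserve_gb=2.0):
    """Return (even share per disk in GB, indexes of disks that would drop below reserve_gb)."""
    share = intermediate_gb / len(free_gb)  # even split across all local dirs (assumption)
    at_risk = [i for i, free in enumerate(free_gb) if free - share < reserve_gb]
    return share, at_risk

# Hypothetical free space (GB) on the 8 disks; sums to 1500 GB (about 1.5 TB).
free = [300, 280, 250, 220, 180, 160, 60, 50]
share, risky = disks_at_risk(free, intermediate_gb=1200)
print(f"even share per disk: {share:.0f} GB, disks at risk: {risky}")
```

With these invented numbers the two smallest disks (60 GB and 50 GB free) are the ones that fill up, even though total free space (1.5 TB) exceeds the 1.2 TB being written.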

We are going to buy three 1 TB disks, which may solve the problem, but I would
still like to know how to set the following 3 properties properly.


There is another problem: while my MR job is running on the Hadoop cluster,
each TaskTracker logs many INFO messages like this one:

2010-06-07 16:51:20,721 INFO org.apache.hadoop.mapred.TaskTracker:
org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find
in any of the configured local directories

I don't know whether this is harmless, but it seems to be, because my MR job
completed successfully.

One more question: is it possible to make the reduces start running before
all the maps complete?

Thank you in advance.
Kevin Tse
