hadoop-mapreduce-user mailing list archives

From Arko Provo Mukherjee <arkoprovomukher...@gmail.com>
Subject Mappers getting killed
Date Thu, 27 Oct 2011 08:22:26 GMT
Hi,

I have a situation where I have to read a large file into every mapper.

Since it is a large HDFS file that every mapper needs in order to process each of its
inputs, reading the data from HDFS into memory takes a lot of time.

As a result, the framework is killing all my mappers with the following message:

11/10/26 22:54:52 INFO mapred.JobClient: Task Id :
attempt_201106271322_12504_m_000000_0, Status : FAILED
Task attempt_201106271322_12504_m_000000_0 failed to report status for 601
seconds. Killing!

The cluster is not entirely owned by me, so I cannot increase *mapred.task.timeout*
to give the tasks enough time to read the entire file.

Any suggestions?
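
For what it is worth, I was wondering whether calling the task context's
progress() from inside the read loop would keep the attempt alive without
touching mapred.task.timeout. Below is a rough, untested sketch of what I
mean (the class name, the side.file.path property and the record handling
are placeholders, not my real code):

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class ProgressReportingMapper extends Mapper<LongWritable, Text, Text, Text> {

  @Override
  protected void map(LongWritable key, Text value, Context context)
      throws IOException, InterruptedException {
    Configuration conf = context.getConfiguration();
    // "side.file.path" is a placeholder job property holding the HDFS path
    // of the large file that every mapper has to read.
    Path sideFile = new Path(conf.get("side.file.path"));
    FileSystem fs = sideFile.getFileSystem(conf);

    BufferedReader reader = new BufferedReader(new InputStreamReader(fs.open(sideFile)));
    try {
      String line;
      long count = 0;
      while ((line = reader.readLine()) != null) {
        // ... combine 'line' with the current input record here ...
        if (++count % 100000 == 0) {
          // Ping the framework so the attempt is not killed for failing
          // to report status within mapred.task.timeout.
          context.progress();
          context.setStatus("read " + count + " lines of the side file");
        }
      }
    } finally {
      reader.close();
    }
    context.write(new Text(key.toString()), value);
  }
}

Would that be enough to stop the attempts from being killed?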

Also, is there a way for a Mapper instance to read the file only once for all
the inputs that it receives?
Currently, since the file-reading code is in the map method, I believe the
entire file is re-read for each and every input, which adds a lot of
overhead.
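
In case a sketch makes it clearer what I am after: I came across the setup()
hook on the new-API Mapper, and I am wondering whether something like the
(untested) sketch below is the right way to load the file once per mapper
instance and then reuse it for every input record. The class name and the
side.file.path property are again placeholders:

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class SideFileOnceMapper extends Mapper<LongWritable, Text, Text, Text> {

  // Filled once per mapper instance in setup() and reused by every map() call.
  private final List<String> sideData = new ArrayList<String>();

  @Override
  protected void setup(Context context) throws IOException, InterruptedException {
    Configuration conf = context.getConfiguration();
    Path sideFile = new Path(conf.get("side.file.path"));
    FileSystem fs = sideFile.getFileSystem(conf);

    BufferedReader reader = new BufferedReader(new InputStreamReader(fs.open(sideFile)));
    try {
      String line;
      long count = 0;
      while ((line = reader.readLine()) != null) {
        sideData.add(line);
        // Still report progress while loading, since the read alone can
        // take longer than the task timeout.
        if (++count % 100000 == 0) {
          context.progress();
        }
      }
    } finally {
      reader.close();
    }
  }

  @Override
  protected void map(LongWritable key, Text value, Context context)
      throws IOException, InterruptedException {
    // Use the already-loaded sideData here instead of re-reading the file.
    context.write(value, new Text("side-lines=" + sideData.size()));
  }
}

Is that the intended way to do per-mapper initialisation?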

Please help!

Many thanks in advance!!

Warm regards
Arko
