hadoop-common-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Dan Bretherton <...@mail.nerc-essc.ac.uk>
Subject Hadoop evaluation: Reliability issues and NFS
Date Thu, 30 Mar 2006 19:13:12 GMT
Dear Hadoop users and developers,

I have been evaluating Hadoop with a view to using it as a distributed 
computing platform for performing fast operations on locally stored data.  I 
have found several problems that are causing Hadoop to be unreliable on our 
network of 13 Sun Sparcs and 9 Linux desktop PCs.  I am running a simple test 
application that searches through an environmental data set containing 1800 
files of 22MB each, looking for files containing temperatures above a certain 
level.  We do not need to use the distributed filesystem in Hadoop because 
the data and home directories are available on every machine via NFS.  I have 
been running the test repeatedly to find out the average run times for 
different numbers of distributed map operations, but I have not managed to 
run my test program to completion enough times get any indication of how much 
faster it runs with multiple maps.  I found that reliability was much 
improved when I scaled down the problem to a search through 250 files instead 
of 1800, but for this relatively small task there was no speed advantage in 
using Hadoop at all.  This means that we may not be investigating Hadoop any 
further for use with our applications, but I thought that it might be helpful 
to post the results of my evaluation on the mailing list.  Here are the 
details of the specific problems that I identified during my tests of the 
nightly build from the 27th March.


The first problem I have is that task tracker processes often die while I am 
running my test program.  Typically, the number of slave nodes that can be 
seen by the job tracker gradually decreases over time, and when I check the 
nodes that have disappeared I find that the task tracker process has stopped 
running on that machine.  The resulting error message in the task tracker log 
file for the disappearing nodes begins with the following lines:

060330 130100 task_m_f8jt6q  SEVERE FSError from child
060330 130100 task_m_f8jt6q org.apache.hadoop.fs.FSError: java.io.IOException: 
Stale NFS file handle
060330 130100 task_m_f8jt6q  at 
org.apache.hadoop.fs.LocalFileSystem$LocalFSFileOutputStream.write(LocalFileSystem.java:143)

When I am not running anything the task trackers happily sit there for hours 
without expiring.


The second problem is that a lot of jobs fail due to failed map and reduce 
operations.  Map tasks usually fail with either of the two following errors:

java.io.IOException: Task process exit with nonzero status. at 
org.apache.hadoop.mapred.TaskRunner.runChild(TaskRunner.java:273) at ...etc.

... or ...

java.io.FileNotFoundException: /users/dab/Hadoop/input/Occam/.nfs000047ab00000001 
at org.apache.hadoop.fs.LocalFileSystem.openRaw(LocalFileSystem.java:114) 
at ....etc.

.... or occasionally ......

java.lang.ClassFormatError: mapreducetest/Main$MapClass (Truncated class file) 
at java.lang.ClassLoader.defineClass0(Native Method) at ....etc.

Reduce tasks usually fail with this error:

java.lang.NullPointerException at java.net.Socket.(Socket.java:301) at 
java.net.Socket.(Socket.java:153) at org.apache.hadoop.ipc.Client$Connection.
(Client.java:114) at ...etc.

I don't think this is related to the problem of disappearing nodes, because 
tasks often fail during periods when no nodes are being lost.  I have been 
running with one reduce task per job.


The third problem is related to the Hadoop distributed filesystem (DFS).  I 
tried running in DFS mode to see if this improved reliability, though this 
was not a true test of the DFS because of our NFS setup (i.e. all the DFS 
blocks actually end up in my home directory on a single disk).  I should also 
point out that the input data involved in the DFS was just a list of file 
names, not the temperature data itself.  Using the DFS I found that the jobs 
often failed because of problems with missing blocks of data.  Here is a 
typical error message from the job tracker log file.

java.io.IOException: Could not obtain block blk_-3035035931951255964 at 
org.apache.hadoop.dfs.DFSClient$DFSInputStream.blockSeekTo(DFSClient.java:362) 
at ...etc.

As soon as these errors start to appear it means that the DFS is broken.  
Performing a “get” on the input data directory gives errors like the 
following:

060330 193615 No node available for block blk_-4259713996952710409
060330 193615 Could not obtain block from any node:  java.io.IOException: No 
live nodes contain current block

The only solution I found was to delete the input directory from the DFS and 
then put it back, after which one or two jobs would run to completion before 
the same thing happened again.  It occurred to me that this problem might be 
something to do with the fact that I was still using NFS when running in DFS 
mode, obviously not what the DFS was designed for.   No task trackers died 
while I was running in DFS mode, but that may be just a coincidence.


Another factor that affects reliability is the number of maps.  I never 
managed to complete a job with more than one map per node, but my experience 
with another distributed computing platform that I have been evaluating, the 
Distributed Parallel Processing Environment for Java (DPPEJ) from IBM 
(http://www.alphaworks.ibm.com/tech/dppej), showed that there is no advantage 
to having more than a handful of distributed processes simultaneously trying 
to access data on the same disk.  Therefore, the maximum number of maps I 
used is one for each of our 13 Suns.

I don't think that there is anything particularly unusual or unreliable about 
our computers or our network, although I expect that my test application is 
not what Hadoop was really intended for.  When I scaled down my test problem 
to improve reliability I found that Hadoop did not reduce the average run 
time very much at all, probably because the overheads involved in task 
scheduling and tracking outweighed the benefits of parallelisation.   IBM's 
DPPEJ runs my test application well and reduces average run times by about a 
factor of five.

I would be interested to hear any comments or suggestions relating to any of 
these issues, and I will be happy to supply any further details requested.

Regards,
 
-- 
Dan Bretherton
Environmental Systems Science Centre
Harry Pitt Building
3 Earley Gate
Reading University
Reading, RG6 6AL
UK

Tel. +44 118 378 7722

Mime
View raw message