hadoop-common-user mailing list archives

From lohit.vijayar...@yahoo.com
Subject Re: Platform reliability with Hadoop
Date Wed, 16 Jan 2008 17:54:30 GMT
> The DFS is stored in /tmp on each box.
> The developers who own the machines occasionally reboot and reprofile them

Won't you lose your blocks after a reboot, since /tmp gets cleaned up? Could this be the reason
you see data corruption?
A good idea is to configure DFS to use any location other than /tmp.
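As an illustration of moving DFS out of /tmp, here is a minimal sketch, assuming Python 3 and a persistent, per-node local directory such as /var/hadoop (a hypothetical path). It generates a hadoop-site.xml that overrides the three properties whose defaults resolve under /tmp: hadoop.tmp.dir (default /tmp/hadoop-${user.name}), dfs.name.dir and dfs.data.dir (both derived from hadoop.tmp.dir by default). The subdirectory names under base_dir are assumptions, not Hadoop conventions.

```python
import xml.etree.ElementTree as ET


def hadoop_site_xml(base_dir: str) -> str:
    """Build a hadoop-site.xml body that keeps HDFS state out of /tmp.

    base_dir is a hypothetical persistent directory that must survive
    reboots and reprofiling on every node.
    """
    base = base_dir.rstrip("/")
    if base == "/tmp" or base.startswith("/tmp/"):
        raise ValueError("base_dir must not live under /tmp")
    props = {
        # Parent of Hadoop's default local dirs (defaults under /tmp).
        "hadoop.tmp.dir": f"{base}/hadoop-tmp",
        # Where the namenode keeps its image and edit log.
        "dfs.name.dir": f"{base}/name",
        # Where each datanode stores its blocks.
        "dfs.data.dir": f"{base}/data",
    }
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(hadoop_site_xml("/var/hadoop"))
```

Because the Hadoop install in this setup sits on an NFS-mounted account, every node would read the same generated file, so the chosen base_dir has to exist on local disk on each box.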

----- Original Message ----
From: Jeff Eastman <jeastman@collab.net>
To: hadoop-user@lucene.apache.org
Sent: Wednesday, January 16, 2008 9:32:41 AM
Subject: Platform reliability with Hadoop

I've been running Hadoop 0.14.4 and, more recently, 0.15.2 on a dozen
machines in our CUBiT array for the last month. During this time I have
experienced two major data corruption losses on relatively small amounts
of data (<50 GB) that make me wonder about the suitability of this
platform for hosting Hadoop. CUBiT is one of our products for managing
a pool of development servers, allowing developers to check out machines,
install various OS profiles on them and monitor their utilization via
the web. With most machines reporting very low utilization it seemed a
natural place to run Hadoop in the background. I have an NFS-mounted
account on all of the machines and have installed Hadoop there. The DFS
is stored in /tmp on each box. The developers who own the machines
occasionally reboot and reprofile them, but this occurs infrequently
and does not clobber /tmp. Hadoop is designed to deal with slave failures
of this nature, though this platform may well be an acid test.


My initial cloud was configured for a replication factor of 3, and I have
now increased that to 4 in the hope of improving data reliability in the
face of these more prevalent slave outages. Ted Dunning has suggested
aggressive rebalancing in his recent posts, and I have done this by
increasing replication to 5 (from 3) and then dropping it to 4. Are
there other rebalancing or configuration techniques that might improve
my data reliability? Or, is this platform just too unstable to be a
fit for Hadoop?
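To show why raising the replication factor helps against several slaves being wiped at once, here is a rough model, not HDFS's actual placement policy (which is rack-aware and favors the writer's node). It assumes each block's replicas sit on r distinct nodes chosen uniformly at random out of n, and that k nodes lose their data simultaneously, before HDFS can re-replicate. Under those assumptions a block is lost only if all r replicas fall among the k wiped nodes, with probability C(k, r) / C(n, r).

```python
from math import comb


def block_loss_probability(nodes: int, wiped: int, replication: int) -> float:
    """P(every replica of one block sits on a wiped node).

    Assumes replicas are on `replication` distinct nodes chosen uniformly
    at random and that `wiped` nodes lose their data at the same time,
    before re-replication can run.
    """
    if not 0 <= wiped <= nodes or not 1 <= replication <= nodes:
        raise ValueError("need 0 <= wiped <= nodes and 1 <= replication <= nodes")
    # comb(wiped, replication) is 0 when wiped < replication, so the
    # block always keeps at least one replica in that case.
    return comb(wiped, replication) / comb(nodes, replication)


if __name__ == "__main__":
    # A dozen machines, as in the CUBiT setup; 4 wiped at once is illustrative.
    for r in (3, 4, 5):
        p = block_loss_probability(nodes=12, wiped=4, replication=r)
        print(f"replication={r}: P(block lost)={p:.4f}")
```

With 12 nodes and 4 wiped together, this gives 4/220 (about 0.018) per block at replication 3, 1/495 (about 0.002) at replication 4, and 0 at replication 5. For scale, assuming the 64 MB default block size, 50 GB is about 800 blocks, so replication 3 would expect roughly 15 lost blocks in such an event under this model. The model deliberately ignores re-replication between outages; if wipes are spread out in time, HDFS's own recovery matters more than the replication factor alone.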
