hadoop-common-user mailing list archives

From "Jeff Eastman" <jeast...@collab.net>
Subject RE: Platform reliability with Hadoop
Date Wed, 16 Jan 2008 18:08:53 GMT
Thanks, I will try a safer place for the DFS.

-----Original Message-----
From: Jason Venner [mailto:jason@attributor.com] 
Sent: Wednesday, January 16, 2008 10:04 AM
To: hadoop-user@lucene.apache.org
Subject: Re: Platform reliability with Hadoop

The /tmp default has caught us once or twice too. Now we put the files somewhere other than /tmp.
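For Hadoop of this vintage (0.14/0.15), the DFS storage location is controlled through hadoop-site.xml. A minimal sketch of moving it off /tmp, assuming a persistent partition mounted at /data (the path is illustrative):

```xml
<!-- hadoop-site.xml: keep DFS data off /tmp so it survives reboots
     and tmp-cleaners. /data/hadoop is an illustrative path on a
     persistent partition. -->
<property>
  <name>hadoop.tmp.dir</name>
  <value>/data/hadoop/tmp-${user.name}</value>
</property>
<property>
  <name>dfs.name.dir</name>
  <value>/data/hadoop/dfs/name</value>
</property>
<property>
  <name>dfs.data.dir</name>
  <value>/data/hadoop/dfs/data</value>
</property>
```

By default, dfs.name.dir and dfs.data.dir both live under hadoop.tmp.dir, which itself defaults to /tmp/hadoop-${user.name}, so overriding hadoop.tmp.dir alone is often enough; the explicit dfs.* entries make the intent unambiguous.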

lohit.vijayarenu@yahoo.com wrote:
>> The DFS is stored in /tmp on each box.
>> The developers who own the machines occasionally reboot and reprofile
> Won't you lose your blocks after a reboot, since /tmp gets cleaned up?
> Could this be the reason you see data corruption?
> A good idea is to configure the DFS to be any place other than /tmp.
> Thanks,
> Lohit
> ----- Original Message ----
> From: Jeff Eastman <jeastman@collab.net>
> To: hadoop-user@lucene.apache.org
> Sent: Wednesday, January 16, 2008 9:32:41 AM
> Subject: Platform reliability with Hadoop
> I've been running Hadoop 0.14.4 and, more recently, 0.15.2 on a dozen
> machines in our CUBiT array for the last month. During this time I
> experienced two major data corruption losses on relatively small amounts
> of data (<50 GB) that make me wonder about the suitability of this
> platform for hosting Hadoop. CUBiT is one of our products for managing a
> pool of development servers, allowing developers to check out machines,
> install various OS profiles on them, and monitor their utilization via
> the web. With most machines reporting very low utilization, it seemed a
> natural place to run Hadoop in the background. I have an NFS-mounted
> account on all of the machines and have installed Hadoop there. The DFS
> is stored in /tmp on each box. The developers who own the machines
> occasionally reboot and reprofile them, but this occurs infrequently and
> does not clobber /tmp. Hadoop is designed to deal with slave failures of
> this nature, though this platform may well be an acid test.
>
> My initial cloud was configured for a replication factor of 3, and I
> have increased that now to 4 in hopes of improving data reliability in
> the face of these more-prevalent slave outages. Ted Dunning has
> suggested aggressive rebalancing in his recent posts, and I have done
> this by increasing replication to 5 (from 3) and then dropping it to 4.
> Are there other rebalancing or configuration techniques that might
> improve my data reliability? Or is this platform just too unstable to be
> a good fit for Hadoop?
>
> Jeff
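The rebalancing technique Jeff describes (raising replication, then lowering it again) can be driven from the shell with the FsShell setrep command; the path and replication factors below are illustrative:

```shell
# Temporarily over-replicate everything under /user/jeff (illustrative
# path), waiting (-w) until each block actually reaches 5 replicas ...
bin/hadoop dfs -setrep -R -w 5 /user/jeff

# ... then drop back to the steady-state factor of 4; the namenode
# schedules deletion of the excess replicas.
bin/hadoop dfs -setrep -R 4 /user/jeff
```

The -w flag blocks until replication completes, which makes it clear when the extra copies exist before the factor is lowered again.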
