hadoop-user mailing list archives

From Jeffrey Buell <jbu...@vmware.com>
Subject RE: Hadoop hardware failure recovery
Date Tue, 14 Aug 2012 00:15:55 GMT
This is never an issue on vSphere.  The ESXi hypervisor does not send the completion interrupt
back to the guest until the IO is finished, so if the guest OS thinks an IO is flushed to
disk, it really is flushed to disk. hsync() will work in an ESXi VM exactly like in a native OS.

The physical storage layer might lie about completion (e.g., most SANs with redundant battery-backed
caches), but this applies equally to native and virtualized OSes.

It is always tempting to implement some kind of write caching in the virtualization layer
to try to improve storage performance, but of course this comes at the cost of safety and
data integrity.

From: Steve Loughran [mailto:stevel@hortonworks.com]
Sent: Monday, August 13, 2012 8:08 AM
To: user@hadoop.apache.org
Subject: Re: Hadoop hardware failure recovery

On 13 August 2012 07:55, Harsh J <harsh@cloudera.com> wrote:

Note that with 2.1.0 (upcoming) and above releases of HDFS, we offer a
working hsync() API that allows you to write files with a guarantee that
they have been written to the disk (like the fsync() *nix call).

A guarantee that the OS thinks it's been written to HDD.

For anyone using Hadoop or any other program (e.g. MySQL) in a virtualized environment: even
when the OS thinks it has flushed a virtual disk, know that you may have to set some VM params
to say "when we said 'flush to disk' we meant it".
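As one hedged illustration of such params (using QEMU/KVM as the example hypervisor, since vSphere, per the reply above, does not need this), the guest-visible behavior of a virtual disk is controlled by a per-disk cache mode:

```shell
# QEMU/KVM cache modes (illustrative; check your hypervisor's docs):
#   cache=writethrough -> host acks writes only after they reach storage
#   cache=none         -> host uses O_DIRECT; guest flush requests pass through
#   cache=unsafe       -> guest flushes are ignored entirely (fast, dangerous)
qemu-system-x86_64 \
  -drive file=datanode-disk.qcow2,format=qcow2,cache=writethrough
```

Modes like `writeback` or `unsafe` are exactly the "write caching in the virtualization layer" the first message warns about: the guest's fsync()/hsync() can return before the data is on stable storage.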
