hadoop-common-user mailing list archives

From Edward Capriolo <edlinuxg...@gmail.com>
Subject HDFS vs Giant Direct Attached Arrays
Date Fri, 19 Feb 2010 19:08:01 GMT
Hadoop is great. Almost every day gives me more reasons to like it.

My story for today:
We have a system running a file system with a 48 TB Disk array on 4
shelves. Today I got this information about firmware updates. (don't
you love firmware updates?)

---------------
Any XXXX Controllers configured with any of the XXXX SATA hard drives
listed in the Scope section might exhibit slow virtual disk
initialization/expansion, rare drive stalls and timeouts, scrubbing
errors, and reduced performance.

Data might be at risk if multiple drive failures occur. Proactive hard
drive replacement is neither necessary, nor authorized.

"Updating the firmware of disk drives in a virtual disk risks the loss
of data and causes the drives to be temporarily inaccessible.”
---------------
In a nutshell, the safest way is to take the system offline and update
the disks one at a time (we don't know how long updating one disk takes).
Or we have to smart-fail each disk, move it out of this array into
another array (luckily we have another one), apply the firmware, put the
disk back in, and wait for the re-stripe. Repeat 47 times!

So the options are:
1) Risky -- do the update online and hope we do not corrupt the thing
2) Slow -- take the system offline and update one disk at a time, as suggested

Neither option has zero downtime. Also note that since this update fixes
"reduced performance," this chassis was never operating at max
efficiency, whether because of RAID card complexity, firmware, the
backplane, whatever.

Now, imagine if this were a 6-node Hadoop system with 8 disks per node,
and we had to do a firmware update. Wow, this would be easy. We could
accomplish it with no system-wide outage, at our leisure. With a
file replication factor of 3 we could hot-swap disks, or even safely
fail an entire node, with no outage.
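
For the curious, here is a rough sketch of one common way to pull that
off with decommissioning. The hostname and paths below are made up and
your config layout will differ, but the dfs.hosts.exclude plus
dfsadmin -refreshNodes dance is the standard one:

---------------
<!-- hdfs-site.xml: keep 3 copies of every block and point the
     namenode at an excludes file used for decommissioning -->
<property>
  <name>dfs.replication</name>
  <value>3</value>
</property>
<property>
  <name>dfs.hosts.exclude</name>
  <value>/etc/hadoop/conf/excludes</value>
</property>

# Add the datanode you want to flash to the excludes file, then tell
# the namenode to start decommissioning it (hostname is hypothetical).
echo "datanode05.example.com" >> /etc/hadoop/conf/excludes
hadoop dfsadmin -refreshNodes

# Once the namenode shows the node as "Decommissioned", its blocks have
# been re-replicated elsewhere. Flash the firmware, remove the node from
# the excludes file, run -refreshNodes again, and it rejoins the cluster.
---------------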

We would not need spare hardware, would not need to inform people of an
outage, and would not need to disable alerts. Hadoop would not care if
the firmware on all the disks did not match. Hadoop does not have some
complicated RAID that has been running at "reduced performance" all this
time. Hadoop just uses independent disks -- much less complexity.
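
And sanity-checking the cluster before and after each node is touched is
a command or two. Again, just a sketch, assuming a 0.20-era command line:

---------------
# Walk the namespace and report missing, corrupt, and under-replicated blocks.
hadoop fsck /

# Per-datanode capacity, usage, and last-contact report.
hadoop dfsadmin -report
---------------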

HDFS ForTheWin!
