On Thu, Jun 30, 2011 at 5:24 PM, Todd Lipcon <todd@cloudera.com> wrote:
>
> I'd advise you to look at "stock hadoop" again. This used to be true, but
> was fixed a long while back by HDFS-457 and several followup JIRAs.
>
> If MapR does something fancier, I'm sure we'd be interested to hear about
> it so we can compare the approaches.
>
> -Todd
>
>
MapR tracks disk responsiveness. Specifically, it maintains a moving histogram
of IO-completion times internally, and if a disk starts getting very slow, it
is pre-emptively taken offline so that it does not create long tails for
running jobs (the data on that disk is re-replicated using whatever
re-replication policy is in place). One of the benefits of managing the disks
directly instead of through ext3, xfs, or other ...

All these stats can be fed into Ganglia (or pushed out centrally to a text
file that can be read over NFS) if historical information about disk
behavior (and failures) needs to be preserved.
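To make the "moving histogram" idea concrete, here is a minimal Python sketch. It assumes fixed latency buckets, a sliding window over the last N IO completions, and a simple policy that flags a disk once its estimated tail latency crosses a limit. The names (MovingLatencyHistogram, should_take_offline), bucket edges, window size, and 500 ms limit are all hypothetical and are not MapR's actual implementation:

```python
from bisect import bisect_left
from collections import deque


class MovingLatencyHistogram:
    """Bucketed histogram over the last `window` IO-completion times.

    Hypothetical sketch: bucket edges and window size are illustrative.
    """

    def __init__(self, edges_ms, window=1000):
        self.edges = sorted(edges_ms)               # upper bound of each bucket, in ms
        self.counts = [0] * (len(self.edges) + 1)   # extra bucket for overflow
        self.recent = deque()                       # bucket index of each recent op
        self.window = window

    def record(self, latency_ms):
        # First bucket whose upper edge is >= latency (overflow if none).
        idx = bisect_left(self.edges, latency_ms)
        self.recent.append(idx)
        self.counts[idx] += 1
        # Age out the oldest sample so the histogram keeps "moving".
        if len(self.recent) > self.window:
            self.counts[self.recent.popleft()] -= 1

    def quantile_upper_bound(self, q):
        """Upper edge of the bucket holding the q-quantile (inf for overflow)."""
        total = len(self.recent)
        if total == 0:
            return 0.0
        target = q * total
        seen = 0
        for idx, count in enumerate(self.counts):
            seen += count
            if seen >= target:
                return self.edges[idx] if idx < len(self.edges) else float("inf")
        return float("inf")


def should_take_offline(hist, q=0.99, limit_ms=500.0, min_samples=100):
    # Hypothetical policy: offline the disk once its tail latency exceeds limit_ms.
    if len(hist.recent) < min_samples:
        return False
    return hist.quantile_upper_bound(q) > limit_ms
```

Usage: call record() on every IO completion and check should_take_offline() periodically. Because old samples age out of the window, a disk that recovers stops being flagged. Re-replicating the offlined disk's data is a separate step that this sketch leaves out.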
- Srivas.