hadoop-common-user mailing list archives

From Colin McCabe <cmcc...@alumni.cmu.edu>
Subject Re: Can we declare some HDFS nodes "primary"
Date Tue, 11 Dec 2012 23:53:05 GMT
It seems like those 10 nodes out of 500 would become a hot spot for
writes if there were a hard requirement that they accept all writes.
That might be acceptable for a very read-heavy workload, but are you
sure that's what you've got?

Another consideration: if Amazon goes down, my understanding is that
all instance storage will be toast.  I believe they guarantee that
things stored on EBS, S3, or Glacier are durable; instance storage,
not so much.  Power outages do happen, after all.  On the other hand,
I am not an expert on Amazon's offerings; maybe someone else can
clarify the exact guarantees they provide.

You could consider bumping the replication factor to something above
3 and relying on the improbability of 5 (or however many) instances
failing simultaneously.  You might also consider periodically syncing
the data to S3.
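A rough sketch of both suggestions using the stock Hadoop CLI (the paths and bucket name are hypothetical, and the right replication factor depends on your failure model):

```shell
# Raise the replication factor to 5 for data already in HDFS.
# -w waits until the target replication is actually reached.
hadoop fs -setrep -w 5 /user/data

# New files pick up the default from dfs.replication in hdfs-site.xml:
#   <property>
#     <name>dfs.replication</name>
#     <value>5</value>
#   </property>

# Periodically copy HDFS data out to S3 with DistCp.
# -update skips files whose size matches the destination copy,
# so repeated runs only transfer what changed.
hadoop distcp -update /user/data s3n://my-backup-bucket/data
```

DistCp runs as a MapReduce job, so the copy is parallelized across the cluster rather than funneled through a single machine, which matters at 500-node scale.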


On Tue, Dec 11, 2012 at 3:39 AM, David Parks <davidparks21@yahoo.com> wrote:
> Assume for a moment that you have a large cluster of 500 AWS spot instance
> servers running. And you want to keep the bid price low, so at some point
> it’s likely that the whole cluster will get axed until the spot price comes
> down some.
> In order to maintain HDFS continuity I’d want, say, 10 servers running as
> normal instances, and I’d want to ensure that HDFS replicates 100% of the
> data to those 10, which don’t run the risk of group elimination.
> Is it possible for HDFS to ensure replication to these “primary” nodes?
