Subject: Re: Can we declare some HDFS nodes "primary"
From: Andy Isaacson
To: user@hadoop.apache.org
Date: Tue, 11 Dec 2012 12:22:28 -0800

Rack awareness will help, but it's a "best effort" rather than guaranteed
replication. Over time the cluster will converge to having at least one
replica on each rack, but even just normal block churn can result in
significant time periods where rack replication policy is violated. The
issue becomes worse if you lose one of those 10 servers and rereplication
happens -- the rereplication can take hours.

Depending on your use case, you could

1. run the 10 servers with dfs.data.dir on one (or several) EBS volume(s).
2. replicate your data to S3. (There's no plumbing in HDFS to do this
   automatically, alas.)
3. run as two separate clusters (10 nodes in one, 500 in another) and
   distcp between them.

As you can see from those suggestions, HDFS really isn't designed with
this scenario in mind...
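If you go with 1 or 3, the rough shape is something like the following --
untested sketches only, and the mount points, bucket name, and NameNode
addresses are made up for illustration:

    <!-- hdfs-site.xml on the 10 long-lived datanodes: point dfs.data.dir
         (dfs.datanode.data.dir on 2.x) at EBS-backed mounts -->
    <property>
      <name>dfs.data.dir</name>
      <value>/mnt/ebs0/dfs/data,/mnt/ebs1/dfs/data</value>
    </property>

    # 2. a periodic (hand-run or cron'd) copy to S3 with distcp; the s3n
    #    credentials (fs.s3n.awsAccessKeyId / fs.s3n.awsSecretAccessKey)
    #    would live in core-site.xml
    hadoop distcp hdfs://stable-nn:8020/data/important \
        s3n://my-backup-bucket/data/important

    # 3. distcp from the big spot cluster to the small stable cluster
    hadoop distcp hdfs://spot-nn:8020/data/output \
        hdfs://stable-nn:8020/data/output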
-andy

On Tue, Dec 11, 2012 at 5:33 AM, Harsh J wrote:
> Rack awareness with replication factor of 3 on files will help.
>
> You could declare two racks, one carrying these 10 nodes, and default rack
> for the rest of them, and the rack-aware default block placement policy
> will take care of the rest.
>
> On Dec 11, 2012 5:10 PM, "David Parks" wrote:
>>
>> Assume for a moment that you have a large cluster of 500 AWS spot
>> instance servers running. And you want to keep the bid price low, so at
>> some point it's likely that the whole cluster will get axed until the
>> spot price comes down some.
>>
>> In order to maintain HDFS continuity I'd want say 10 servers running as
>> normal instances, and I'd want to ensure that HDFS is replicating 100% of
>> data to those 10 that don't run the risk of group elimination.
>>
>> Is it possible for HDFS to ensure replication to these "primary" nodes?
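For reference, the two-rack mapping Harsh describes is driven by a topology
script configured on the NameNode. A minimal, untested sketch -- the host
names are made up, and depending on the setup the script may be handed IP
addresses rather than names:

    <!-- core-site.xml (the property is topology.script.file.name on 1.x,
         net.topology.script.file.name on 2.x) -->
    <property>
      <name>topology.script.file.name</name>
      <value>/etc/hadoop/conf/topology.sh</value>
    </property>

    #!/bin/bash
    # /etc/hadoop/conf/topology.sh -- print one rack path per argument
    for host in "$@"; do
      case "$host" in
        stable-node-*) echo /rack-stable ;;
        *)             echo /rack-default ;;
      esac
    done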