Subject: Re: Can we declare some HDFS nodes "primary"
From: Andy Isaacson
To: user@hadoop.apache.org
Date: Tue, 11 Dec 2012 12:22:28 -0800

Rack awareness will help, but it's a "best effort" rather than guaranteed
replication. Over time the cluster will converge to having at least one
replica on each rack, but even just normal block churn can result in
significant time periods where rack replication policy is violated. The
issue becomes worse if you lose one of those 10 servers and rereplication
happens -- the rereplication can take hours.

Depending on your use case, you could

1. run the 10 servers with dfs.data.dir on one (or several) EBS volume(s).
2. replicate your data to S3. (There's no plumbing in HDFS to do this
   automatically, alas.)
3. run as two separate clusters (10 nodes in one, 500 in another) and
   distcp between them.

As you can see from those suggestions, HDFS really isn't designed with
this scenario in mind...
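If you go with 1 or 3, the rough shape is something like the following --
untested sketches only, and the mount points, bucket name, and NameNode
addresses are made up for illustration:

    <!-- hdfs-site.xml on the 10 long-lived datanodes: point dfs.data.dir
         (dfs.datanode.data.dir on 2.x) at EBS-backed mounts -->
    <property>
      <name>dfs.data.dir</name>
      <value>/mnt/ebs0/dfs/data,/mnt/ebs1/dfs/data</value>
    </property>

    # 2. a periodic (hand-run or cron'd) copy to S3 with distcp; the s3n
    #    credentials (fs.s3n.awsAccessKeyId / fs.s3n.awsSecretAccessKey)
    #    would live in core-site.xml
    hadoop distcp hdfs://stable-nn:8020/data/important \
        s3n://my-backup-bucket/data/important

    # 3. distcp from the big spot cluster to the small stable cluster
    hadoop distcp hdfs://spot-nn:8020/data/output \
        hdfs://stable-nn:8020/data/output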
-andy

On Tue, Dec 11, 2012 at 5:33 AM, Harsh J wrote:
> Rack awareness with replication factor of 3 on files will help.
>
> You could declare two racks, one carrying these 10 nodes, and default rack
> for the rest of them, and the rack-aware default block placement policy
> will take care of the rest.
>
> On Dec 11, 2012 5:10 PM, "David Parks" wrote:
>>
>> Assume for a moment that you have a large cluster of 500 AWS spot
>> instance servers running. And you want to keep the bid price low, so at
>> some point it's likely that the whole cluster will get axed until the
>> spot price comes down some.
>>
>> In order to maintain HDFS continuity I'd want say 10 servers running as
>> normal instances, and I'd want to ensure that HDFS is replicating 100% of
>> data to those 10 that don't run the risk of group elimination.
>>
>> Is it possible for HDFS to ensure replication to these "primary" nodes?
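For reference, the two-rack mapping Harsh describes is driven by a topology
script configured on the NameNode. A minimal, untested sketch -- the host
names are made up, and depending on the setup the script may be handed IP
addresses rather than names:

    <!-- core-site.xml (the property is topology.script.file.name on 1.x,
         net.topology.script.file.name on 2.x) -->
    <property>
      <name>topology.script.file.name</name>
      <value>/etc/hadoop/conf/topology.sh</value>
    </property>

    #!/bin/bash
    # /etc/hadoop/conf/topology.sh -- print one rack path per argument
    for host in "$@"; do
      case "$host" in
        stable-node-*) echo /rack-stable ;;
        *)             echo /rack-default ;;
      esac
    done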