Subject: Re: is HDFS RAID "data locality" efficient?
From: Gaurav Sharma <gaurav.gs.sharma@gmail.com>
Date: Wed, 8 Aug 2012 10:24:52 -0700
To: user@hadoop.apache.org

Indeed, erasure encoding is a component of a good storage solution, especially for holding on to PB-scale datasets, but there's an associated cost in terms of latency for real-time serving. Depending on the domain (e.g. where temporal locality is observed in access patterns), it works well if the hot dataset is small and can be served efficiently from elsewhere. It is a great fit for DW-type workloads. Facebook had a good presentation some time back where they discussed a typical implementation with Reed-Solomon codes et al.

On Aug 8, 2012, at 10:06, Michael Segel wrote:

> Just something to think about...
>
> There's a company here in Chicago called Cleversafe.
> I believe they recently made an announcement concerning Hadoop?
>
> The interesting thing about RAID is that you're adding to the disk latency, and depending on which RAID level you use, you could kill performance on a rebuild of a disk.
>
> In terms of uptime of Apache-based Hadoop, RAID allows you to actually hot-swap the disks, and unless you lose both drives (assuming RAID 1, mirroring), your DN doesn't know and doesn't have to go down.
> So there is some value there, however at the expense of storage and storage costs.
>
> You can reduce the replication factor to 2. I don't know that I would go to anything lower, because you can still lose the server...
>
> In terms of data locality... maybe you lose a bit, however... because you're RAIDing your storage, you now have less data per node. So you end up with more nodes, right?
>
> Just some food for thought.
>
> On Aug 8, 2012, at 11:46 AM, Sourygna Luangsay <sluangsay@pragsis.com> wrote:
>
>> Hi folks!
>>
>> I have just read about the HDFS RAID feature that was added to Hadoop 0.21 or 0.22, and I am quite curious to know if people use it, what kind of use they have, and what they think about Map/Reduce data locality.
>>
>> The first big actor in this technology is Facebook, which claims to save many PB with it (see http://www.slideshare.net/ydn/hdfs-raid-facebook, slides 4 and 5).
>>
>> I understand the following advantages with HDFS RAID:
>> - You can save space
>> - The system tolerates more missing blocks
>>
>> Nonetheless, one of the drawbacks I see is M/R data locality.
>> As far as I understand, the advantage of having 3 replicas of each block is not only security if one server fails or a block is corrupted, but also the possibility of having as many as 3 tasktrackers executing the map task with "local data".
>> If you consider the 4th slide of the Facebook presentation, such an infrastructure decreases this possibility to only 1 tasktracker.
>> That means that if this tasktracker is very busy executing other tasks, you have the following choice:
>> - Waiting for this tasktracker to finish executing (part of) the current tasks (freeing map slots, for instance)
>> - Executing the map task for this block on another tasktracker, transferring the content of the block through the network
>> In both cases, you'll get a M/R penalty (please tell me if I am wrong).
>>
>> Has somebody considered such a penalty, or has some benchmarks to share with us?
>>
>> One of the scenarios I can think of in order to take advantage of HDFS RAID without suffering this penalty is:
>> - Using normal HDFS with default replication=3 for my "fresh data"
>> - Using HDFS RAID for my historical data (that is barely used by M/R)
>>
>> And you, what are you using HDFS RAID for?
>>
>> Regards,
>>
>> Sourygna Luangsay
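[Archive note] The space-versus-locality tradeoff discussed in this thread can be sketched with a little arithmetic. The snippet below is illustrative only and not from the thread: RS(10, 4) matches the 10-data-block + 4-parity-block Reed-Solomon layout mentioned in the Facebook slides, and treating nodes as independently busy with some fixed probability is a simplifying assumption.

```python
# Back-of-the-envelope comparison: plain HDFS replication vs. HDFS RAID.

def replication_overhead(replicas):
    """Bytes stored per byte of user data with n-way replication."""
    return float(replicas)

def raid_rs_overhead(data_blocks, parity_blocks, replicas=1):
    """Bytes stored per byte of user data with a Reed-Solomon stripe,
    keeping `replicas` copies of each data and parity block."""
    return replicas * (data_blocks + parity_blocks) / data_blocks

def local_slot_probability(replicas, p_busy):
    """P(at least one replica-holding node has a free map slot),
    assuming each node is busy independently with probability p_busy."""
    return 1 - p_busy ** replicas

# Storage: default 3-way replication vs. single-copy RS(10, 4).
print(replication_overhead(3))   # -> 3.0 (bytes per user byte)
print(raid_rs_overhead(10, 4))   # -> 1.4 (bytes per user byte)

# For 10 PB of user data, the raw-storage saving in PB:
pb = 10
print(pb * replication_overhead(3) - pb * raid_rs_overhead(10, 4))  # -> 16.0

# Locality: 3 candidate tasktrackers vs. the single one left by HDFS RAID,
# with each node busy half the time.
print(local_slot_probability(3, 0.5))  # -> 0.875
print(local_slot_probability(1, 0.5))  # -> 0.5
```

So under these (hypothetical) numbers, HDFS RAID cuts raw storage roughly in half but also roughly halves the chance of running the map task on local data, which is exactly the penalty Sourygna describes.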