Subject: Re: is HDFS RAID "data locality" efficient?
From: Michael Segel <michael_segel@hotmail.com>
Date: Thu, 9 Aug 2012 05:34:24 -0500
To: user@hadoop.apache.org
CC: Steve Loughran <stevel@hortonworks.com>

Ok...

So under Apache Hadoop, how do you specify when and where a directory will be created on HDFS?

As an example, if I want to create a /coldData directory in HDFS as a place to store my older data sets, how does that get assigned specifically to a RAIDed HDFS (or even to specific machines)?

I know I can do this in MapR's distribution, but I am not aware of this feature being available in the Apache-based releases. Is it part of the latest feature set?

Thx

-Mike

On Aug 8, 2012, at 12:31 PM, Steve Loughran <stevel@hortonworks.com> wrote:

> On 8 August 2012 09:46, Sourygna Luangsay <sluangsay@pragsis.com> wrote:
>
> Hi folks!
>
> One of the scenarios I can think of to take advantage of HDFS RAID without suffering this penalty is:
>
> - Using normal HDFS with the default replication=3 for my "fresh data"
> - Using HDFS RAID for my historical data (which is barely used by M/R)
>
> Exactly: less space used on cold data, with the penalty that access performance can be worse. As the majority of data on a Hadoop cluster is usually "cold", it's a space- and power-efficient story for the archive data.
>
> --
> Steve Loughran
> Hortonworks Inc
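For the archive-tier idea discussed above, one mechanism that stock Apache HDFS does expose is the per-file replication factor: once data has gone cold it can simply be re-replicated at a lower factor, independently of any RAID or placement feature. The sketch below is a minimal illustration using the public org.apache.hadoop.fs.FileSystem API; the /coldData path and the target factor of 2 are assumptions taken from the example in the question rather than anything defined by Hadoop, the isDirectory() call assumes a 0.21/2.x-era client, and lowering replication is not equivalent to MapR-style volume placement on specific machines.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

/**
 * Minimal sketch: walk a "cold" directory and lower the replication
 * factor of every file under it. The path and the factor are
 * illustrative assumptions, not values from the thread.
 */
public class ColdDataRereplicate {

    public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration(); // picks up core-site.xml / hdfs-site.xml from the classpath
        FileSystem fs = FileSystem.get(conf);
        try {
            setReplicationRecursively(fs, new Path("/coldData"), (short) 2);
        } finally {
            fs.close();
        }
    }

    private static void setReplicationRecursively(FileSystem fs, Path dir, short replication)
            throws IOException {
        for (FileStatus status : fs.listStatus(dir)) {
            if (status.isDirectory()) {
                setReplicationRecursively(fs, status.getPath(), replication);
            } else {
                // Only files carry a replication factor; directories do not.
                fs.setReplication(status.getPath(), replication);
            }
        }
    }
}
```

The same effect is available from the shell with `hadoop fs -setrep -R 2 /coldData`; whether additional parity encoding is then applied to that directory is a separate matter for the HDFS RAID contrib configuration.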