From: Michael Segel
Subject: Hardlinks (See HDFS-3370) was Re: Question about Name Spaces...
Date: Wed, 15 May 2013 11:57:18 -0500
To: user@hadoop.apache.org
References: <943DB4E9-848E-456D-A30D-F9D5FF4EA291@yahoo.com>

Ok, that's what I thought.

So here's my real question...

I'm looking at HDFS-3370 (see: https://issues.apache.org/jira/browse/HDFS-3370 )

There is some talk that one of the reasons hardlinks haven't been added is that it would be difficult to implement hardlinks across name spaces.
It goes back to the comments made by Sanjay.

In short, if what Lohit says is true, then when you replicate or use HBase, the files will stay within a single namespace.
So there shouldn't be a reason to have hardlinks span namespaces.

(Or am I missing something?)

Is HDFS-3370 still active or is there another JIRA talking about hardlinks on HDFS?

Thx

-Mike

On May 15, 2013, at 10:55 AM, lohit <lohit.vijayarenu@gmail.com> wrote:

> Namespace is mainly for Namenode scalability. If someone copies a file to another namespace, then essentially they would be creating 6 copies of the same file.
> To achieve file name redundancy, it is better to have NameNode HA, instead of copying it to another namespace. Since Datanodes serve blocks to multiple namespaces, locality is not an issue and copying a file to another namespace would not buy you much.
>
>
> 2013/5/15 Michael Segel <michael_segel@hotmail.com>
> Well...
>
> On the one hand, I'm trying to understand why one would break a cluster into multiple name spaces.
> (Obviously this gets back to managing very large clusters.)
>
> On the other, why would someone want to have a copy of a file in two different name spaces?
>
> I'm making an assumption that when we have 3x replication, the replicas don't cross name space boundaries. (Is this correct?)
>
> My take is that one would copy a file to a second name space because they want a physical copy in both name spaces for redundancy in case a name space goes down. They would do this only for mission critical files, or if the data is being shared by two different groups who want their own copy of the data and they work solely within a single name space.
>
> The reason I am asking is that I'm trying to see how people view and use namespaces.
>
> Does that make sense?
>
> Thx
>
>
> On May 15, 2013, at 9:24 AM, Lohit <lohit.vijayarenu@yahoo.com> wrote:
>
> >
> > On May 15, 2013, at 7:17 AM, Michael Segel <michael_segel@hotmail.com> wrote:
> >
> >> Quick question...
> >> So when we have a cluster which has multiple namespaces (multiple name nodes), why would you have a file in two different namespaces?
> >>
> > Are you asking why one would create the same file in two namespaces? Or are you asking whether there is an option to have only one file but in two namespaces?
> >
> > Could you rephrase or give more information?
> >
>
>
> --
> Have a Nice Day!
> Lohit
