Mailing-List: contact user-help@hadoop.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@hadoop.apache.org
Received-SPF: pass (athena.apache.org: domain of michael_segel@hotmail.com
 designates 65.55.111.87 as permitted sender)
Message-ID: <BLU0-SMTP2718305C276B09054A395168FA20@phx.gbl>
From: Michael Segel <michael_segel@hotmail.com>
Content-Type: multipart/alternative;
	boundary="Apple-Mail=_1BB024EA-8A79-444B-A5F1-C612FB1A4F27"
MIME-Version: 1.0 (Mac OS X Mail 6.3 \(1503\))
Subject: =?windows-1252?Q?Hardlinks_=28See_HDFS-3370=29__wuz__Re=3A_Ques?=
 =?windows-1252?Q?tion_about_Name_Spaces=85?=
Date: Wed, 15 May 2013 11:57:18 -0500
References: <BLU0-SMTP31053F21FC455BDD0B390828FA20@phx.gbl>
 <943DB4E9-848E-456D-A30D-F9D5FF4EA291@yahoo.com>
 <BLU0-SMTP82C0CE3C2922C48EDD5C958FA20@phx.gbl>
 <CA+01ahiLDOA8tk1xCfNxd=3KGmv==VmDxpotO2HmXZkbR1SvdA@mail.gmail.com>
To: user@hadoop.apache.org
In-Reply-To: 
 <CA+01ahiLDOA8tk1xCfNxd=3KGmv==VmDxpotO2HmXZkbR1SvdA@mail.gmail.com>

--Apple-Mail=_1BB024EA-8A79-444B-A5F1-C612FB1A4F27
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="us-ascii"

Ok, that's what I thought.

So here's my real question...

I'm looking at HDFS-3370 (see: =
https://issues.apache.org/jira/browse/HDFS-3370 )

There is some talk that one of the reasons hardlinks haven't been added =
is that it would be difficult to implement them across name spaces.=20
It goes back to the comments made by Sanjay.

In short, if what Lohit says is true, then when you replicate or use =
HBase, the files will stay within the single namespace.=20
So there shouldn't be a reason to have hardlinks span namespaces.=20

(Or am I missing something?)=20

Is HDFS-3370 still active or is there another JIRA talking about =
hardlinks on HDFS?=20

Thx

-Mike

On May 15, 2013, at 10:55 AM, lohit <lohit.vijayarenu@gmail.com> wrote:

> Namespace is mainly for Namenode scalability. If someone copies a file =
to another namespace, then essentially they would be creating 6 copies =
of the same file.=20
> To achieve file name redundancy, it is better to have NameNode HA =
instead of copying it to another namespace. Since Datanodes serve blocks =
to multiple namespaces, locality is not an issue and copying a file to =
another namespace would not buy you much. =20
>=20
>=20
> 2013/5/15 Michael Segel <michael_segel@hotmail.com>
> Well...
>=20
> On the one hand, I'm trying to understand why one would break a =
cluster into multiple name spaces.
> (Obviously this gets back to managing very large clusters.)
>=20
> On the other, why would someone want to have a copy of a file in two =
different name spaces?
>=20
> I'm assuming that with 3x replication the replicas don't cross name =
space boundaries. (Is this correct?)
>=20
> My take is that one would copy a file to a second name space because =
they want a physical copy in both name spaces for redundancy in case a =
name space goes down. They would do this only for mission critical =
files, or if the data is being shared by two different groups who want =
their own copy of the data and they work solely within a single name =
space.
>=20
> The reason I am asking is that I'm trying to see how people view and =
use namespaces.
>=20
> Does that make sense?
>=20
> Thx
>=20
>=20
> On May 15, 2013, at 9:24 AM, Lohit <lohit.vijayarenu@yahoo.com> wrote:
>=20
> >
> >
> > On May 15, 2013, at 7:17 AM, Michael Segel =
<michael_segel@hotmail.com> wrote:
> >
> >> Quick question...
> >> So when we have a cluster which has multiple namespaces (multiple =
name nodes), why would you have a file in two different namespaces?
> >>
> > Are you asking why one would create the same file in two =
namespaces? Or are you asking whether there is an option to have only =
one file but in two namespaces?
> >
> > Could you rephrase or give more information?
> >>
> >
>=20
>=20
>=20
>=20
> --=20
> Have a Nice Day!
> Lohit


--Apple-Mail=_1BB024EA-8A79-444B-A5F1-C612FB1A4F27--