hadoop-user mailing list archives

From Michael Segel <michael_se...@hotmail.com>
Subject Hardlinks (See HDFS-3370) wuz Re: Question about Name Spaces…
Date Wed, 15 May 2013 16:57:18 GMT
Ok, that's what I thought.

So here's my real question...

I'm looking at HDFS-3370 (see: https://issues.apache.org/jira/browse/HDFS-3370 )

There is some talk that one of the reasons hardlinks haven't been added is that it would
be difficult to implement hardlinks across name spaces.
It goes back to the comments made by Sanjay.

In short, if what Lohit says is true, then when you replicate or use HBase, the files will
stay within the single namespace. 
So there shouldn't be a reason to have hardlinks span namespaces. 

(Or am I missing something? ) 
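
To make the arithmetic behind Lohit's "6 copies" remark (quoted below) concrete, here is a minimal sketch. It assumes the 3x replication mentioned in this thread and that a copy into a second namespace writes a full new set of replicas. The function name and parameters are hypothetical and purely illustrative; this is not an HDFS API.

```python
def physical_block_copies(replication_factor: int, namespaces_with_copy: int) -> int:
    """Count the replicas stored on DataNodes when the same file is
    copied into several namespaces (hypothetical helper, not an HDFS API).

    Assumption: each namespace that holds its own copy of the file keeps
    a full, independent set of replicas.
    """
    if replication_factor < 1 or namespaces_with_copy < 1:
        raise ValueError("both arguments must be at least 1")
    return replication_factor * namespaces_with_copy


# 3x replication, file copied so that it exists in two namespaces.
print(physical_block_copies(3, 2))

# 3x replication, file kept in a single namespace.
print(physical_block_copies(3, 1))
```

Under the same assumptions, a hardlink of the kind discussed in HDFS-3370 would presumably add a second name inside one namespace without writing any new replicas, which is why the single-namespace case is the interesting one.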

Is HDFS-3370 still active or is there another JIRA talking about hardlinks on HDFS? 



On May 15, 2013, at 10:55 AM, lohit <lohit.vijayarenu@gmail.com> wrote:

> Namespace is mainly for Namenode scalability. If someone copies file to another namespace,
> then essentially they would be creating 6 copies of same file.
> To achieve file name redundancy, it is better to have NameNode HA, instead of copying
> it to another namespace. Since Datanodes serve blocks to multiple namespace, locality is not
> an issue and copying file to another namespace would not buy you much.
> 2013/5/15 Michael Segel <michael_segel@hotmail.com>
> Well...
> On the one hand, I'm trying to understand why one would break a cluster in to multiple
> name spaces.
> (Obviously this gets back to managing very large clusters.)
> On the other, why would someone want to have a copy of a file in two different name spaces?
> I'm making an assumption that when we have 3x replication that the replicas don't cross
> name space boundaries. (Is this correct?)
> My take is that one would copy a file to a second name space because they want a physical
> copy in both name spaces for redundancy in case a name space goes down. They would do this
> only for mission critical files, or if the data is being shared by two different groups who
> want their own copy of the data and they work solely within a single name space.
> The reason I am asking is that I'm trying to see how people view and use namespaces.
> Does that make sense?
> Thx
> On May 15, 2013, at 9:24 AM, Lohit <lohit.vijayarenu@yahoo.com> wrote:
> >
> >
> > On May 15, 2013, at 7:17 AM, Michael Segel <michael_segel@hotmail.com> wrote:
> >
> >> Quick question...
> >> So when we have a cluster which has multiple namespaces (multiple name nodes),
> >> why would you have a file in two different namespaces?
> >>
> > Are you asking why one would create the same file in two namespaces? Or are you asking
> > whether there is an option to have only one file but in two namespaces?
> >
> > Could you rephrase or give more information?
> >>
> >
> -- 
> Have a Nice Day!
> Lohit
