hadoop-common-user mailing list archives

From <Kevin.Le...@thomsonreuters.com>
Subject RE: Memory mapped resources
Date Tue, 12 Apr 2011 12:51:46 GMT
This is the age-old argument about what to share in a partitioned
environment. IBM and Teradata have always used "shared nothing", which is
effectively what Hadoop does when each node gets only one chunk of the file.
Oracle has always used "shared disk", which is not an easy thing to do,
especially at scale, and seems to have varying results depending on whether
the application is transactional or DSS. Here are a couple of web references.



Rather than say shared nothing isn't useful, Hadoop should look at how
others make this work. The two key problems to avoid are data skew, where
one node sees too much data and becomes the slow node, and large
cross-partition joins, where large data is needed from more than one
partition and potentially gets copied around.
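
To make the data-skew concern concrete, here is a minimal Java sketch (an
illustration, not something from the original thread). It assumes simple hash
partitioning of string keys; the class and method names are hypothetical, and
the ratio of the largest partition to the mean partition size is just one
rough way to spot a likely slow node.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: estimates how unevenly keys spread across partitions.
class SkewCheck {
    // Count how many keys land in each partition under plain hash partitioning.
    static long[] partitionCounts(List<String> keys, int partitions) {
        long[] counts = new long[partitions];
        for (String key : keys) {
            // floorMod keeps the index non-negative even for negative hash codes.
            counts[Math.floorMod(key.hashCode(), partitions)]++;
        }
        return counts;
    }

    // Skew ratio: size of the largest partition divided by the mean size.
    // A value near 1.0 means an even spread; larger values mean more skew.
    static double skewRatio(long[] counts) {
        long max = 0;
        long total = 0;
        for (long c : counts) {
            max = Math.max(max, c);
            total += c;
        }
        if (total == 0) {
            return 1.0; // no data, so no skew to report
        }
        return max / ((double) total / counts.length);
    }

    public static void main(String[] args) {
        // Assumed sample keys, with one key deliberately repeated.
        List<String> keys = Arrays.asList("a", "a", "a", "a", "b", "c");
        long[] counts = partitionCounts(keys, 3);
        System.out.println(Arrays.toString(counts) + " skew=" + skewRatio(counts));
    }
}
```

A heavily repeated key always hashes to the same partition, so no choice of
partition count spreads it out; that is the case where the slow node appears.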

Rather than moving toward a shared-disk hybrid, I think Hadoop should
borrow the shared-data technique others use, namely replication of select
data, to solve cross-partition joins in a "shared nothing" architecture.
This may be more database terminology that could be addressed by HBase,
but I think it is good background for the question of memory mapping
files in Hadoop.
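
For background on the memory-mapping side of the question, here is a minimal
Java sketch (an illustration, not something from the thread) of mapping an
ordinary local file read-only with FileChannel.map. It assumes the file lives
on the local filesystem and is smaller than 2 GB, since a single
MappedByteBuffer cannot exceed Integer.MAX_VALUE bytes; the class and method
names are hypothetical. It does not touch HDFS, which, as the quoted messages
below note, does not hand out a local File without customization.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

// Hypothetical helper: memory-maps a local file for read-only access.
class LocalMmap {
    // Maps the whole file read-only. The mapping stays valid after the
    // channel is closed, so try-with-resources is safe here.
    // Throws IllegalArgumentException if the file exceeds Integer.MAX_VALUE bytes.
    static MappedByteBuffer mapReadOnly(Path localFile) throws IOException {
        try (FileChannel channel = FileChannel.open(localFile, StandardOpenOption.READ)) {
            return channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: LocalMmap <local-file>");
            return;
        }
        MappedByteBuffer buffer = mapReadOnly(Paths.get(args[0]));
        System.out.println("mapped " + buffer.capacity() + " bytes");
    }
}
```

Files larger than 2 GB would need several mappings over successive offsets,
which is one reason a per-block view of a very large file is awkward to use.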


-----Original Message-----
From: Ted Dunning [mailto:tdunning@maprtech.com] 
Sent: Tuesday, April 12, 2011 12:09 AM
To: Jason Rutherglen
Cc: common-user@hadoop.apache.org; Edward Capriolo
Subject: Re: Memory mapped resources

Yes.  But only one such block. That is what I meant by chunk.

That is fine if you want that chunk, but if you want to mmap the entire
file, it isn't really useful.

On Mon, Apr 11, 2011 at 6:48 PM, Jason Rutherglen <
jason.rutherglen@gmail.com> wrote:

> What do you mean by local chunk?  I think it's providing access to the
> underlying file block?
> On Mon, Apr 11, 2011 at 6:30 PM, Ted Dunning <tdunning@maprtech.com>
> wrote:
> > Also, it only provides access to a local chunk of a file which isn't
> > useful.
> >
> > On Mon, Apr 11, 2011 at 5:32 PM, Edward Capriolo
> > wrote:
> >>
> >> On Mon, Apr 11, 2011 at 7:05 PM, Jason Rutherglen
> >> <jason.rutherglen@gmail.com> wrote:
> >> > Yes, you can; however, it will require customization of HDFS.  Take
> >> > a look at HDFS-347, specifically the HDFS-347-branch-20-append.txt
> >> > patch.  I have been altering it for use with HBASE-3529.  Note that
> >> > the patch noted is for the -append branch, which is mainly for HBase.
> >> >
> >> > On Mon, Apr 11, 2011 at 3:57 PM, Benson Margulies
> >> > <bimargulies@gmail.com> wrote:
> >> >> We have some very large files that we access via memory mapping in
> >> >> Java. Someone's asked us about how to make this conveniently
> >> >> deployable in Hadoop. If we tell them to put the files into HDFS, can
> >> >> we obtain a File for the underlying file on any given node?
> >> >>
> >> >
> >>
> >> This feature is not yet part of Hadoop, so doing this is not
> >> "convenient".
> >
> >
