hadoop-common-user mailing list archives

From Ted Dunning <tdunn...@maprtech.com>
Subject Re: Memory mapped resources
Date Tue, 12 Apr 2011 19:05:58 GMT
Actually, it doesn't become trivial.  It just becomes a total fail or a total
win instead of almost always being a partial win.  It doesn't meet Benson's

On Tue, Apr 12, 2011 at 11:09 AM, Jason Rutherglen <
jason.rutherglen@gmail.com> wrote:

> To get around the chunks or blocks problem, I've been implementing a
> system that simply sets a max block size that is too large for a file
> to reach.  In this way there will be only one block per HDFS file, and
> so mmap'ing or other single-file ops become trivial.
> On Tue, Apr 12, 2011 at 10:40 AM, Benson Margulies
> <bimargulies@gmail.com> wrote:
> > Here's the OP again.
> >
> > I want to make it clear that my question here has to do with the
> > problem of distributing 'the program' around the cluster, not 'the
> > data'. In the case at hand, the issue is a system that has a large data
> > resource that it needs to do its work. Every instance of the code
> > needs the entire model. Not just some blocks or pieces.
> >
> > Memory mapping is a very attractive tactic for this kind of data
> > resource. The data is read-only. Memory-mapping it allows the
> > operating system to ensure that only one copy of the thing ends up in
> > physical memory.
> >
> > If we force the model into a conventional file (storable in HDFS) and
> > read it into the JVM in a conventional way, then we get as many copies
> > in memory as we have JVMs.  On a big machine with a lot of cores, this
> > begins to add up.
> >
> > For people who are running a cluster of relatively conventional
> > systems, just putting copies on all the nodes in a conventional place
> > is adequate.
> >
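
The memory-mapping tactic described in the quoted message can be sketched in Java roughly as follows. This is only an illustration, assuming the model has already been copied to a local file on each node; the class name SharedModel and the temporary file are hypothetical and stand in for the real resource. Each JVM that maps the same file read-only reads from the same operating-system page cache instead of holding a private heap copy.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class SharedModel {
    /**
     * Maps a local file read-only. The mapping stays valid after the
     * channel is closed. A single mapping cannot exceed
     * Integer.MAX_VALUE bytes, so a larger model would need several maps.
     */
    static MappedByteBuffer mapReadOnly(Path localFile) throws IOException {
        try (FileChannel ch = FileChannel.open(localFile, StandardOpenOption.READ)) {
            return ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical stand-in for the model file already present on the node.
        Path model = Files.createTempFile("model", ".bin");
        Files.write(model, "model-bytes".getBytes(StandardCharsets.US_ASCII));

        MappedByteBuffer buf = mapReadOnly(model);
        System.out.println(buf.remaining() + " bytes mapped, readOnly=" + buf.isReadOnly());
    }
}
```

Reading the same file into a byte[] in every JVM would instead produce one heap copy per process, which is the cost the original poster wants to avoid.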
