hadoop-common-user mailing list archives

From Benson Margulies <bimargul...@gmail.com>
Subject Re: Memory mapped resources
Date Tue, 12 Apr 2011 17:40:09 GMT
Here's the OP again.

I want to make it clear that my question here has to do with the
problem of distributing 'the program' around the cluster, not 'the
data'. In the case at hand, the issue is a system that needs a large
data resource to do its work. Every instance of the code needs the
entire model, not just some blocks or pieces.

Memory mapping is a very attractive tactic for this kind of data
resource. The data is read-only. Memory-mapping it allows the
operating system to ensure that only one copy ends up in physical
memory, shared by every process on the machine that maps the same
file.
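
A minimal sketch of what that looks like in Java, assuming the model
has already been localized to an ordinary file on each node and fits
in a single mapping (a MappedByteBuffer tops out around 2 GB, so a
bigger model needs several mapped regions):

    import java.io.RandomAccessFile;
    import java.nio.MappedByteBuffer;
    import java.nio.channels.FileChannel;

    public class ModelMap {
        // Map a read-only model file into this JVM's address space.
        // The mapping remains valid after the file handle is closed.
        public static MappedByteBuffer mapModel(String path) throws Exception {
            RandomAccessFile file = new RandomAccessFile(path, "r");
            try {
                FileChannel channel = file.getChannel();
                // READ_ONLY pages are backed by the file itself, so the
                // kernel keeps one physical copy no matter how many JVMs
                // on the machine map the same file.
                return channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
            } finally {
                file.close();
            }
        }
    }

Every task JVM on a node calls this against the same local file, and
the OS shares the pages among them.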

If we force the model into a conventional file (storable in HDFS) and
read it into the JVM in a conventional way, then we get as many copies
in memory as we have JVMs.  On a big machine with a lot of cores, this
begins to add up.

For people who are running a cluster of relatively conventional
systems, just putting copies on all the nodes in a conventional place
is adequate.
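
For what it's worth, Hadoop's DistributedCache is one way to get the
file onto every node without copying it around by hand. A sketch, with
a made-up HDFS path:

    import java.net.URI;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.filecache.DistributedCache;

    public class ShipModel {
        public static void configure(Configuration conf) throws Exception {
            // Register the model, stored once in HDFS (hypothetical path).
            // The framework copies it to local disk on each task node; the
            // #model.bin fragment names the symlink in the task working dir.
            DistributedCache.addCacheFile(new URI("/models/model.bin#model.bin"), conf);
            DistributedCache.createSymlink(conf);
        }
    }

Each task can then open "model.bin" as a local file and memory-map it
as sketched above.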
