hadoop-common-user mailing list archives

From Andreas Kostyrka <andr...@kostyrka.org>
Subject Re: Sharing Memory across Map tasks [multiple cores] running in same machine
Date Fri, 05 Sep 2008 12:13:19 GMT
Well, a classical solution to that on Linux would be to mmap a cache file into
multiple processes. No idea if you can do something like that with Java.

Andreas

On Friday 05 September 2008 10:28:37 Devaraj Das wrote:
> Hadoop doesn't support this natively, so if you need this kind of
> functionality you'd have to build it into your application. But I am
> worried about the race condition in deciding which task should first
> create the ramfs and load the data.
> If you can atomically check whether the ramfs has been created and the
> data loaded, and do the creation/load if it hasn't, then things should
> work.
> If atomicity cannot be guaranteed, you might consider this -
> 1) Run a map-only job that creates the ramfs and loads the data (if
> your cluster is small you can do this manually). You can use the
> distributed cache to ship the data you want to load.
> 2) Run your job that processes the data.
> 3) Run a third job to delete the ramfs.
>
> On 9/5/08 1:29 PM, "Amit Kumar Singh" <amitsingh@cse.iitb.ac.in> wrote:
> > Can we use something like ramfs to share static data across map tasks?
> >
> > Scenario:
> > 1) Quad-core machine
> > 2) Two 1-TB disks
> > 3) 8 GB RAM
> >
> > Now I need ~2.7 GB of RAM per map process to load some static data into
> > memory, using which I would be processing data (CPU-intensive jobs).
> >
> > Can I share memory across mappers on the same machine, so that the
> > memory footprint is smaller and I can run more than 4 mappers
> > simultaneously, utilizing all 4 cores?
> >
> > Can we use stuff like ramfs?
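
And on the atomicity Devaraj mentions: on a local filesystem mkdir() is atomic, so
it can serve as the create-once guard, with a marker file to signal that loading
finished. A rough sketch (the ramfs mount point and file names are made up):

import java.io.File;
import java.io.IOException;

public class RamfsGuard {
    // Hypothetical: a data directory inside an already-mounted ramfs, plus
    // a marker file that signals "data fully loaded".
    private static final File DATA_DIR = new File("/mnt/ramfs/static-data");
    private static final File READY = new File(DATA_DIR, ".loaded");

    /** Blocks until the static data is in the ramfs, loading it if we win the race. */
    public static void ensureLoaded() throws IOException, InterruptedException {
        if (DATA_DIR.mkdir()) {
            // mkdir() is atomic on a local fs: exactly one task on this
            // machine sees 'true', so only that task performs the load.
            loadStaticData(DATA_DIR);
            READY.createNewFile(); // atomic marker: load is complete
        } else {
            // Another task won the race; wait for its ready marker.
            while (!READY.exists()) {
                Thread.sleep(500);
            }
        }
    }

    private static void loadStaticData(File dir) throws IOException {
        // ... copy the static data (e.g. from the local distributed-cache
        // copy) into dir ...
    }
}

One caveat: if the loading task dies halfway, the marker never appears and the
waiters spin forever, so in practice you would want a timeout on that loop.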


