hadoop-common-user mailing list archives

From Steve Loughran <ste...@apache.org>
Subject Re: Coordination between Mapper tasks
Date Fri, 20 Mar 2009 14:06:32 GMT
Stuart White wrote:
> The nodes in my cluster have 4 cores & 4 GB RAM.  So, I've set
> mapred.tasktracker.map.tasks.maximum to 3 (leaving 1 core for
> "breathing room").
> 
> My process requires a large dictionary of terms (~ 2GB when loaded
> into RAM).  The terms are looked-up very frequently, so I want the
> terms memory-resident.
> 
> So, the problem is, I want 3 processes (to utilize CPU), but each
> process requires ~2GB, but my nodes don't have enough memory to each
> have their own copy of the 2GB of data.  So, I need to somehow share
> the 2GB between the processes.
> 
> What I have currently implemented is a standalone RMI service that,
> during startup, loads the 2GB dictionaries.  My mappers are simply RMI
> clients that call this RMI service.
> 
> This works just fine.  The only problem is that my standalone RMI
> service is totally "outside" Hadoop.  I have to ssh onto each of the
> nodes, start/stop/reconfigure the services manually, etc...
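
For reference, the arrangement described above (one process per node holding
the dictionary, with the mappers as RMI clients) could look roughly like the
following. This is only a minimal sketch, not the actual service: the names
DictionarySketch, DictionaryService and InMemoryDictionary, the binding name
"dictionary" and port 1099 are all assumptions made for illustration, and a
small HashMap stands in for the 2GB dictionary.

```java
import java.rmi.NoSuchObjectException;
import java.rmi.Remote;
import java.rmi.RemoteException;
import java.rmi.registry.LocateRegistry;
import java.rmi.registry.Registry;
import java.rmi.server.UnicastRemoteObject;
import java.util.HashMap;
import java.util.Map;

public class DictionarySketch {

    /** Remote view of the dictionary; every mapper on the node calls this. */
    public interface DictionaryService extends Remote {
        String lookup(String term) throws RemoteException;
    }

    /** Holds the single in-memory copy of the terms (hypothetical stand-in). */
    public static final class InMemoryDictionary implements DictionaryService {
        private final Map<String, String> terms;

        public InMemoryDictionary(Map<String, String> terms) {
            this.terms = new HashMap<>(terms);
        }

        @Override
        public String lookup(String term) {
            return terms.get(term); // null when the term is unknown
        }
    }

    // Strong reference so the exported object is not garbage collected.
    private static DictionaryService exported;

    /** Exports the dictionary on an anonymous port and returns its stub. */
    public static synchronized DictionaryService export(Map<String, String> terms)
            throws RemoteException {
        exported = new InMemoryDictionary(terms);
        return (DictionaryService) UnicastRemoteObject.exportObject(exported, 0);
    }

    /** Unexports the dictionary so the JVM can exit. */
    public static synchronized void shutdown() throws NoSuchObjectException {
        if (exported != null) {
            UnicastRemoteObject.unexportObject(exported, true);
            exported = null;
        }
    }

    public static void main(String[] args) throws Exception {
        // Assumption: the demo runs on one machine, so advertise loopback.
        System.setProperty("java.rmi.server.hostname", "127.0.0.1");

        Map<String, String> terms = new HashMap<>();
        terms.put("hadoop", "a distributed computing framework");

        // Service side: export the dictionary and publish its stub.
        DictionaryService stub = export(terms);
        Registry registry = LocateRegistry.createRegistry(1099);
        registry.rebind("dictionary", stub);

        // Mapper side: look the stub up by name and call it.
        DictionaryService client = (DictionaryService)
                LocateRegistry.getRegistry("127.0.0.1", 1099).lookup("dictionary");
        System.out.println(client.lookup("hadoop"));

        // Tear down so this demo terminates; a real service would keep running.
        UnicastRemoteObject.unexportObject(registry, true);
        shutdown();
    }
}
```

In a real mapper the registry lookup would presumably happen once per task
rather than once per record, since only the lookup() calls need to be frequent.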

There's nothing wrong with doing this outside Hadoop; the only problem 
is that manual deployment is not the way forward. Some options:

1. Some kind of JavaSpaces system, where you put facts into the tuple 
space and let all the mappers share it.

2. (Conflict of interest warning.) Use something like SmartFrog's Anubis 
tuplespace to bring up one, and only one, node with the dictionary 
application. This may be hard to get started, but it keeps availability 
high: the Anubis nodes keep track of all other members of the cluster by 
some heartbeat/election protocol, and can handle failures of the 
dictionary node by automatically bringing up a new one.

3. Roll your own multicast/voting protocol, thereby avoiding RMI. 
Something scatter/gather style is needed as part of the Apache Cloud 
computing product portfolio, so you could try implementing it; Doug 
Cutting will probably provide constructive feedback.

I haven't played with ZooKeeper enough to say whether it would work here.

-steve
