hadoop-common-user mailing list archives

From "Bryan A. P. Pendleton" <...@geekdom.net>
Subject Re: Best practice for in memory data?
Date Thu, 25 Jan 2007 19:26:54 GMT
There's also code floating around for a multithreaded MapRunner. With
appropriate synchronization, that would let all of a task's map threads share
a single HashMap, without paying the memory cost once per simultaneous map task.
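The sharing idea can be sketched with plain JDK threads (this is not Hadoop's actual MapRunner API, just a minimal illustration of several map threads in one JVM updating one synchronized HashMap; the class and method names are made up for the example):

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class SharedMapDemo {

    // Simulates N map threads updating one shared, synchronized HashMap,
    // the way a multithreaded MapRunner would inside a single task JVM.
    static int runDemo(int threads, int recordsPerThread) throws InterruptedException {
        final Map<String, Integer> shared =
                Collections.synchronizedMap(new HashMap<>());
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.submit(() -> {
                for (int i = 0; i < recordsPerThread; i++) {
                    // merge() on a synchronizedMap holds the map's lock,
                    // so concurrent increments are not lost.
                    shared.merge("records", 1, Integer::sum);
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        return shared.get("records");
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(runDemo(4, 1000)); // prints 4000
    }
}
```

The point is that the map is built (and its memory paid for) once per JVM, not once per concurrent map task.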

Another option that might or might not make sense is to use memcached for your
hashtable. Whether it's appropriate depends on your scaling, your switches, and
your access patterns, but it's probably worth a try. You should even be able to
use the existing bin/slaves.sh to launch a bunch of memcached instances quickly
across your cluster.
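For example, something like the following should work, assuming a standard Hadoop layout (conf/slaves listing the worker hosts, which slaves.sh ssh's to) and memcached installed on each host; the memory size and port shown are arbitrary:

```shell
#   -d  run memcached as a daemon
#   -m  memory limit in MB
#   -p  TCP port to listen on
bin/slaves.sh memcached -d -m 256 -p 11211
```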

On 1/25/07, Doug Cutting <cutting@apache.org> wrote:
>
> Johan Oskarsson wrote:
> > Any advice on how to solve this problem?
>
> I think your current solutions sound reasonable.
>
> > Would it be possible to somehow share a hashmap between tasks?
>
> Not without running multiple tasks in the same JVM.  We could implement
> a mode where child tasks are run directly in the JobTracker's JVM, but
> that would not be good for reliability.  Alternately we could have
> spawned child processes execute multiple tasks, perhaps even in
> parallel, extending the TaskUmbilicalProtocol, JobTracker, etc.  This
> would unfortunately not be a trivial modification.
>
> Doug
>


-- 
Bryan A. P. Pendleton
Ph: (877) geek-1-bp
