hadoop-mapreduce-user mailing list archives

From Saptarshi Guha <saptarshi.g...@gmail.com>
Subject Child JVM, Distributed Cache and Language Embedding
Date Wed, 13 Feb 2013 05:28:02 GMT

I'm a bit fuzzy on the details here, so I'd appreciate your help.

I am embedding a language into the JVM. My Hadoop job will instantiate the
child JVM once for all tasks assigned (mapred.job.reuse.jvm.num.tasks = -1).

So if a node can run 6 parallel JVMs, it will, and these 6 JVMs will churn
through all the tasks assigned to them.

Now, the language engine will be instantiated once per JVM. For this to work,
I will ship the language distribution to the nodes using the distributed
cache, as a tar.gz file (the nodes are really bare, and installing the
language on them is not an option).

My understanding is that Hadoop MapReduce will unarchive this tar.gz file and
then, for every task attempt, symlink it into the task attempt's working
directory.

However, for the language engine to be initialized successfully, I need to
know the location of the unarchived files: a location that will stay
constant across all task attempts for that child JVM.
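
To make the question concrete, here is a minimal sketch (plain Java, no
Hadoop classes) of the kind of thing I have in mind: resolve a symlink in the
working directory to the real directory behind it. The class name
CacheLocation, the method resolveCacheDir and the link name "lang" are made up
for illustration, and I am assuming the archive really does show up as a
symlink in the working directory. I have not verified that the resolved path
stays the same across task attempts.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class CacheLocation {

    // Resolve the entry named linkName inside workDir to its real path.
    // toRealPath() follows symlinks and throws if the entry does not exist.
    static Path resolveCacheDir(Path workDir, String linkName) throws IOException {
        return workDir.resolve(linkName).toRealPath();
    }

    public static void main(String[] args) throws IOException {
        // Stand-ins for the unarchived distribution and a task's working
        // directory; both are temporary directories made up for this demo.
        Path unarchived = Files.createTempDirectory("unarchived");
        Path workDir = Files.createTempDirectory("attempt");

        // Hypothetical symlink name, playing the role of the per-attempt link.
        Files.createSymbolicLink(workDir.resolve("lang"), unarchived);

        // Prints the real location of the unarchived directory.
        System.out.println(resolveCacheDir(workDir, "lang"));
    }
}
```

In a real task the first argument would be the task's current working
directory, and the result would be handed to the language engine once, when
the JVM starts.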

Q: How can I infer this location?

