hadoop-hdfs-user mailing list archives

From Saptarshi Guha <saptarshi.g...@gmail.com>
Subject Re: Child JVM, Distributed Cache and Language Embedding
Date Wed, 13 Feb 2013 06:55:55 GMT

On Tue, Feb 12, 2013 at 9:28 PM, Saptarshi Guha <saptarshi.guha@gmail.com> wrote:

> Hello,
> I'm a bit fuzzy on the details here, so I'd appreciate your help.
> I am embedding a language into the JVM. My Hadoop job will instantiate the
> child JVM once for all tasks assigned (mapred.job.reuse.jvm.num.tasks =
> -1).
> So if a node can run 6 parallel JVMs, it will, and these 6 will churn
> through all the tasks assigned to them.
> Now, per JVM, the language engine will be instantiated. For this to work,
> I will ship the language distribution to the nodes (the nodes are really
> bare and installing the language on the node is not an option) using the
> distributed cache (as a tar.gz file).
> My understanding is that Hadoop MapReduce will unarchive this tgz file and
> then, for every task attempt, symlink it into the task attempt's working
> folder.
> However, for the language engine to be successfully initialized, I need to
> know the location of the unarchived file, a location that will stay
> constant across all task attempts for that child JVM.
> Q: How can I infer this location?
> Cheers
> Saptarshi
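
One possible way to approach the question above, as a minimal sketch rather than a confirmed answer: if the unpacked archive really is symlinked into each task attempt's working directory, resolving that symlink should give the underlying unpacked directory, which does not depend on the per-attempt folder. The class name CacheLocation, the helper resolveArchiveRoot, and the link name "lang" are hypothetical, and the sketch uses only the Java standard library, so it assumes nothing about Hadoop beyond the symlink behaviour described in the quoted message.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class CacheLocation {

    /**
     * Resolves an entry in the task's working directory (assumed to be a
     * symlink created for the cached archive) to the real directory it
     * points at. Both names are hypothetical, not part of any Hadoop API.
     */
    static Path resolveArchiveRoot(Path workDir, String linkName) throws IOException {
        Path link = workDir.resolve(linkName);
        if (!Files.exists(link)) {
            throw new IOException("No cache entry named " + linkName + " in " + workDir);
        }
        // toRealPath() follows symlinks and returns an absolute path,
        // i.e. the target of the link rather than the link itself.
        return link.toRealPath();
    }

    public static void main(String[] args) throws IOException {
        // Current working directory of the running process.
        Path cwd = Paths.get("").toAbsolutePath();
        // "lang" is an assumed link name; pass the real one as an argument.
        String linkName = args.length > 0 ? args[0] : "lang";
        System.out.println(resolveArchiveRoot(cwd, linkName));
    }
}
```

With JVM reuse, it would probably make sense to resolve the path once, in the first task the JVM runs, and keep it in a static field so the engine is initialized only once per JVM. Hadoop 1.x also has DistributedCache.getLocalCacheArchives(conf), which may be worth checking as an alternative to resolving the symlink by hand.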
