hadoop-mapreduce-user mailing list archives

From Kevin <kevin.macksa...@gmail.com>
Subject Re: run arbitrary job (non-MR) on YARN ?
Date Wed, 29 Oct 2014 21:38:30 GMT
You can accomplish this by using the DistributedShell application that
comes with YARN.

If you copy all your archives to HDFS, then inside your shell script you
could copy those archives to your YARN container and then execute whatever
you want, provided all the other system dependencies exist in the container
(correct Java version, Python, C++ libraries, etc.)

For example, in myscript.sh I wrote the following:

#!/usr/bin/env bash
echo "This is my script running!"
echo "Present working directory:"
pwd
echo "Current directory listing: (nothing exciting yet)"
ls
echo "Copying file from HDFS to container"
hadoop fs -get /path/to/some/data/on/hdfs .
echo "Current directory listing: (file should now be here)"
ls
echo "Cat ExecScript.sh (this is the script created by the DistributedShell application)"
cat ExecScript.sh
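If your dependencies are bundled into an archive, the same pattern extracts and runs them inside the container. A minimal sketch, assuming a hypothetical archive myapp.tar.gz already uploaded to HDFS and containing a hypothetical bin/run.sh (both names are illustrative, not from the thread):

```shell
#!/usr/bin/env bash
# Hypothetical variant of myscript.sh: fetch an archive from HDFS,
# unpack it in the container's working directory, and run a binary from it.
set -e
hadoop fs -get /path/to/myapp.tar.gz .
tar -xzf myapp.tar.gz
./bin/run.sh
```

This only works if the container hosts have the runtime the binary needs (the right Java, Python, or C++ libraries), as noted above.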

Run the DistributedShell application with the hadoop (or yarn) command:

hadoop org.apache.hadoop.yarn.applications.distributedshell.Client -jar
/usr/lib/hadoop-yarn/hadoop-yarn-applications-distributedshell-2.3.0-cdh5.1.3.jar
-num_containers 1 -shell_script myscript.sh
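The Client takes a few more options that are useful once the basic run works. A hedged sketch (the memory sizes and shell arguments below are illustrative, not from the original command):

```shell
# Same DistributedShell Client invocation with extra options:
#   -shell_args        arguments passed to myscript.sh in each container
#   -num_containers    how many containers run the script in parallel
#   -container_memory  memory (MB) requested per container
#   -master_memory     memory (MB) for the application master
hadoop org.apache.hadoop.yarn.applications.distributedshell.Client \
  -jar /usr/lib/hadoop-yarn/hadoop-yarn-applications-distributedshell-2.3.0-cdh5.1.3.jar \
  -shell_script myscript.sh \
  -shell_args "arg1 arg2" \
  -num_containers 2 \
  -container_memory 1024 \
  -master_memory 1024
```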

If you have YARN log aggregation enabled (yarn.log-aggregation-enable=true),
then you can print the container's logs to your client console using the yarn
command:

yarn logs -applicationId application_1414160538995_0035

(replace the application id with yours)
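If you don't have the application id handy, the yarn CLI can list applications so you can find it. A short sketch (the redirect target is illustrative):

```shell
# List finished applications to find the id of your DistributedShell run,
# then dump its aggregated container logs to a local file.
yarn application -list -appStates FINISHED
yarn logs -applicationId application_1414160538995_0035 > distshell_logs.txt
```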

Here is a quick reference that should help get you going:
http://books.google.com/books?id=heoXAwAAQBAJ&pg=PA227&lpg=PA227&dq=hadoop+yarn+distributed+shell+application&source=bl&ots=psGuJYlY1Y&sig=khp3b3hgzsZLZWFfz7GOe2yhgyY&hl=en&sa=X&ei=0U5RVKzDLeTK8gGgoYGoDQ&ved=0CFcQ6AEwCA#v=onepage&q&f=false

Hopefully this helps,
Kevin

On Mon Oct 27 2014 at 2:21:18 AM Yang <teddyyyy123@gmail.com> wrote:

> I happened to run into this interesting scenario:
>
> I had some mahout seq2sparse jobs; originally I ran them in parallel in
> distributed mode, but because the input files are so small, running them
> locally is actually much faster, so I turned them to local mode.
>
> But I run 10 of these jobs in parallel, and when 10 mahout jobs run
> together, each one becomes very slow.
>
> Is there existing code that takes a desired shell script, and possibly
> some archive files (which could contain the jar file, or C++-generated
> executable code), and runs them on YARN? I understand that I could use the
> YARN API to code such a thing, but it would be nice if I could just take it
> and run it from the shell.
>
> Thanks
> Yang
>
