From: Yang <teddyyyy123@gmail.com>
Date: Thu, 30 Oct 2014 15:04:31 -0700
Subject: Re: run arbitrary job (non-MR) on YARN ?
To: user@hadoop.apache.org

thanks!

On Wed, Oct 29, 2014 at 2:38 PM, Kevin <kevin.macksamie@gmail.com> wrote:

> You can accomplish this by using the DistributedShell application that
> comes with YARN.
>
> If you copy all your archives to HDFS, then inside your shell script you
> can copy those archives into your YARN container and then execute whatever
> you want, provided all the other system dependencies (the correct Java
> version, Python, C++ libraries, etc.) exist in the container.
>
> For example, in myscript.sh I wrote the following:
>
> #!/usr/bin/env bash
> echo "This is my script running!"
> echo "Present working directory:"
> pwd
> echo "Current directory listing: (nothing exciting yet)"
> ls
> echo "Copying file from HDFS to container"
> hadoop fs -get /path/to/some/data/on/hdfs .
> echo "Current directory listing: (file should now be here)"
> ls
> echo "Cat ExecScript.sh (this is the script created by the DistributedShell application)"
> cat ExecScript.sh
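> For instance, if the job is packaged as a tar.gz archive, the staging
> could look something like this (the paths and archive name here are just
> placeholders):
>
> # on any client machine, stage the archive on HDFS once:
> hadoop fs -put myjob.tar.gz /user/yang/archives/
>
> # inside the shell script that runs in the container:
> hadoop fs -get /user/yang/archives/myjob.tar.gz .
> tar xzf myjob.tar.gz
> ./myjob/run.sh   # or: java -jar myjob/myjob.jar, ./myjob/my_cpp_binary, ...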
> echo "Current directory listing: (file should not be here)" > ls > echo "Cat ExecScript.sh (this is the script created by the > DistributedShell application)" > cat ExecScript.sh > > Run the DistributedShell application with the hadoop (or yarn) command: > > hadoop org.apache.hadoop.yarn.applications.distributedshell.Client -jar > /usr/lib/hadoop-yarn/hadoop-yarn-applications-distributedshell-2.3.0-cdh5.1.3.jar > -num_containers 1 -shell_script myscript.sh > > If you have the YARN log aggregation property set, then you can pipe the > container's logs to your client console using the yarn command: > > yarn logs -applicationId application_1414160538995_0035 > > (replace the application id with yours) > > Here is a quick reference that should help get you going: > > http://books.google.com/books?id=heoXAwAAQBAJ&pg=PA227&lpg=PA227&dq=hadoop+yarn+distributed+shell+application&source=bl&ots=psGuJYlY1Y&sig=khp3b3hgzsZLZWFfz7GOe2yhgyY&hl=en&sa=X&ei=0U5RVKzDLeTK8gGgoYGoDQ&ved=0CFcQ6AEwCA#v=onepage&q&f=false > > Hopefully this helps, > Kevin > > On Mon Oct 27 2014 at 2:21:18 AM Yang wrote: > >> I happened to run into this interesting scenario: >> >> I had some mahout seq2sparse jobs, originally i run them in parallel >> using the distributed mode. but because the input files are so small, >> running them locally actually is much faster. so I truned them to local >> mode. >> >> but I run 10 of these jobs in parallel, so when 10 mahout jobs are run >> together, everyone became very slow. >> >> is there an existing code that takes a desired shell script, and possibly >> some archive files (could contain the jar file, or C++ --generated >> executable code). I understand that I could use yarn API to code such a >> thing, but it would be nice if I could just take it and run in shell.. >> >> Thanks >> Yang >> > --001a11c2c220d3211c0506ab1133 Content-Type: text/html; charset=UTF-8 Content-Transfer-Encoding: quoted-printable