From: XiaoChuan Yu
Date: Sat, 23 Sep 2017 22:24:06 +0000
Subject: Re: Deploying Samza Jobs Using S3 and YARN on AWS
To: dev@samza.apache.org

I found out that it was necessary to include "hadoop-aws" as part of the
package submitted to YARN, similar to the instructions for deploying from
HDFS. However, due to a dependency conflict on the AWS SDK between our code
and "hadoop-aws", we can't actually include it. We are now planning to use
HTTP FS instead.
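For anyone hitting the same issue, here is a minimal sketch of the
packaging route (mirroring the HDFS deployment instructions). The jar
versions are illustrative and must match the Hadoop version on your
cluster:

    # Drop the S3A filesystem jars into the package's lib/ directory
    # before building the archive that yarn.package.path points at.
    cp hadoop-aws-2.7.1.jar aws-java-sdk-1.7.4.jar my-job-dist/lib/

The HTTP FS alternative amounts to serving the package from an HTTP
server and pointing the job config at it (hostname and path below are
hypothetical):

    # Append the package location to the job's properties file.
    cat >> config/my-job.properties <<'EOF'
    yarn.package.path=http://fileserver.example.com/packages/my-job-dist.tar.gz
    EOF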
On Fri, Sep 15, 2017 at 2:45 PM Jagadish Venkatraman wrote:

> Thank you, Xiaochuan, for your question!
>
> You should ensure that *every machine in your cluster* has the S3 jar
> file on its YARN class-path. From your error, it looks like the machine
> you are running on does not have the JAR file containing *S3AFileSystem*.
>
> >> What's the right way to set this up? Should I just copy over the
> >> required AWS jars to the Hadoop conf directory?
>
> I'd lean toward simplicity, and the *scp* route seems to address most of
> your needs.
>
> >> Should I be editing run-job.sh or run-class.sh?
>
> You should not have to edit either of these files. Once you fix your
> class-paths by copying the relevant JARs, it should just work.
>
> Please let us know if you need more assistance.
>
> -- Jagadish
>
> On Fri, Sep 15, 2017 at 11:07 AM, XiaoChuan Yu wrote:
>
> > Hi,
> >
> > I'm trying to deploy a Samza job using YARN and S3, where I upload the
> > zip package to S3 and point yarn.package.path to it.
> > Does anyone know what set-up steps are required for this?
> >
> > What I've tried so far is to get Hello Samza running this way in AWS.
> > However, I ran into the following exception:
> >
> > Exception in thread "main" java.lang.RuntimeException:
> > java.lang.ClassNotFoundException: Class
> > org.apache.hadoop.fs.s3a.S3AFileSystem not found
> >   at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2112)
> >   at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2578)
> >   at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2591)
> >   at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:91)
> >   ...
> >
> > Running "$YARN_HOME/bin/yarn classpath" gives the following:
> >
> > /home/ec2-user/deploy/yarn/etc/hadoop
> > /home/ec2-user/deploy/yarn/etc/hadoop
> > /home/ec2-user/deploy/yarn/etc/hadoop
> > /home/ec2-user/deploy/yarn/share/hadoop/common/lib/*
> > /home/ec2-user/deploy/yarn/share/hadoop/common/*
> > /home/ec2-user/deploy/yarn/share/hadoop/hdfs
> > /home/ec2-user/deploy/yarn/share/hadoop/hdfs/lib/*
> > /home/ec2-user/deploy/yarn/share/hadoop/hdfs/*
> > /home/ec2-user/deploy/yarn/share/hadoop/yarn/lib/*
> > /home/ec2-user/deploy/yarn/share/hadoop/yarn/*
> > /home/ec2-user/deploy/yarn/share/hadoop/mapreduce/lib/*
> > /home/ec2-user/deploy/yarn/share/hadoop/mapreduce/*
> > /contrib/capacity-scheduler/*.jar
> > /home/ec2-user/deploy/yarn/share/hadoop/yarn/*
> > /home/ec2-user/deploy/yarn/share/hadoop/yarn/lib/*
> >
> > I manually copied the required AWS-related jars to
> > /home/ec2-user/deploy/yarn/share/hadoop/common. I checked that they are
> > loadable by running "yarn org.apache.hadoop.fs.s3a.S3AFileSystem",
> > which gives a "Main method not found" error instead of "class not
> > found".
> >
> > From the console output of run-job.sh I see the following on the class
> > path:
> >
> > 1. All jars under the lib directory of the zip package
> > 2. /home/ec2-user/deploy/yarn/etc/hadoop (the Hadoop conf directory)
> >
> > The class path from run-job.sh seems to be missing the AWS-related
> > jars required for S3AFileSystem.
> > What's the right way to set this up?
> > Should I just copy over the required AWS jars to the Hadoop conf
> > directory (2.)?
> > Should I be editing run-job.sh or run-class.sh?
> >
> > Thanks,
> > Xiaochuan Yu
>
> --
> Jagadish V,
> Graduate Student,
> Department of Computer Science,
> Stanford University
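P.S. For anyone else following the *scp* route Jagadish suggests above,
a rough sketch; hostnames, jar versions, and the install path are
placeholders:

    # Copy the S3A jars onto the default YARN class-path of every
    # NodeManager host in the cluster.
    for host in node1 node2 node3; do
      scp hadoop-aws-2.7.1.jar aws-java-sdk-1.7.4.jar \
        ec2-user@"$host":/home/ec2-user/deploy/yarn/share/hadoop/common/lib/
    done

    # Sanity check on each host: the class should now resolve, so this
    # fails with "Main method not found" rather than ClassNotFoundException.
    yarn org.apache.hadoop.fs.s3a.S3AFileSystem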