samza-dev mailing list archives

From Jagadish Venkatraman <>
Subject Re: Deploying Samza Jobs Using S3 and YARN on AWS
Date Fri, 15 Sep 2017 18:44:54 GMT
Thank you, Xiaochuan, for your question!

You should ensure that *every machine in your cluster* has the S3 jar file
in its YARN class-path. From your error, it looks like the machine you are
running on does not have the JAR file corresponding to *S3AFileSystem*.

>> What's the right way to set this up? Should I just copy over the required
AWS jars to the Hadoop conf directory

I'd err on the side of simplicity, and the *scp* route seems to address
most of your needs.

>> Should I be editing or

You should not have to edit any of these files. Once you fix your
class-paths by copying the relevant JARs, it should just work.
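As a concrete sketch (hostnames, paths, and jar versions below are placeholders for your cluster; hadoop-aws 2.7.x pairs with aws-java-sdk 1.7.4, but check the versions your Hadoop build expects), copying the jars to every node might look like:

```shell
# Copy the S3A filesystem jars into each node's Hadoop common lib directory,
# which "yarn classpath" already includes. Hostnames and versions are
# placeholders -- adjust them to your cluster.
HADOOP_COMMON_LIB=/home/ec2-user/deploy/yarn/share/hadoop/common/lib

for host in node1 node2; do
  scp hadoop-aws-2.7.1.jar aws-java-sdk-1.7.4.jar \
      "ec2-user@${host}:${HADOOP_COMMON_LIB}/"
done

# Then verify on each node that the jars landed on the YARN classpath:
#   $YARN_HOME/bin/yarn classpath | tr ':' '\n' | grep -i aws
```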

Please let us know if you need more assistance.


On Fri, Sep 15, 2017 at 11:07 AM, XiaoChuan Yu <> wrote:

> Hi,
> I'm trying to deploy a Samza job using YARN and S3 where I upload the zip
> package to S3 and point yarn.package.path to it.
> Does anyone know what setup steps are required for this?
> What I've tried so far is to get Hello Samza to run this way in AWS.
> However, I ran into the following exception:
> Exception in thread "main" java.lang.RuntimeException:
> java.lang.ClassNotFoundException: Class
> org.apache.hadoop.fs.s3a.S3AFileSystem not found
> at org.apache.hadoop.conf.Configuration.getClass(
> at org.apache.hadoop.fs.FileSystem.getFileSystemClass(
> at org.apache.hadoop.fs.FileSystem.createFileSystem(
> at org.apache.hadoop.fs.FileSystem.access$200(
> ...
> Running "$YARN_HOME/bin/yarn classpath" gives the following:
> /home/ec2-user/deploy/yarn/etc/hadoop
> /home/ec2-user/deploy/yarn/etc/hadoop
> /home/ec2-user/deploy/yarn/etc/hadoop
> /home/ec2-user/deploy/yarn/share/hadoop/common/lib/*
> /home/ec2-user/deploy/yarn/share/hadoop/common/*
> /home/ec2-user/deploy/yarn/share/hadoop/hdfs
> /home/ec2-user/deploy/yarn/share/hadoop/hdfs/lib/*
> /home/ec2-user/deploy/yarn/share/hadoop/hdfs/*
> /home/ec2-user/deploy/yarn/share/hadoop/yarn/lib/*
> /home/ec2-user/deploy/yarn/share/hadoop/yarn/*
> /home/ec2-user/deploy/yarn/share/hadoop/mapreduce/lib/*
> /home/ec2-user/deploy/yarn/share/hadoop/mapreduce/*
> /contrib/capacity-scheduler/*.jar
> /home/ec2-user/deploy/yarn/share/hadoop/yarn/*
> /home/ec2-user/deploy/yarn/share/hadoop/yarn/lib/*
> I manually copied the required AWS related jars to
> /home/ec2-user/deploy/yarn/share/hadoop/common.
> I checked that it is loadable by running "yarn
> org.apache.hadoop.fs.s3a.S3AFileSystem" which gives the "Main method not
> found" error instead of class not found.
> From the console output I see the following in the class path:
> 1. All jars under the lib directory of the zip package
> 2. /home/ec2-user/deploy/yarn/etc/hadoop (Hadoop conf directory)
> The class path seems to be missing the AWS-related jars
> required for S3AFileSystem.
> What's the right way to set this up?
> Should I just copy over the required AWS jars to the Hadoop conf directory
> (2.)?
> Should I be editing or
> Thanks,
> Xiaochuan Yu
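For reference, once the class-path is fixed, pointing the job at the S3 package is just the usual yarn.package.path setting; a minimal sketch (the bucket name and package path here are placeholders, not from your setup):

```properties
# Samza job properties -- bucket and key are placeholders
yarn.package.path=s3a://my-bucket/path/to/hello-samza-dist.tar.gz
```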

Jagadish V,
Graduate Student,
Department of Computer Science,
Stanford University
