From: Ufuk Celebi
Date: Wed, 20 Apr 2016 12:20:28 +0200
Subject: Re: Flink + S3
To: user@flink.apache.org

On Wed, Apr 20, 2016 at 1:35 AM, Michael-Keith Bernard wrote:
> We're running on self-managed EC2 instances (and we'll eventually have a mirror cluster
> in our colo). The provided documentation notes that for Hadoop 2.6, we'd
> need such-and-such version of hadoop-aws and guice on the CP. If I wanted
> to instead use Hadoop 2.7, which versions of those dependencies should I
> get? And how can I look that up myself? The pom file for hadoop-aws[1]
> doesn't mention a specific dependency on Guice, so I'm curious how the
> author of that documentation knew exactly the dependencies and versions
> required.

Hey Michael-Keith,

I think you meant Guava and not Guice.

Determining which dependencies you need is quite a mess at the moment.
It depends on a combination of three things:

1) the dependencies of hadoop-aws [1],
2) which S3 file system you use (in the case of the docs,
   org.apache.hadoop.fs.s3native.NativeS3FileSystem) [2],
3) what Flink shades away in its Hadoop dependencies [3].

In detail:

1) hadoop-aws depends on hadoop-common (and other packages).
   hadoop-common is already part of Flink (including the fs.FileSystem
   classes etc.).

2) NativeS3FileSystem uses classes from hadoop-common, like FileSystem,
   and from hadoop-aws, like Jets3tNativeFileSystemStore. The
   hadoop-common classes are part of Flink, and
   Jets3tNativeFileSystemStore is part of hadoop-aws. The big issue here
   is that other S3 file system implementations might work with the
   aws-java-sdk packages of hadoop-aws instead, which changes the set of
   dependencies you need.

3) Flink shades Hadoop's Guava dependency away, and that's why you need
   to add it manually to the classpath (CP).

So, if you go for the suggested NativeS3FileSystem, you end up needing
hadoop-aws and Guava.

Of course, this might change in future versions of Flink and/or Hadoop.
For now, I will update the docs for the different versions of Flink and
Hadoop and hope that this will help. :-(

The easiest solution in the future would be for Flink to come with
hadoop-aws, but I don't think that this is going to happen.
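
The three-part reasoning above can be sketched as simple set arithmetic.
This is only an illustration: the artifact sets are simplified assumptions
taken from this message, not a resolved Maven dependency tree, and
`jars_to_add` is a hypothetical helper name, not part of Flink or Hadoop.

```python
# Simplified model of the reasoning in this message.
# Assumption: artifact names stand in for whole JARs; versions are ignored.

# 2) What the chosen file system (NativeS3FileSystem) needs at runtime,
#    according to this message.
needed_by_fs = {"hadoop-common", "hadoop-aws", "guava"}

# 1) What is bundled with Flink. Guava is listed here on the assumption
#    that it is bundled only in shaded (relocated) form.
bundled_with_flink = {"hadoop-common", "guava"}

# 3) What Flink shades away in its Hadoop dependencies.
shaded_by_flink = {"guava"}


def jars_to_add(needed, bundled, shaded):
    """Return the artifacts that must be put on the classpath manually."""
    # A shaded artifact is not visible to code outside Flink,
    # so it does not count as provided.
    visible = bundled - shaded
    return sorted(needed - visible)


print(jars_to_add(needed_by_fs, bundled_with_flink, shaded_by_flink))
# prints ['guava', 'hadoop-aws']
```

For a different S3 file system implementation, the first set would change
(for example, to include the aws-java-sdk packages), and the result would
change with it.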

– Ufuk

[1] http://mvnrepository.com/artifact/org.apache.hadoop/hadoop-aws/2.6.0
[2] https://ci.apache.org/projects/flink/flink-docs-release-1.0/setup/aws.html#provide-s3-filesystem-dependency
[3] https://github.com/apache/flink/blob/master/flink-shaded-hadoop/pom.xml