flink-user mailing list archives

From KOSTIANTYN Kudriavtsev <kudryavtsev.konstan...@gmail.com>
Subject Re: Processing S3 data with Apache Flink
Date Tue, 06 Oct 2015 18:44:40 GMT
Hi Robert,

you are right, I just misspelled the name of the file :( Everything works fine now!

Basically, I'd suggest moving this workaround into the official docs and
marking the custom S3FileSystem as @Deprecated.
In fact, I like the idea of marking all untested functionality with a
specific annotation, for example @Beta, because big enterprises won't want
to use a product whose documented features don't work. For example, it
would be difficult for me to advocate using Flink on my project as long as
S3FileSystem is broken, since my opponents will ask "who knows what else is
broken?". If some functionality is marked as not properly tested, it's much
easier to make decisions because of the better visibility.
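
To illustrate the idea, here is a minimal sketch of such a marker. It is
only an assumption of how it could look: the @Beta annotation, its reason()
element, and the two example classes are hypothetical names made up for this
example and are not part of Flink's API.

```java
import java.lang.annotation.Documented;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

public class BetaAnnotationSketch {

    /**
     * Hypothetical marker for functionality that is documented but not yet
     * properly tested. Not an existing Flink annotation.
     */
    @Documented
    @Retention(RetentionPolicy.RUNTIME)
    @Target({ElementType.TYPE, ElementType.METHOD})
    public @interface Beta {
        // Short explanation shown to users who inspect the annotation.
        String reason() default "not yet covered by tests";
    }

    // Hypothetical example class: an untested feature carries the marker.
    @Beta(reason = "S3 access is not covered by integration tests")
    public static class ExperimentalS3FileSystem { }

    // Hypothetical example class: a tested feature carries no marker.
    public static class StableFileSystem { }

    /** Returns true if the given class is marked with @Beta. */
    public static boolean isBeta(Class<?> type) {
        return type.isAnnotationPresent(Beta.class);
    }

    public static void main(String[] args) {
        System.out.println(isBeta(ExperimentalS3FileSystem.class)); // true
        System.out.println(isBeta(StableFileSystem.class));         // false
    }
}
```

With @Documented the marker also shows up in the generated Javadoc, and the
runtime retention would let a build check or report list everything that is
still marked as beta.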


Thank you,
Konstantin Kudryavtsev

On Tue, Oct 6, 2015 at 2:12 PM, Robert Metzger <rmetzger@apache.org> wrote:

> Mh. I tried out the code I posted yesterday and it worked immediately.
> The security settings of AWS are sometimes a bit complicated.
> I think there are some logs for S3 buckets; maybe they contain some more
> information.
> Maybe there are other users facing the same issue. Since the
> NativeS3FileSystem class is from Hadoop, I suspect the code is widely
> used, and you can probably find answers to the most common problems on
> Google.
> On Tue, Oct 6, 2015 at 1:07 PM, KOSTIANTYN Kudriavtsev <
> kudryavtsev.konstantin@gmail.com> wrote:
>> Hi Robert,
>> thank you very much for your input!
>> Have you tried that?
>> With org.apache.hadoop.fs.s3native.NativeS3FileSystem I moved forward,
>> and now got a new exception:
>> Caused by: org.jets3t.service.S3ServiceException: S3 HEAD request failed
>> for '/***.csv' - ResponseCode=403, ResponseMessage=Forbidden
>> It's really strange, since I gave full permissions to authenticated
>> users and can get the target file with s3cmd or an S3 browser from the
>> same PC... I realize this question is not really for you, but perhaps
>> you have faced the same issue.
>> Thanks in advance!
>> Kostia
>> Thank you,
>> Konstantin Kudryavtsev
>> On Mon, Oct 5, 2015 at 10:13 PM, Robert Metzger <rmetzger@apache.org>
>> wrote:
>>> Hi Kostia,
>>> thank you for writing to the Flink mailing list. I actually started to
>>> try out our S3 File system support after I saw your question on
>>> StackOverflow [1].
>>> I found that our S3 connector is very broken. I had to resolve two more
>>> issues with it before I was able to get the same exception you reported.
>>> Another Flink committer looked into the issue as well and confirmed it,
>>> but there was no solution [2].
>>> So for now, I would say we have to assume that our S3 connector is not
>>> working. I will start a separate discussion on the developer mailing list
>>> about removing our S3 connector.
>>> The good news is that you can just use Hadoop's S3 File System
>>> implementation with Flink.
>>> I used this Flink program to verify it's working:
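>>> import org.apache.flink.api.java.DataSet;
>>> import org.apache.flink.api.java.ExecutionEnvironment;
>>>
>>> public class S3FileSystem {
>>>    public static void main(String[] args) throws Exception {
>>>       // Run the job inside the local JVM (e.g. from the IDE).
>>>       ExecutionEnvironment ee = ExecutionEnvironment.createLocalEnvironment();
>>>       // The s3n:// scheme is handled by Hadoop's NativeS3FileSystem.
>>>       DataSet<String> myLines = ee.readTextFile("s3n://my-bucket-name/some-test-file.xml");
>>>       // Print every line of the file to stdout.
>>>       myLines.print();
>>>    }
>>> }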
>>> Also, you need to make a Hadoop configuration file available to Flink.
>>> When running Flink locally in your IDE, just create a "core-site.xml" in
>>> the src/main/resources folder, with the following content:
>>> <configuration>
>>>     <property>
>>>         <name>fs.s3n.awsAccessKeyId</name>
>>>         <value>putKeyHere</value>
>>>     </property>
>>>     <property>
>>>         <name>fs.s3n.awsSecretAccessKey</name>
>>>         <value>putSecretHere</value>
>>>     </property>
>>>     <property>
>>>         <name>fs.s3n.impl</name>
>>>         <value>org.apache.hadoop.fs.s3native.NativeS3FileSystem</value>
>>>     </property>
>>> </configuration>
>>> If you are running on a cluster, re-use the existing core-site.xml file
>>> (i.e. edit it) and point Flink to the directory containing it using
>>> Flink's fs.hdfs.hadoopconf configuration option.
>>> With these two things in place, you should be good to go.
>>> [1]
>>> http://stackoverflow.com/questions/32959790/run-apache-flink-with-amazon-s3
>>> [2]
>>> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Problem-with-Amazon-S3-td946.html
>>> On Mon, Oct 5, 2015 at 8:19 PM, Kostiantyn Kudriavtsev <
>>> kudryavtsev.konstantin@gmail.com> wrote:
>>>> Hi guys,
>>>> I'm trying to get Apache Flink 0.9.1 working on EMR, basically to read
>>>> data from S3. I tried the following path for the data:
>>>> s3://mybucket.s3.amazonaws.com/folder, but it throws the following
>>>> exception:
>>>> java.io.IOException: Cannot establish connection to Amazon S3:
>>>> com.amazonaws.services.s3.model.AmazonS3Exception: The request
>>>> signature
>>>> we calculated does not match the signature you provided. Check your key
>>>> and signing method. (Service: Amazon S3; Status Code: 403;
>>>> I added the access and secret keys, so the problem is not there. I'm
>>>> using the standard region and gave read permission to everyone.
>>>> Any ideas how it can be fixed?
>>>> Thank you in advance,
>>>> Kostia
