flink-user mailing list archives

From Robert Metzger <rmetz...@apache.org>
Subject Re: Processing S3 data with Apache Flink
Date Tue, 06 Oct 2015 18:12:40 GMT
Hmm. I tried out the code I posted yesterday and it worked right away.
AWS security settings can be a bit complicated.
I think S3 can log access to buckets; those logs may contain more
information.

Other users may have hit the same issue. Since the S3FileSystem class
comes from Hadoop, I expect the code is widely used, and you can
probably find answers to the most common problems on Google.
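
Before digging into bucket permissions, it may be worth confirming that
the core-site.xml from my earlier mail (quoted below) is really on the
classpath and contains both key properties. The following is only a
minimal sketch using the JDK's XML parser; the class name CoreSiteCheck
and the helper readProperties are made up for illustration and are not
part of Flink or Hadoop. It does not contact AWS, so it cannot tell
whether the keys themselves are valid.

```java
import java.io.InputStream;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper, not part of Flink or Hadoop.
public class CoreSiteCheck {

    // Reads all <property><name>..</name><value>..</value></property>
    // entries from a Hadoop-style configuration file into a map.
    public static Map<String, String> readProperties(InputStream in) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(in);
        NodeList props = doc.getElementsByTagName("property");
        Map<String, String> result = new HashMap<>();
        for (int i = 0; i < props.getLength(); i++) {
            Element p = (Element) props.item(i);
            NodeList names = p.getElementsByTagName("name");
            NodeList values = p.getElementsByTagName("value");
            if (names.getLength() > 0 && values.getLength() > 0) {
                result.put(names.item(0).getTextContent().trim(),
                           values.item(0).getTextContent().trim());
            }
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        // Look the file up the same way a classpath resource is found in an IDE run.
        try (InputStream in =
                 CoreSiteCheck.class.getClassLoader().getResourceAsStream("core-site.xml")) {
            if (in == null) {
                System.out.println("core-site.xml not found on the classpath");
                return;
            }
            Map<String, String> props = readProperties(in);
            String[] required = {"fs.s3n.awsAccessKeyId", "fs.s3n.awsSecretAccessKey"};
            for (String key : required) {
                String value = props.get(key);
                // Only report presence; never print the secret itself.
                boolean present = value != null && !value.isEmpty();
                System.out.println(key + ": " + (present ? "set" : "MISSING"));
            }
        }
    }
}
```

If both properties show up as set and the 403 persists, the problem is
more likely on the AWS side (bucket policy or key permissions) than in
the Flink setup.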


On Tue, Oct 6, 2015 at 1:07 PM, KOSTIANTYN Kudriavtsev <
kudryavtsev.konstantin@gmail.com> wrote:

> Hi Robert,
>
> thank you very much for your input!
>
> Have you tried that?
> With org.apache.hadoop.fs.s3native.NativeS3FileSystem I got further,
> but now I get a new exception:
>
>
> Caused by: org.jets3t.service.S3ServiceException: S3 HEAD request failed
> for '/***.csv' - ResponseCode=403, ResponseMessage=Forbidden
>
> It's really strange, since I gave full permissions to authenticated
> users and can get the target file with s3cmd or an S3 browser from the
> same PC... I realize this isn't really a question for you, but perhaps
> you have faced the same issue.
>
> Thanks in advance!
> Kostia
>
> Thank you,
> Konstantin Kudryavtsev
>
> On Mon, Oct 5, 2015 at 10:13 PM, Robert Metzger <rmetzger@apache.org>
> wrote:
>
>> Hi Kostia,
>>
>> thank you for writing to the Flink mailing list. I actually started to
>> try out our S3 File system support after I saw your question on
>> StackOverflow [1].
>> I found that our S3 connector is very broken. I had to resolve two more
>> issues with it, before I was able to get the same exception you reported.
>>
>> Another Flink committer looked into the issue as well and confirmed it,
>> but there was no solution [2].
>>
>> So for now, I would say we have to assume that our S3 connector is not
>> working. I will start a separate discussion on the developer mailing list
>> about removing our S3 connector.
>>
>> The good news is that you can just use Hadoop's S3 File System
>> implementation with Flink.
>>
>> I used this Flink program to verify that it works:
>>
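>> import org.apache.flink.api.java.DataSet;
>> import org.apache.flink.api.java.ExecutionEnvironment;
>>
>> public class S3FileSystem {
>>    public static void main(String[] args) throws Exception {
>>       // Local environment, e.g. when running from the IDE
>>       ExecutionEnvironment ee = ExecutionEnvironment.createLocalEnvironment();
>>       // s3n:// selects Hadoop's NativeS3FileSystem (see core-site.xml below)
>>       DataSet<String> myLines = ee.readTextFile("s3n://my-bucket-name/some-test-file.xml");
>>       myLines.print();
>>    }
>> }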
>>
>> Also, you need to make a Hadoop configuration file available to Flink.
>> When running Flink locally in your IDE, just create a "core-site.xml" in
>> the src/main/resources folder, with the following content:
>>
>> <configuration>
>>
>>     <property>
>>         <name>fs.s3n.awsAccessKeyId</name>
>>         <value>putKeyHere</value>
>>     </property>
>>
>>     <property>
>>         <name>fs.s3n.awsSecretAccessKey</name>
>>         <value>putSecretHere</value>
>>     </property>
>>     <property>
>>         <name>fs.s3n.impl</name>
>>         <value>org.apache.hadoop.fs.s3native.NativeS3FileSystem</value>
>>     </property>
>> </configuration>
>>
>> If you are running on a cluster, re-use (i.e. edit) the existing
>> core-site.xml file and point Flink to its directory using the
>> fs.hdfs.hadoopconf configuration option.
>>
>> With these two things in place, you should be good to go.
>>
>> [1]
>> http://stackoverflow.com/questions/32959790/run-apache-flink-with-amazon-s3
>> [2]
>> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Problem-with-Amazon-S3-td946.html
>>
>> On Mon, Oct 5, 2015 at 8:19 PM, Kostiantyn Kudriavtsev <
>> kudryavtsev.konstantin@gmail.com> wrote:
>>
>>> Hi guys,
>>>
>>> I'm trying to get Apache Flink 0.9.1 to work on EMR, basically to read
>>> data from S3. I tried the following path for the data,
>>> s3://mybucket.s3.amazonaws.com/folder, but it throws the following
>>> exception:
>>>
>>> java.io.IOException: Cannot establish connection to Amazon S3:
>>> com.amazonaws.services.s3.model.AmazonS3Exception: The request signature
>>> we calculated does not match the signature you provided. Check your key
>>> and signing method. (Service: Amazon S3; Status Code: 403;
>>>
>>> I added the access and secret keys, so the problem is not there. I'm
>>> using the standard region and gave read permission to everyone.
>>>
>>> Any ideas how it can be fixed?
>>>
>>> Thank you in advance,
>>> Kostia
>>>
>>
>>
>
