flink-user mailing list archives

From Clifford Resnick <cresn...@mediamath.com>
Subject Error using S3a State Backend: Window Operators sending directory instead of fully qualified file?
Date Sat, 16 Jul 2016 16:26:58 GMT
Using Flink 1.1-SNAPSHOT, Hadoop-aws 2.6.4

The error I’m getting is:

11:05:44,425 ERROR org.apache.flink.streaming.runtime.tasks.StreamTask           - Caught exception while materializing asynchronous checkpoints.
com.amazonaws.AmazonClientException: Unable to calculate MD5 hash: /var/folders/t8/k5764ltj4sq4ft06c1zp0nxn928mwr/T/flink-io-247956be-e422-4222-a512-e3ae321b1590/ede87211c622f86d1ef7b2b323076e79/WindowOperator_10_3/dummy_state/31b7ca7b-dc94-4d40-84c7-4f10ebc644a2/local-chk-1 (Is a directory)
                at com.amazonaws.services.s3.AmazonS3Client.putObject(AmazonS3Client.java:1266)
                at com.amazonaws.services.s3.transfer.internal.UploadCallable.uploadInOneChunk(UploadCallable.java:131)
                at com.amazonaws.services.s3.transfer.internal.UploadCallable.call(UploadCallable.java:123)
                at com.amazonaws.services.s3.transfer.internal.UploadMonitor.call(UploadMonitor.java:139)
                at com.amazonaws.services.s3.transfer.internal.UploadMonitor.call(UploadMonitor.java:47)
                at java.util.concurrent.FutureTask.run(FutureTask.java:266)
                at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
                at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
                at java.lang.Thread.run(Thread.java:745)

In the debugger I noticed that some of the uploaded checkpoints come from the configured /tmp
location. Those uploads succeed because the file in the request is a fully qualified file path,
but I guess it’s different for WindowOperators? There, the file in the request (under a
different /var/folders/... location that I did not configure; must be a Mac thing?) is actually
a directory, and the AWS API fails when it tries to calculate an MD5 hash of that directory.
The Flink side of the code path is hard to trace in the debugger because it runs asynchronously.
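The "(Is a directory)" suffix in the trace matches the message Java's FileInputStream gives on Linux and macOS when asked to open a directory, which fits the reading above: the SDK appears to be hashing the whole `local-chk-1` directory as if it were one file. The following JDK-only sketch reproduces that failure and shows the per-file alternative a directory-aware uploader would need (one object key and one MD5 per regular file). It does not call the AWS SDK, Hadoop, or Flink; `md5Hex`, `uploadPlan`, the key prefix and the file names are hypothetical, chosen only for illustration.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class CheckpointUploadSketch {

    /** Hex MD5 of one file, read via FileInputStream (no AWS SDK involved). */
    public static String md5Hex(Path file) throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("MD5");
        // For a directory this constructor throws FileNotFoundException; on Linux and
        // macOS its message ends in "(Is a directory)", as in the trace above.
        try (InputStream in = new FileInputStream(file.toFile())) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                md.update(buf, 0, n);
            }
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : md.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    /** Hypothetical helper: maps an object key to the MD5 of each regular file under root. */
    public static Map<String, String> uploadPlan(Path root, String keyPrefix)
            throws IOException, NoSuchAlgorithmException {
        List<Path> files;
        try (Stream<Path> walk = Files.walk(root)) {
            files = walk.filter(Files::isRegularFile).collect(Collectors.toList());
        }
        Map<String, String> plan = new TreeMap<>();
        for (Path f : files) {
            // Object stores have no real directories, so each file gets its own key.
            String rel = root.relativize(f).toString().replace(File.separatorChar, '/');
            plan.put(keyPrefix + "/" + rel, md5Hex(f));
        }
        return plan;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical stand-in for a local checkpoint directory such as local-chk-1.
        Path chk = Files.createTempDirectory("local-chk-");
        Files.write(chk.resolve("file-a"), "abc".getBytes(StandardCharsets.UTF_8));
        Files.createDirectories(chk.resolve("nested"));
        Files.write(chk.resolve("nested").resolve("file-b"), "def".getBytes(StandardCharsets.UTF_8));

        try {
            md5Hex(chk); // hashing the directory itself, as the failing upload does
            System.out.println("unexpected: directory was hashed");
        } catch (IOException e) {
            System.out.println("directory rejected: " + e.getMessage());
        }

        // Per-file plan: one key and one MD5 per regular file.
        uploadPlan(chk, "checkpoints/chk-1")
                .forEach((key, md5) -> System.out.println(key + " " + md5));
    }
}
```

On Linux or macOS, running `main` should first print a "directory rejected" line whose message ends in "(Is a directory)", then one key per file. Whether Flink or the S3a filesystem should do this per-file walk is exactly the open question here; the sketch only illustrates why a single-object upload of a directory cannot work.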

I get the same error whether running locally or on a CentOS-based YARN cluster. Everything
works if I use HDFS instead. Any insight will be greatly appreciated! When I get a chance
later, I may try S3n, or perhaps S3a with MD5 verification skipped.

