flink-issues mailing list archives

From "Tzu-Li (Gordon) Tai (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FLINK-5898) Race-Condition with Amazon Kinesis KPL
Date Mon, 22 May 2017 05:04:04 GMT

    [ https://issues.apache.org/jira/browse/FLINK-5898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16019140#comment-16019140 ]

Tzu-Li (Gordon) Tai commented on FLINK-5898:

Let me think a bit about how to proceed with this. I will keep you updated.

> Race-Condition with Amazon Kinesis KPL
> --------------------------------------
>                 Key: FLINK-5898
>                 URL: https://issues.apache.org/jira/browse/FLINK-5898
>             Project: Flink
>          Issue Type: Bug
>          Components: Kinesis Connector
>    Affects Versions: 1.2.0
>            Reporter: Scott Kidder
> The Flink Kinesis streaming-connector uses the Amazon Kinesis Producer Library (KPL)
to send messages to Kinesis streams. The KPL relies on a native binary client to send messages
to achieve better performance.
> When a Kinesis Producer is instantiated, the KPL will extract the native binary to a
sub-directory of `/tmp` (or whatever the platform-specific temporary directory happens to be).
> The KPL tries to prevent multiple processes from extracting the binary at the same time
by wrapping the operation in a mutex. Unfortunately, this does not prevent multiple Flink
cores from trying to perform this operation at the same time. If two or more processes attempt
to do this at the same time, then the native binary in /tmp will be corrupted.
> The authors of the KPL are aware of this possibility and suggest that users of the KPL
.... not do that ... (sigh):
> https://github.com/awslabs/amazon-kinesis-producer/issues/55#issuecomment-251408897
> I encountered this in my production environment when bringing up a new Flink task-manager
with multiple cores and restoring from an earlier savepoint, resulting in the instantiation
of a KPL client on each core at roughly the same time.
> A stack-trace follows:
> {noformat}
> java.lang.RuntimeException: Could not copy native binaries to temp directory /tmp/amazon-kinesis-producer-native-binaries
> 	at com.amazonaws.services.kinesis.producer.KinesisProducer.extractBinaries(KinesisProducer.java:849)
> 	at com.amazonaws.services.kinesis.producer.KinesisProducer.<init>(KinesisProducer.java:243)
> 	at org.apache.flink.streaming.connectors.kinesis.FlinkKinesisProducer.open(FlinkKinesisProducer.java:198)
> 	at org.apache.flink.api.common.functions.util.FunctionUtils.openFunction(FunctionUtils.java:36)
> 	at org.apache.flink.streaming.api.operators.AbstractUdfStreamOperator.open(AbstractUdfStreamOperator.java:112)
> 	at org.apache.flink.streaming.runtime.tasks.StreamTask.openAllOperators(StreamTask.java:376)
> 	at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:252)
> 	at org.apache.flink.runtime.taskmanager.Task.run(Task.java:655)
> 	at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.SecurityException: The contents of the binary /tmp/amazon-kinesis-producer-native-binaries/kinesis_producer_e9a87c761db92a73eb74519a4468ee71def87eb2
is not what it's expected to be.
> 	at com.amazonaws.services.kinesis.producer.KinesisProducer.extractBinaries(KinesisProducer.java:822)
> 	... 8 more
> {noformat}
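
For illustration only, here is a minimal sketch of one way a caller could avoid the corrupted-file race described above, assuming the concurrent callers are threads inside a single TaskManager JVM. `SafeExtractor` and its `extract` method are hypothetical names, not part of the KPL or of Flink, and this is not the fix adopted for this issue. The sketch serializes extraction with a JVM-wide lock and publishes the file with an atomic rename, so a reader never sees a half-written binary.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Arrays;

/** Hypothetical helper; neither the KPL nor Flink ships this class. */
final class SafeExtractor {

    // One lock per JVM: serializes extraction across all threads
    // (for example, task slots) running in the same process.
    private static final Object EXTRACT_LOCK = new Object();

    private SafeExtractor() {}

    /** Writes content to targetDir/fileName and returns the resulting path. */
    static Path extract(byte[] content, Path targetDir, String fileName) throws IOException {
        synchronized (EXTRACT_LOCK) {
            Files.createDirectories(targetDir);
            Path target = targetDir.resolve(fileName);

            // Reuse an existing copy if its bytes already match.
            if (Files.isRegularFile(target)
                    && Arrays.equals(Files.readAllBytes(target), content)) {
                return target;
            }

            // Write to a unique temporary file in the same directory, then
            // rename it into place so the target is never partially written.
            Path tmp = Files.createTempFile(targetDir, fileName, ".tmp");
            try {
                Files.write(tmp, content);
                Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE);
            } finally {
                // No-op after a successful move; cleans up after a failure.
                Files.deleteIfExists(tmp);
            }
            return target;
        }
    }
}
```

The static lock only covers threads within one JVM. Separate processes sharing the same temporary directory would additionally need an OS-level lock such as `java.nio.channels.FileLock`, which is why the atomic rename is kept as a second line of defense.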

