Hi!

Concerning (1): We have seen that a few times. The JVMs/threads sometimes do not exit gracefully, and YARN is not always able to kill the process (a YARN bug). I am currently working on a refactoring of the YARN resource manager (to allow easy addition of other frameworks) and have addressed this as part of that. It will be in master in a bit.
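
To illustrate the failure mode (a minimal sketch, not Flink code): a single non-daemon thread is enough to keep a JVM process alive after main() returns, which matches the kind of lingering container we see when shutdown does not complete cleanly.

```java
public class LingeringJvmDemo {
    public static void main(String[] args) {
        // Non-daemon threads keep the JVM alive even after main() returns.
        Thread worker = new Thread(() -> {
            while (true) {
                try {
                    Thread.sleep(1_000);
                } catch (InterruptedException e) {
                    return; // a graceful shutdown would interrupt this thread
                }
            }
        });
        worker.setDaemon(false); // the default, shown here for clarity
        worker.start();
        System.out.println("main() is about to return, but the process keeps running");
    }
}
```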

Concerning (2): Do you know which component in Flink uses the HTTP client?
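
If it helps to narrow that down: a quick way to find the offending dependency is to print the jar a class was loaded from. A sketch, assuming the stock Apache HttpClient 4.x interface `org.apache.http.client.HttpClient` is on the classpath:

```java
import org.apache.http.client.HttpClient;

public class FindHttpClientJar {
    public static void main(String[] args) {
        // getCodeSource() can be null for bootstrap classes, but for a class
        // loaded from a jar it points at the jar's location on disk.
        System.out.println(
                HttpClient.class.getProtectionDomain().getCodeSource().getLocation());
    }
}
```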

Greetings,
Stephan


On Tue, Jan 5, 2016 at 2:49 PM, Maximilian Bode <maximilian.bode@tngtech.com> wrote:
Hi everyone,

Regarding Q1, I believe I have witnessed a comparable phenomenon in a (3-node, non-EMR) YARN cluster. After shutting down the YARN session via `stop`, one container seems to linger around. `yarn application -list` is empty, whereas `bin/yarn-session.sh -q` lists the left-over container. Also, there is still one application shown as 'running' in Ambari's YARN pane under current applications. Then, after some time (on the order of a few minutes) it disappears and the resources are available again.

I have not tested this behavior extensively so far. Notably, I was not able to reproduce it by just starting a session and then ending it right away without looking at the JobManager web interface. Maybe accessing the web interface produces some kind of lag as far as YARN containers are concerned?

Cheers,
Max

> On 04.01.2016 at 12:52, Chiwan Park <chiwanpark@apache.org> wrote:
>
> Hi All,
>
> I have some problems using Flink on an Amazon EMR cluster.
>
> Q1. Sometimes, the jobmanager container still exists after destroying the YARN session by pressing Ctrl+C. In that case, the Flink YARN app seems to have exited correctly in the YARN RM dashboard, but there is still a running container shown in the dashboard. From the logs of the container, I realized that the container is the jobmanager.
>
> I cannot kill the container because I have no permission to restart the YARN RM in Amazon EMR. In my small Hadoop cluster (with 3 nodes), the problem doesn't appear.
>
> Q2. I tried to use the S3 file system in Flink on EMR, but I can't use it because of a version conflict with Apache HttpClient. By default, the S3 file system implementation in EMR is `com.amazon.ws.emr.hadoop.fs.s3n.S3NativeFileSystem`, which is linked against a different version of Apache HttpClient.
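>
> For reference, a sketch (assuming only the stock Hadoop `Configuration` API) that prints the implementation class configured for the s3n scheme, without restarting anything:
>
> ```java
> import org.apache.hadoop.conf.Configuration;
>
> public class CheckS3Impl {
>     public static void main(String[] args) {
>         // Prints the FileSystem class configured for s3n:// URIs; a null
>         // result means Hadoop falls back to its core-default.xml entry.
>         Configuration conf = new Configuration();
>         System.out.println("fs.s3n.impl = " + conf.get("fs.s3n.impl"));
>     }
> }
> ```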
>
> As I wrote above, I cannot restart the Hadoop cluster after modifying core-site.xml because of the lack of permission. How can I solve this problem?
>
> Regards,
> Chiwan Park
>
>