flink-issues mailing list archives

From "Till Rohrmann (Jira)" <j...@apache.org>
Subject [jira] [Commented] (FLINK-23194) Cache and reuse the ContainerLaunchContext and accelarate the progress of createTaskExecutorLaunchContext on yarn
Date Wed, 07 Jul 2021 10:18:00 GMT

    [ https://issues.apache.org/jira/browse/FLINK-23194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17376464#comment-17376464 ]

Till Rohrmann commented on FLINK-23194:
---------------------------------------

Thanks for suggesting this improvement [~zlzhang0122]. I don't fully understand how caching
of the {{ContainerLaunchContext}} will decrease the pressure on HDFS. Does this mean that
creating a {{ContainerLaunchContext}} will access HDFS?

Have you measured how much this improvement speeds things up? Maybe you could share some details
about your test setup.

> Cache and reuse the ContainerLaunchContext and accelarate the progress of createTaskExecutorLaunchContext on yarn
> -----------------------------------------------------------------------------------------------------------------
>
>                 Key: FLINK-23194
>                 URL: https://issues.apache.org/jira/browse/FLINK-23194
>             Project: Flink
>          Issue Type: Improvement
>          Components: Deployment / YARN
>    Affects Versions: 1.13.1, 1.12.4
>            Reporter: zlzhang0122
>            Priority: Major
>             Fix For: 1.14.0
>
>
> When starting the TaskExecutors in containers on YARN, a ContainerLaunchContext is created
> n times (where n is the number of TaskManagers).
> When I examined this creation process, I found that most of the content is common to all of
> them and has nothing to do with the particular TaskManager, except for the launchCommand.
> We can create the ContainerLaunchContext once and reuse it. Only the launchCommand needs to
> be created separately for each TaskManager.
> So I propose that we cache and reuse the ContainerLaunchContext object to accelerate
> this creation process.
> I think this can bring benefits such as the following:
>  # this can accelerate the creation of the ContainerLaunchContext and also the start of
> the TaskExecutor, especially when there is a large number of TaskManagers.
>  # this can decrease the pressure on HDFS, etc.
>  # this can also avoid sudden failures of HDFS or YARN, etc.
> We have implemented this in our production environment. So far there have been no problems and
> the benefit has been good. Please let me know if there's any point that I haven't considered.
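
To make the quoted proposal concrete, here is a minimal sketch of the caching pattern in plain Java. It is not Flink's or YARN's actual code: LaunchContext is a hypothetical stand-in for YARN's ContainerLaunchContext, and the factory class, the resource path and the environment entry are placeholders invented for illustration. Whether building the shared part touches HDFS is the open question in the comment above, so the sketch only counts how often the shared part is built.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for YARN's ContainerLaunchContext; only the parts
// relevant to the proposal are modeled.
final class LaunchContext {
    final Map<String, String> localResources; // shared by all TaskManagers
    final Map<String, String> environment;    // shared by all TaskManagers
    final List<String> commands;              // specific to one TaskManager

    LaunchContext(Map<String, String> localResources,
                  Map<String, String> environment,
                  List<String> commands) {
        this.localResources = localResources;
        this.environment = environment;
        this.commands = commands;
    }
}

// Hypothetical factory: builds the shared part once and reuses it.
final class CachingLaunchContextFactory {
    // Counts how often the (potentially expensive) shared part is built.
    final AtomicInteger sharedBuilds = new AtomicInteger();
    private LaunchContext template; // cached shared part, built lazily

    private LaunchContext buildTemplate() {
        sharedBuilds.incrementAndGet();
        Map<String, String> resources = new HashMap<>();
        resources.put("example.jar", "placeholder-location"); // placeholder entry
        Map<String, String> env = new HashMap<>();
        env.put("EXAMPLE_ENV", "example-value");              // placeholder entry
        return new LaunchContext(
                Collections.unmodifiableMap(resources),
                Collections.unmodifiableMap(env),
                Collections.emptyList());
    }

    // Returns a fresh context per TaskManager: the shared maps are reused,
    // only the launch command differs.
    synchronized LaunchContext createFor(String launchCommand) {
        if (template == null) {
            template = buildTemplate();
        }
        return new LaunchContext(
                template.localResources,
                template.environment,
                Collections.singletonList(launchCommand));
    }
}

final class LaunchContextCacheDemo {
    public static void main(String[] args) {
        CachingLaunchContextFactory factory = new CachingLaunchContextFactory();
        for (int i = 0; i < 3; i++) {
            LaunchContext ctx = factory.createFor("start-taskmanager-" + i);
            System.out.println(ctx.commands.get(0));
        }
        System.out.println("shared part built: " + factory.sharedBuilds.get());
    }
}
```

The shared maps are made unmodifiable so that one TaskManager's context cannot alter another's. Each call still returns a new object, which sidesteps the question of whether a single context instance can safely be handed to several container start requests.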



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
