tez-issues mailing list archives

From "Rajesh Balamohan (Jira)" <j...@apache.org>
Subject [jira] [Updated] (TEZ-4087) Shuffle: Check for thread's liveliness regularly to avoid infinite wait in merger & referee threads
Date Mon, 07 Oct 2019 00:28:00 GMT

     [ https://issues.apache.org/jira/browse/TEZ-4087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rajesh Balamohan updated TEZ-4087:
----------------------------------
    Description: 
In certain cases, Shuffle's cleanupIgnoreErrors() is not called. This leaves 4 threads (inmem,
diskmerger, Referee, ShuffleAndMergeRunner) running forever.

When this happens in long-running processes (e.g. LLAP in Hive), the leaked threads accumulate
and the process eventually hits its thread limit.

Note: The root cause of cleanupIgnoreErrors() not being invoked is not yet known. I will share
details once I know more.

Creating this ticket to add safety knobs that ensure these thread leaks do not happen.
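A minimal Java sketch of the kind of safety knob described above, under stated assumptions: `LivenessCheckingMerger`, its owner thread and `checkIntervalMs` are hypothetical names, not Tez's actual merger or Referee code. Instead of an untimed wait, the worker wakes up at a fixed interval, checks whether the thread that owns it is still alive, and exits on its own once that thread is gone, even if no cleanup call ever arrives.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

/**
 * Hypothetical sketch (not Tez code): a merger-style worker that never blocks
 * indefinitely. It waits with a timeout and, on every wake-up, checks whether
 * its owner thread is still alive, so it exits even if close() is never called.
 */
public class LivenessCheckingMerger implements Runnable {
  private final Thread owner;          // assumption: thread whose lifetime bounds this worker
  private final long checkIntervalMs;  // hypothetical knob: max time between liveness checks
  private final ReentrantLock lock = new ReentrantLock();
  private final Condition workAvailable = lock.newCondition();
  private boolean pending = false;     // guarded by lock
  private boolean closed = false;      // guarded by lock
  private volatile int completedMerges = 0; // written only by the worker thread

  public LivenessCheckingMerger(Thread owner, long checkIntervalMs) {
    this.owner = owner;
    this.checkIntervalMs = checkIntervalMs;
  }

  /** Signals that there is merge work to do. */
  public void submit() {
    lock.lock();
    try {
      pending = true;
      workAvailable.signal();
    } finally {
      lock.unlock();
    }
  }

  /** Normal shutdown path: the stand-in for a cleanup call that may never happen. */
  public void close() {
    lock.lock();
    try {
      closed = true;
      workAvailable.signalAll();
    } finally {
      lock.unlock();
    }
  }

  public int completedMerges() {
    return completedMerges;
  }

  @Override
  public void run() {
    lock.lock();
    try {
      while (true) {
        while (!pending && !closed) {
          if (!owner.isAlive()) {
            return; // owner exited without calling close(): stop instead of leaking
          }
          // Bounded wait instead of an untimed await(): wake up periodically
          // even if nobody ever signals this condition again.
          workAvailable.await(checkIntervalMs, TimeUnit.MILLISECONDS);
        }
        if (closed) {
          return;
        }
        pending = false;
        completedMerges++; // stand-in for the real merge work
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt(); // preserve interrupt status and exit
    } finally {
      lock.unlock();
    }
  }

  public static void main(String[] args) throws InterruptedException {
    // Simulate the leak scenario: the owner finishes without ever calling close().
    Thread owner = new Thread(() -> { }, "shuffle-owner");
    owner.start(); // start the owner first, otherwise isAlive() is false from the outset
    Thread merger = new Thread(new LivenessCheckingMerger(owner, 50), "merger");
    merger.start();
    owner.join();
    merger.join(2000);
    System.out.println("merger alive after owner exit: " + merger.isAlive());
  }
}
```

With this design a leaked worker lingers for at most roughly one check interval after its owner dies, so the interval trades shutdown latency against idle wake-ups. Tying liveness to a single owner thread is an assumption of this sketch; the real fix may key off a different signal.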

 

  was:
In certain cases, Shuffle's cleanupIgnoreErrors() is not called. This leaves 4 threads (inmem,
diskmerger, Referee, ShuffleAndMergeRunner) running forever.

When this happens in long-running processes (e.g. LLAP in Hive), the leaked threads accumulate
and the process eventually hits its thread limit.

Note: The root cause of cleanupIgnoreErrors() not being invoked is not yet known. I will share
details once I know more. This ticket is created as an add-on safeguard so that thread leaks
do not happen.

 


> Shuffle: Check for thread's liveliness regularly to avoid infinite wait in merger & referee threads
> ---------------------------------------------------------------------------------------------------
>
>                 Key: TEZ-4087
>                 URL: https://issues.apache.org/jira/browse/TEZ-4087
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Rajesh Balamohan
>            Priority: Major
>



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
