hive-issues mailing list archives

From "Birger Brunswiek (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HIVE-16949) Leak of threads from Get-Input-Paths thread pool when more than 1 used in query
Date Fri, 23 Jun 2017 12:41:00 GMT

     [ https://issues.apache.org/jira/browse/HIVE-16949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Birger Brunswiek updated HIVE-16949:
------------------------------------
    Description: 
The commit [20210de|https://github.com/apache/hive/commit/20210dec94148c9b529132b1545df3dd7be083c3]
which was part of HIVE-15546 [introduced a thread pool|https://github.com/apache/hive/blob/824b9c80b443dc4e2b9ad35214a23ac756e75234/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L3109]
which is never shut down after its tasks have completed. This leaks threads for
each query that uses more than 1 partition. The idle threads are not removed by the GC, so every
query spanning multiple partitions increases the number of threads, which is never reduced.
On my machine hiveserver2 gets progressively slower once 10k threads are reached.
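
To illustrate the mechanism, here is a minimal standalone sketch, not Hive's code. The class {{PoolLeakDemo}}, the thread-name prefix and the daemon thread factory are assumptions made for the demo (daemon threads are used only so the demo JVM can exit; pool threads are non-daemon by default). Each simulated query creates a fixed pool, waits for its tasks and drops the reference without calling shutdown, so the idle workers stay alive.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicInteger;

class PoolLeakDemo {
    // Hypothetical name prefix, used only to find the demo's threads again.
    static final String PREFIX = "Demo-Get-Input-Paths-";
    static final AtomicInteger SEQ = new AtomicInteger();

    // Daemon threads so this demo can terminate; an assumption for the demo only.
    static final ThreadFactory FACTORY = r -> {
        Thread t = new Thread(r, PREFIX + SEQ.incrementAndGet());
        t.setDaemon(true);
        return t;
    };

    // Simulates one query: creates a pool, runs two tasks, never shuts the pool down.
    static void runLeakyQuery() {
        ExecutorService pool = Executors.newFixedThreadPool(2, FACTORY);
        List<Future<?>> futures = new ArrayList<>();
        for (int i = 0; i < 2; i++) {
            futures.add(pool.submit(() -> { }));
        }
        try {
            for (Future<?> f : futures) {
                f.get(); // wait until the task has finished
            }
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
        // The local reference is dropped here, but the idle worker threads
        // keep the pool reachable, so it is never finalized or collected.
    }

    // Counts live threads created by the demo's thread factory.
    static long liveDemoThreads() {
        return Thread.getAllStackTraces().keySet().stream()
                .filter(t -> t.isAlive() && t.getName().startsWith(PREFIX))
                .count();
    }

    public static void main(String[] args) {
        for (int i = 0; i < 5; i++) {
            runLeakyQuery();
        }
        System.out.println("live pool threads: " + liveDemoThreads());
    }
}
```

In this sketch the live thread count grows by two with every call to {{runLeakyQuery()}} and does not drop again, which matches the behaviour described above.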

Thread pools only shut down automatically in special circumstances (see [documentation section
_Finalization_|https://docs.oracle.com/javase/7/docs/api/java/util/concurrent/ThreadPoolExecutor.html]).
This is not currently the case for the Get-Input-Paths thread pool. I would add a _pool.shutdown()_
in a finally block just before returning the result to make sure the threads are really shut down.
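
A minimal sketch of the proposed finally-block fix follows. It is not the actual Hive patch: the class {{InputPathLister}}, the method {{listPaths}}, the {{/warehouse/}} path prefix and the {{lastPool}} field are hypothetical names introduced for illustration. The point is only that {{shutdown()}} runs on the normal return path and on every exceptional exit.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class InputPathLister {
    // Hypothetical hook so the pool state can be inspected after the call.
    static ExecutorService lastPool;

    // Stand-in for the input path listing; resolves one "path" per partition in parallel.
    static List<String> listPaths(List<String> partitions, int maxThreads) {
        ExecutorService pool = Executors.newFixedThreadPool(maxThreads);
        lastPool = pool;
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (String partition : partitions) {
                Callable<String> task = () -> "/warehouse/" + partition;
                futures.add(pool.submit(task));
            }
            List<String> result = new ArrayList<>();
            for (Future<String> f : futures) {
                result.add(f.get()); // may throw if a task failed
            }
            return result;
        } catch (InterruptedException | ExecutionException e) {
            throw new RuntimeException(e);
        } finally {
            // Runs on normal return and on exceptions alike, so the
            // worker threads are always released.
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<String> paths = listPaths(Arrays.asList("p=1", "p=2"), 2);
        System.out.println(paths + " shutdown=" + lastPool.isShutdown());
    }
}
```

{{shutdown()}} lets already submitted tasks finish and then allows the worker threads to exit; because all futures have been awaited before the finally block runs on the normal path, no work is cut short.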

My current workaround is to set {{hive.exec.input.listing.max.threads = 1}}. This prevents
the thread pool from being spawned [\[1\]|https://github.com/apache/hive/blob/824b9c80b443dc4e2b9ad35214a23ac756e75234/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L2118]
[\[2\]|https://github.com/apache/hive/blob/824b9c80b443dc4e2b9ad35214a23ac756e75234/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L3107].

The same issue probably also applies to the [Get-Input-Summary thread pool|https://github.com/apache/hive/blob/824b9c80b443dc4e2b9ad35214a23ac756e75234/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L2193].

  was:
The commit [20210de|https://github.com/apache/hive/commit/20210dec94148c9b529132b1545df3dd7be083c3]
which was part of HIVE-15546 [introduced a thread pool|https://github.com/apache/hive/blob/824b9c80b443dc4e2b9ad35214a23ac756e75234/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L3109]
which is never shut down after its tasks have completed. This leaks threads for
each query that uses more than 1 partition. The idle threads are not removed by the GC, so every
query spanning multiple partitions increases the number of threads, which is never reduced.
On my machine hiveserver2 gets progressively slower once 10k threads are reached.

Thread pools only shut down automatically in special circumstances (see [documentation section
_Finalization_|https://docs.oracle.com/javase/7/docs/api/java/util/concurrent/ThreadPoolExecutor.html]).
I am not sure why this is not the case here. I would add a _pool.shutdown()_ just [after the pool
has completed its work|https://github.com/apache/hive/blob/824b9c80b443dc4e2b9ad35214a23ac756e75234/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L3137]
to make sure the threads are really shut down. This, however, would only fix normal operation.
There are other exit points, namely through exceptions, which would still lead to the same
leak of threads.

My current workaround is to set {{hive.exec.input.listing.max.threads = 1}}. This prevents
the thread pool from being spawned [\[1\]|https://github.com/apache/hive/blob/824b9c80b443dc4e2b9ad35214a23ac756e75234/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L2118]
[\[2\]|https://github.com/apache/hive/blob/824b9c80b443dc4e2b9ad35214a23ac756e75234/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L3107].

The same issue probably also applies to the [Get-Input-Summary thread pool|https://github.com/apache/hive/blob/824b9c80b443dc4e2b9ad35214a23ac756e75234/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L2193].


> Leak of threads from Get-Input-Paths thread pool when more than 1 used in query
> -------------------------------------------------------------------------------
>
>                 Key: HIVE-16949
>                 URL: https://issues.apache.org/jira/browse/HIVE-16949
>             Project: Hive
>          Issue Type: Bug
>          Components: HiveServer2
>            Reporter: Birger Brunswiek
>
> The commit [20210de|https://github.com/apache/hive/commit/20210dec94148c9b529132b1545df3dd7be083c3]
which was part of HIVE-15546 [introduced a thread pool|https://github.com/apache/hive/blob/824b9c80b443dc4e2b9ad35214a23ac756e75234/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L3109]
which is never shut down after its tasks have completed. This leaks threads for
each query that uses more than 1 partition. The idle threads are not removed by the GC, so every
query spanning multiple partitions increases the number of threads, which is never reduced.
On my machine hiveserver2 gets progressively slower once 10k threads are reached.
> Thread pools only shut down automatically in special circumstances (see [documentation
section _Finalization_|https://docs.oracle.com/javase/7/docs/api/java/util/concurrent/ThreadPoolExecutor.html]).
This is not currently the case for the Get-Input-Paths thread pool. I would add a _pool.shutdown()_
in a finally block just before returning the result to make sure the threads are really shut down.
> My current workaround is to set {{hive.exec.input.listing.max.threads = 1}}. This prevents
the thread pool from being spawned [\[1\]|https://github.com/apache/hive/blob/824b9c80b443dc4e2b9ad35214a23ac756e75234/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L2118]
[\[2\]|https://github.com/apache/hive/blob/824b9c80b443dc4e2b9ad35214a23ac756e75234/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L3107].
> The same issue probably also applies to the [Get-Input-Summary thread pool|https://github.com/apache/hive/blob/824b9c80b443dc4e2b9ad35214a23ac756e75234/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L2193].



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)