drill-issues mailing list archives

From "Kunal Khatua (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (DRILL-4595) FragmentExecutor.fail() should interrupt the fragment thread to avoid possible query hangs
Date Wed, 02 Aug 2017 00:47:00 GMT

     [ https://issues.apache.org/jira/browse/DRILL-4595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kunal Khatua updated DRILL-4595:
--------------------------------
    Reviewer: Khurram Faraaz

[~khfaraaz] Can you verify if this issue is resolved with DRILL-5599 (Drill 1.11.0)?

> FragmentExecutor.fail() should interrupt the fragment thread to avoid possible query hangs
> ------------------------------------------------------------------------------------------
>
>                 Key: DRILL-4595
>                 URL: https://issues.apache.org/jira/browse/DRILL-4595
>             Project: Apache Drill
>          Issue Type: Bug
>    Affects Versions: 1.4.0
>            Reporter: Deneche A. Hakim
>            Assignee: Deneche A. Hakim
>             Fix For: Future
>
>
> When a fragment fails, it is assumed that it will be able to close itself and send its FAILED state to the foreman, which will then cancel any running fragments. FragmentExecutor.cancel() interrupts the thread, making sure those fragments don't stay blocked.
> However, if a fragment is already blocked when its fail() method is called, the foreman may never be notified and the query will hang forever. One such scenario is the following:
> - generally it's a CTAS running on a large cluster (lots of writers running in parallel)
> - logs show that the user channel was closed and UserServer caused the root fragment to move to a FAILED state
> - jstack shows that the root fragment is blocked in its receiver, waiting for data
> - jstack also shows that ALL other fragments are no longer running, and the logs show that all of them succeeded
> - the foreman waits *forever* for the root fragment to finish
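
The scenario above can be reproduced in miniature. What follows is only an illustrative sketch, not Drill's code: FragmentSketch, its receiver queue, the reported latch (standing in for "the foreman was notified"), and failAndWait() are all hypothetical names invented for this example. It assumes the fragment thread is blocked in an interruptible call (here BlockingQueue.take()), and shows that merely setting the FAILED state leaves the thread blocked, while also interrupting it lets the thread wake up and report.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical stand-in for a fragment executor; NOT Drill's actual class.
class FragmentSketch implements Runnable {
    enum State { RUNNING, FAILED, FINISHED }

    // Simulated receiver: nothing is ever sent, so take() blocks indefinitely.
    private final BlockingQueue<String> receiver = new LinkedBlockingQueue<>();
    private final AtomicReference<State> state = new AtomicReference<>(State.RUNNING);
    private final AtomicReference<Thread> runner = new AtomicReference<>();
    private final CountDownLatch started = new CountDownLatch(1);
    // Stands in for "the foreman has been told about the final state".
    private final CountDownLatch reported = new CountDownLatch(1);
    private final boolean interruptOnFail;

    FragmentSketch(boolean interruptOnFail) {
        this.interruptOnFail = interruptOnFail;
    }

    @Override
    public void run() {
        runner.set(Thread.currentThread());
        started.countDown();
        try {
            receiver.take(); // blocked waiting for data that never arrives
            state.compareAndSet(State.RUNNING, State.FINISHED);
        } catch (InterruptedException e) {
            // Woken up by fail(); fall through and report.
        } finally {
            reported.countDown(); // "notify the foreman"
        }
    }

    // Marks the fragment FAILED and, optionally, interrupts the blocked thread.
    void fail() {
        state.set(State.FAILED);
        Thread t = runner.get();
        if (interruptOnFail && t != null) {
            t.interrupt();
        }
    }

    // Starts a fragment, fails it, and returns whether it reported within 500 ms.
    static boolean failAndWait(boolean interruptOnFail) {
        FragmentSketch fragment = new FragmentSketch(interruptOnFail);
        Thread t = new Thread(fragment, "fragment-sketch");
        t.setDaemon(true); // a stuck thread must not keep the JVM alive
        t.start();
        try {
            fragment.started.await();
            fragment.fail();
            return fragment.reported.await(500, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("fail() without interrupt, reported: " + failAndWait(false));
        System.out.println("fail() with interrupt, reported: " + failAndWait(true));
    }
}
```

In this sketch the first call times out because the thread never leaves take(), while the second returns promptly. The real fix would additionally depend on every blocking point in the fragment being interruptible, which this example simply assumes.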



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
