drill-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "abdelhakim deneche" <adene...@gmail.com>
Subject Re: Review Request 36168: DRILL-3167: When a query fails, Foreman should wait for all fragments to finish cleaning up before sending a FAILED state to the client
Date Mon, 06 Jul 2015 19:52:28 GMT


> On July 6, 2015, 6:35 p.m., Sudheesh Katkam wrote:
> > exec/java-exec/src/main/java/org/apache/drill/exec/work/foreman/Foreman.java, line
841
> > <https://reviews.apache.org/r/36168/diff/1/?file=998962#file998962line841>
> >
> >     If there is a large number of fragments being sent and the last one failed in
setupNonRootFragments, we do not _wait for all fragments to finish cleaning up_, correct?
It is a limitation then?

true and that is exactly the scenario described in the else-block comment.

Unfortunately I couldn't find a straightforward way to fix that case (that was the reason
I discarded the previous review request in the first place).

One thing to keep in mind though, if we try to wait for all running fragments to finish (like
we do for cancelled queries) there is a risk for the Foreman to wait forever if some of those
fragments become unresponsive, or in this case the fragment didn't even start running on the
remote fragment. That's exactly what I tried to avoid by using the "nonRootFragmentsSent"
flag


- abdelhakim


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/36168/#review90514
-----------------------------------------------------------


On July 3, 2015, 4:13 p.m., abdelhakim deneche wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/36168/
> -----------------------------------------------------------
> 
> (Updated July 3, 2015, 4:13 p.m.)
> 
> 
> Review request for drill, Jacques Nadeau and Sudheesh Katkam.
> 
> 
> Bugs: DRILL-3167
>     https://issues.apache.org/jira/browse/DRILL-3167
> 
> 
> Repository: drill-git
> 
> 
> Description
> -------
> 
> when a query fails Foreman moves to a FAILING state until all fragment return a terminal
state. The Web UI will still display FAILED instead of FAILING
> 
> TestDrillbitResilience#failsWhenSendingFragments exposes a limitation to this approach:
if an error occurs when setting up remote fragments we can't assume they will be able to return
a terminal state. In this case the Foreman will not wait for them to finish and return FAILED
to the client immediately.
> 
> 
> Diffs
> -----
> 
>   exec/java-exec/src/main/java/org/apache/drill/exec/server/rest/profile/ProfileResources.java
6656bf6 
>   exec/java-exec/src/main/java/org/apache/drill/exec/server/rest/profile/ProfileWrapper.java
dd26a76 
>   exec/java-exec/src/main/java/org/apache/drill/exec/work/foreman/Foreman.java 716fb66

>   exec/java-exec/src/main/java/org/apache/drill/exec/work/foreman/QueryManager.java 9318233

>   exec/java-exec/src/main/resources/rest/profile/list.ftl cf92ede 
>   exec/java-exec/src/main/resources/rest/profile/profile.ftl 46cdc83 
>   exec/java-exec/src/test/java/org/apache/drill/exec/server/TestDrillbitResilience.java
ce09f68 
>   protocol/src/main/java/org/apache/drill/exec/proto/UserBitShared.java e76d748 
>   protocol/src/main/java/org/apache/drill/exec/proto/beans/QueryResult.java 474e330 
>   protocol/src/main/protobuf/UserBitShared.proto 0451fd2 
> 
> Diff: https://reviews.apache.org/r/36168/diff/
> 
> 
> Testing
> -------
> 
> unit tests are passing along with customer and tpch100
> 
> 
> Thanks,
> 
> abdelhakim deneche
> 
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message