drill-issues mailing list archives

From "Deneche A. Hakim (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DRILL-3845) UnorderedReceiver shouldn't terminate until it receives a final batch
Date Tue, 24 Nov 2015 22:22:11 GMT

    [ https://issues.apache.org/jira/browse/DRILL-3845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15025562#comment-15025562 ]

Deneche A. Hakim commented on DRILL-3845:
-----------------------------------------

I changed the UnorderedReceiver so that it doesn't kill its providers until it receives the "last batch"
(you can see the change [here|https://github.com/adeneche/incubator-drill/commit/5dbd9fdc88b1c802dff3509dee85416efa3dac15]),
but now some queries fail with the following error:
{noformat}
Error: SYSTEM ERROR: IllegalStateException: Cleanup before finished. 0 out of 1 strams have finished
{noformat}

Fixing the receiver alone doesn't enforce the protocol. Senders still close their fragment as soon
as they receive a "kill signal", so their receivers close before they get the "final
batch", which throws the error above.
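
To make the failure mode concrete, here is a minimal sketch of the kind of check that produces the error above. It is a toy model, not Drill's code: the class ToyReceiver and its methods are hypothetical, and the message text is only modeled on the log line.

```java
// Toy model only: ToyReceiver is a hypothetical class, not Drill's UnorderedReceiver.
final class ToyReceiver {
    private final int expectedStreams;
    private int finishedStreams = 0;

    ToyReceiver(int expectedStreams) {
        this.expectedStreams = expectedStreams;
    }

    // Called when a sender delivers its "final batch" for one stream.
    void onFinalBatch() {
        finishedStreams++;
    }

    // Closing is only legal once every incoming stream has delivered its final batch.
    void close() {
        if (finishedStreams < expectedStreams) {
            throw new IllegalStateException(String.format(
                "Cleanup before finished. %d out of %d streams have finished",
                finishedStreams, expectedStreams));
        }
    }

    public static void main(String[] args) {
        ToyReceiver receiver = new ToyReceiver(1);
        try {
            // The sender closed on the kill signal and never sent a final batch.
            receiver.close();
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

In this model the receiver is behaving correctly; the exception only appears because the sender side skipped its final batch.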

[~jnadeau] and [~sphillips]: is it valid to change the protocol such that receivers can terminate
before they get their "final batch" (which is already the case sometimes) and senders don't
send the "final batch" to receivers that have already finished (they sent a "receiver finished"
message)?
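
As a rough illustration of the sender side of that proposal (a sketch under assumptions, not a patch): the sender remembers which receivers sent a "receiver finished" message and skips them when it sends final batches. ToySender, its method names, and the integer receiver ids are all hypothetical.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model only: ToySender and its method names are hypothetical, not Drill APIs.
final class ToySender {
    private final List<Integer> receivers;
    private final Set<Integer> finishedReceivers = new HashSet<>();

    ToySender(List<Integer> receivers) {
        this.receivers = new ArrayList<>(receivers);
    }

    // A receiver reports that it needs no more data ("receiver finished").
    void onReceiverFinished(int receiverId) {
        finishedReceivers.add(receiverId);
    }

    // Receivers that would still get a final batch under the proposed rule.
    List<Integer> finalBatchTargets() {
        List<Integer> targets = new ArrayList<>();
        for (int id : receivers) {
            if (!finishedReceivers.contains(id)) {
                targets.add(id);
            }
        }
        return targets;
    }

    public static void main(String[] args) {
        ToySender sender = new ToySender(List.of(0, 1, 2));
        sender.onReceiverFinished(1);
        System.out.println(sender.finalBatchTargets()); // prints [0, 2]
    }
}
```

The matching receiver-side change would be to allow cleanup once the receiver has announced that it is finished, rather than requiring a final batch on every stream.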


> UnorderedReceiver shouldn't terminate until it receives a final batch
> ---------------------------------------------------------------------
>
>                 Key: DRILL-3845
>                 URL: https://issues.apache.org/jira/browse/DRILL-3845
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Execution - Relational Operators
>            Reporter: Deneche A. Hakim
>            Assignee: Deneche A. Hakim
>             Fix For: 1.4.0
>
>         Attachments: 29c45a5b-e2b9-72d6-89f2-d49ba88e2939.sys.drill
>
>
> Even if a receiver has finished and informed the corresponding partition sender, the
> sender will still try to send a "last batch" to the receiver when it's done. In most cases
> this is fine, as those batches will be silently dropped by the receiving DataServer, but if
> a receiver finished more than 10 minutes ago, DataServer will throw an exception because it can't
> find the corresponding FragmentManager (WorkEventBus has a 10-minute recentlyFinished cache).
> DRILL-2274 is a reproduction for this case (after the corresponding fix is applied).
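
The cache behaviour described in the issue can be sketched roughly as follows. This is a toy model under stated assumptions: ToyEventBus, its method names, the exception message, and the injected clock are hypothetical and much simpler than Drill's WorkEventBus and DataServer; only the 10-minute window comes from the description.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.function.LongSupplier;

// Toy model only: ToyEventBus is hypothetical and far simpler than Drill's WorkEventBus.
final class ToyEventBus {
    // 10 minutes, as stated in the issue description.
    static final long RECENTLY_FINISHED_MILLIS = 10 * 60 * 1000L;

    private final Set<String> running = new HashSet<>();
    private final Map<String, Long> recentlyFinished = new HashMap<>();
    private final LongSupplier clock; // injected so expiry can be exercised without waiting

    ToyEventBus(LongSupplier clock) {
        this.clock = clock;
    }

    void start(String fragment) {
        running.add(fragment);
    }

    void finish(String fragment) {
        running.remove(fragment);
        recentlyFinished.put(fragment, clock.getAsLong());
    }

    // Returns true if the batch is delivered, false if it is silently dropped.
    boolean onIncomingBatch(String fragment) {
        if (running.contains(fragment)) {
            return true;
        }
        Long finishedAt = recentlyFinished.get(fragment);
        if (finishedAt != null && clock.getAsLong() - finishedAt < RECENTLY_FINISHED_MILLIS) {
            return false; // receiver finished recently: drop the late batch
        }
        recentlyFinished.remove(fragment); // entry has expired or never existed
        throw new IllegalStateException("No fragment manager found for " + fragment);
    }

    public static void main(String[] args) {
        long[] now = {0L}; // fake clock in milliseconds
        ToyEventBus bus = new ToyEventBus(() -> now[0]);
        bus.start("fragment-1");
        bus.finish("fragment-1");
        now[0] = 5 * 60 * 1000L;
        System.out.println(bus.onIncomingBatch("fragment-1")); // false: dropped
        now[0] = 11 * 60 * 1000L;
        try {
            bus.onIncomingBatch("fragment-1");
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

A "last batch" that arrives five minutes after the receiver finished is dropped quietly, while the same batch arriving after eleven minutes fails the lookup, which is the case the issue describes.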



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
