drill-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Daniel Barclay (Drill) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DRILL-2993) SQLLine hangs when we cancel a query in the middle of displaying results
Date Mon, 11 May 2015 00:33:00 GMT

    [ https://issues.apache.org/jira/browse/DRILL-2993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14537461#comment-14537461
] 

Daniel Barclay (Drill) commented on DRILL-2993:
-----------------------------------------------

> We should be consuming all messages already.

We were, except in the case when cancelation happens when throttling was on (sending was suspended):

If the JDBC driver layer's batch queue gets full enough that the client tells the server to
start throttling (stop sending messages), and then the query gets canceled by Statement.cancel(),
the JDBC layer does set up to both consume and discard the queued batch objects and consume
(release buffers in) and discard further received messages.  

However, the code that normally stops throttling (i.e., has the server resume sending messages)
when the batch queue size decreases enough (as batches are consumed off the queue) no longer
executes after cancelation (other code takes over dequeuing, to consume and discard the still-queued
batches), so (without a fix) the server never got told to resume sending messages, so they
didn't get consumed (sent) from the server's outgoing message queue.


> Why would we need to change throttling to consume messages? 

Because we left throttling on (sending suspended) in the cancelation case and therefore didn't
get further messages that the server was waiting to send.

> Throttling doesn't kick in unless we're not consuming messages. We should fix that instead.

Ah--maybe we're miscommunicating a bit re (RPC-level) messages vs. queued batches (or server-
vs. client-side consumption of messages (or initial vs. partial consumption))).

Yes, throttling kicks in when we're not _fully_ consuming message data--when the queue-reading
thread isn't consuming the queued batch objects faster than the message-receiving/queue-writing
thread is receiving messages (releasing some buffer (_partially_ consuming the message)) and
adding each (or the non-released part of it) to the batch queue (for long enough that the
queue gets big enough to trigger throttling).

Throttling normally kicks out when the queue-reading thread reads enough batches for the queue's
size to reduce enough.

However, upon cancelation, if throttling was currently active, it never kicked out, because
dequeuing of batches to just consume and discard them (vs. consuming them by building JDBC
data) is done by different code, which doesn't stop any active throttling.  Therefore, although
the post-cancelation code was otherwise ready to receive and minimally consume (release buffers
in) any not-yet-received messages, they were never sent from the server.

Forcing throttling to kick out upon cancelation releases those messages from the server.


> The server should only send messages as fast as the client can consume them so we don't
get memory growth.

Yes, it did and does (before and with fix) work that way. 





> SQLLine hangs when we cancel a query in the middle of displaying results
> ------------------------------------------------------------------------
>
>                 Key: DRILL-2993
>                 URL: https://issues.apache.org/jira/browse/DRILL-2993
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Execution - Flow
>            Reporter: Rahul Challapalli
>            Assignee: Daniel Barclay (Drill)
>         Attachments: DRILL-2993.1.patch.txt
>
>
> git.commit.id.abbrev=8c706e6
> The data set contains 1 million records. I cancelled the below query after displaying
around ~400,000 records. All subsequent queries fail
> {code}
> 0: jdbc:drill:schema=dfs_eea>select * from `mobile.json`;
> .....
> .....
> Cancel after displaying 400,000 records
> 0: jdbc:drill:schema=dfs_eea>use dfs.drillTestDir;
> ....
> This hangs indefinitely
> {code}
> Since the data is large, I did not attach it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message