drill-issues mailing list archives

From "Chris Westin (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DRILL-2550) Drillbit disconnect from ZK results in drillbit being lost until restart
Date Wed, 01 Apr 2015 15:55:53 GMT

    [ https://issues.apache.org/jira/browse/DRILL-2550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14390862#comment-14390862 ]

Chris Westin commented on DRILL-2550:

It sounds like Drillbits don't detect when their own connection to ZK is broken. We should
add that detection, and when it happens, periodically poll to see whether we can reconnect
and rejoin the cluster.
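
The polling half of that suggestion could be sketched roughly as follows. This is only an illustration in Python, not Drill code: `rejoin_when_reachable`, `try_connect`, `register`, and the parameters are hypothetical names, and how the loss is first detected (for example, through a connection-state callback from the ZK client) is assumed to happen elsewhere and to trigger this loop.

```python
import time
from typing import Callable, Optional


def rejoin_when_reachable(
    try_connect: Callable[[], bool],
    register: Callable[[], None],
    poll_interval_s: float = 5.0,
    max_attempts: Optional[int] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Poll until try_connect() succeeds, then call register() once.

    try_connect and register are hypothetical hooks standing in for
    "reconnect to the coordinator" and "re-announce this node to the
    cluster". Returns the number of attempts made. Raises TimeoutError
    if max_attempts is set and exhausted before a connection succeeds.
    """
    attempts = 0
    while max_attempts is None or attempts < max_attempts:
        attempts += 1
        if try_connect():
            # Connection is back: rejoin the cluster and stop polling.
            register()
            return attempts
        # Still unreachable: wait before the next attempt.
        sleep(poll_interval_s)
    raise TimeoutError(
        f"coordinator still unreachable after {attempts} attempts"
    )


if __name__ == "__main__":
    # Simulated coordinator that becomes reachable on the third attempt.
    outcomes = iter([False, False, True])
    attempts = rejoin_when_reachable(
        try_connect=lambda: next(outcomes),
        register=lambda: print("re-registered with cluster"),
        poll_interval_s=0.0,
    )
    print("attempts:", attempts)
```

Injecting `sleep` and the two callbacks keeps the loop testable without a real coordinator; a production version would probably also want backoff and a way to cancel the loop on shutdown.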

> Drillbit disconnect from ZK results in drillbit being lost until restart
> ------------------------------------------------------------------------
>                 Key: DRILL-2550
>                 URL: https://issues.apache.org/jira/browse/DRILL-2550
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Execution - Flow
>    Affects Versions: 0.8.0
>            Reporter: Ramana Inukonda Nagaraj
>            Assignee: Chris Westin
>            Priority: Minor
>             Fix For: 0.9.0
> Not quite sure if this is an issue, or even if it's important; maybe someone can think
> of a situation where this might be a bigger issue.
> Steps taken to recreate:
> 1. Start up drillbits on multiple nodes. (They all come up and form an 8-node cluster.)
> 2. Start executing a long running query.
> 3. Use tcpkill to kill all connections between one node and ZooKeeper port 5181.
> Drill seems to behave very gracefully here - I see a nice error message saying Query
> failed: ForemanException: One more more nodes lost connectivity during query. Identified node
> was atsqa6c61.qa.lab
> However, once I start allowing connections again, the node is not brought back as part
> of the cluster until the drillbit is restarted.

This message was sent by Atlassian JIRA