drill-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DRILL-5316) C++ Client Crashes When drillbitsVector.count is 0 after zoo_get_children completed with ZOK
Date Thu, 09 Mar 2017 22:24:38 GMT

    [ https://issues.apache.org/jira/browse/DRILL-5316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15903974#comment-15903974 ]

ASF GitHub Bot commented on DRILL-5316:
---------------------------------------

Github user sohami commented on a diff in the pull request:

    https://github.com/apache/drill/pull/772#discussion_r105286121
  
    --- Diff: contrib/native/client/src/clientlib/drillClientImpl.cpp ---
    @@ -2143,6 +2146,9 @@ connectionStatus_t PooledDrillClientImpl::connect(const char* connStr, DrillUser
                 Utils::shuffle(drillbits);
                 // The original shuffled order is maintained if we shuffle first and then add any missing elements
                 Utils::add(m_drillbits, drillbits);
    +            if (m_drillbits.empty()){
    +                return handleConnError(CONN_FAILURE, getMessage(ERR_CONN_ZKNODBIT));
    --- End diff --
    
    Since we are not removing offline nodes from m_drillbits, I think we should return the
connection error before the shuffle. Say that on the first client connection we get all the
active nodes from ZooKeeper and store them in m_drillbits, and then all of those nodes go dead
or offline. On the next connection request, ZooKeeper will return zero drillbits, but since
m_drillbits is not empty we will still try to connect and fail later.
    
    Instead, I think zero drillbits returned from ZooKeeper is a good indication that we won't
be able to connect to any of the nodes already present in m_drillbits, so we should fail right
there?
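
The ordering suggested above can be sketched roughly as follows. This is a minimal illustration, not the Drill client code: `ConnStatus`, `DrillbitCache`, `known`, and `refresh` are hypothetical names standing in for the connection status, the cached m_drillbits list, and the merge step, and the shuffle is left out for brevity. The only point is that an empty ZooKeeper listing is rejected before it is merged, even when older entries are still cached.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical stand-in for the client's connection status codes.
enum class ConnStatus { Ok, NoDrillbits };

// Hypothetical stand-in for the cached m_drillbits member and the merge step.
struct DrillbitCache {
    std::vector<std::string> known;  // plays the role of m_drillbits

    // Merge a fresh ZooKeeper listing into the cache. An empty listing is
    // treated as a failure up front, regardless of what is already cached.
    ConnStatus refresh(const std::vector<std::string>& fromZk) {
        if (fromZk.empty()) {
            return ConnStatus::NoDrillbits;  // fail before any shuffle or merge
        }
        for (const std::string& bit : fromZk) {
            if (std::find(known.begin(), known.end(), bit) == known.end()) {
                known.push_back(bit);  // append only entries not yet cached
            }
        }
        return ConnStatus::Ok;
    }
};
```

With this ordering, a second call that receives an empty listing reports the failure immediately and leaves the cached entries untouched, instead of attempting connections to nodes that are probably gone.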


> C++ Client Crashes When drillbitsVector.count is 0 after zoo_get_children completed with ZOK
> --------------------------------------------------------------------------------------------
>
>                 Key: DRILL-5316
>                 URL: https://issues.apache.org/jira/browse/DRILL-5316
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Client - C++
>            Reporter: Rob Wu
>            Priority: Critical
>
> When connecting to a drillbit through ZooKeeper, the C++ client occasionally crashes for no apparent reason.
> A closer look at the code revealed that during this call
> rc=zoo_get_children(p_zh.get(), m_path.c_str(), 0, &drillbitsVector);
> zoo_get_children returns ZOK (0) but drillbitsVector.count is 0.
> This leaves drillbits empty, which in turn
> causes err = zook.getEndPoint(drillbits[drillbits.size() -1], endpoint); to crash.
> A size check should be added to prevent this.
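
The size check asked for in the report could look something like the sketch below. `pickLastDrillbit` is a hypothetical helper, not a function in the Drill client; it only illustrates that `drillbits.size() - 1` must not be used as an index on an empty vector, because `size()` is unsigned and the subtraction wraps around to a huge value.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: copy the last drillbit name into `out` and return
// true, or return false and leave `out` untouched when the list is empty.
bool pickLastDrillbit(const std::vector<std::string>& drillbits,
                      std::string& out) {
    if (drillbits.empty()) {
        return false;  // nothing to index; the caller should report an error
    }
    out = drillbits.back();  // same element as drillbits[drillbits.size() - 1]
    return true;
}
```

A caller would turn the `false` result into a connection error rather than going on to look up an endpoint.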



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
