drill-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DRILL-4647) C++ client is not propagating a connection failed error when a drillbit goes down
Date Wed, 06 Jul 2016 02:07:10 GMT

    [ https://issues.apache.org/jira/browse/DRILL-4647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15363616#comment-15363616 ]

ASF GitHub Bot commented on DRILL-4647:
---------------------------------------

Github user parthchandra commented on a diff in the pull request:

    https://github.com/apache/drill/pull/493#discussion_r69666695
  
    --- Diff: contrib/native/client/src/clientlib/drillClientImpl.cpp ---
    @@ -469,34 +471,50 @@ DrillClientQueryResult* DrillClientImpl::SubmitQuery(::exec::shared::QueryType t
     
         uint64_t coordId;
         DrillClientQueryResult* pQuery=NULL;
    +    connectionStatus_t cStatus=CONN_SUCCESS;
         {
             boost::lock_guard<boost::mutex> prLock(this->m_prMutex);
             boost::lock_guard<boost::mutex> dcLock(this->m_dcMutex);
             coordId = this->getNextCoordinationId();
             OutBoundRpcMessage out_msg(exec::rpc::REQUEST, exec::user::RUN_QUERY, coordId, &query);
    -        sendSync(out_msg);
     
    +        // Create the result object and register the listener before we send the query
    +        // because sometimes the caller is not checking the status of the submitQuery call.
    +        // This way, the broadcast error call will cause the results listener to be called
    +        // with a COMM_ERROR status.
             pQuery = new DrillClientQueryResult(this, coordId, plan);
             pQuery->registerListener(l, lCtx);
    -        bool sendRequest=false;
             this->m_queryIds[coordId]=pQuery;
     
    -        DRILL_MT_LOG(DRILL_LOG(LOG_DEBUG)  << "Sent query request. " << "[" << m_connectedHost << "]"  << "Coordination id = " << coordId << std::endl;)
    -        DRILL_MT_LOG(DRILL_LOG(LOG_DEBUG)  << "Sent query " <<  "Coordination id = " << coordId << " query: " << plan << std::endl;)
    +        connectionStatus_t cStatus=sendSync(out_msg);
    +        if(cStatus == CONN_SUCCESS){
    +            bool sendRequest=false;
     
    -        if(m_pendingRequests++==0){
    -            sendRequest=true;
    -        }else{
    -            DRILL_MT_LOG(DRILL_LOG(LOG_DEBUG) << "Queueing query request to server" << std::endl;)
    -            DRILL_MT_LOG(DRILL_LOG(LOG_DEBUG) << "Number of pending requests = " << m_pendingRequests << std::endl;)
    -        }
    -        if(sendRequest){
    -            DRILL_MT_LOG(DRILL_LOG(LOG_DEBUG) << "Sending query request. Number of pending requests = "
    -                << m_pendingRequests << std::endl;)
    -            getNextResult(); // async wait for results
    +            DRILL_MT_LOG(DRILL_LOG(LOG_DEBUG)  << "Sent query request. " << "[" << m_connectedHost << "]"  << "Coordination id = " << coordId << std::endl;)
    +                DRILL_MT_LOG(DRILL_LOG(LOG_DEBUG)  << "Sent query " <<  "Coordination id = " << coordId << " query: " << plan << std::endl;)
    +
    +                if(m_pendingRequests++==0){
    +                    sendRequest=true;
    +                }else{
    +                    DRILL_MT_LOG(DRILL_LOG(LOG_DEBUG) << "Queueing query request to server" << std::endl;)
    +                        DRILL_MT_LOG(DRILL_LOG(LOG_DEBUG) << "Number of pending requests = " << m_pendingRequests << std::endl;)
    +                }
    +            if(sendRequest){
    +                DRILL_MT_LOG(DRILL_LOG(LOG_DEBUG) << "Sending query request. Number of pending requests = "
    +                        << m_pendingRequests << std::endl;)
    +                    getNextResult(); // async wait for results
    +            }
             }
    +
    +    }
    +    if(cStatus!=CONN_SUCCESS){
    --- End diff --
    
    Though you are correct in one respect: the only reliable way to know that a connection
is gone is the heartbeat. A heartbeat failure will eventually occur, and that guarantees
that any pending queries fail in case of a network failure.
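
    To illustrate the point about the heartbeat, here is a minimal, self-contained sketch of
how a heartbeat failure could fail every pending query through its results listener. It is
not the Drill client code: PendingQueries, QueryStatus, ResultsListener and broadcastError
are hypothetical names chosen for this example, and it uses std:: types rather than the
boost types the client uses.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <map>
#include <mutex>
#include <utility>

// Hypothetical status codes for this sketch; the real client defines its own.
enum class QueryStatus { OK, COMM_ERROR };

// Hypothetical listener signature: receives the coordination id and a status.
using ResultsListener = std::function<void(std::uint64_t, QueryStatus)>;

// Hypothetical table of queries that are still waiting for results.
class PendingQueries {
public:
    // Register a listener for a query identified by its coordination id.
    void add(std::uint64_t coordId, ResultsListener listener) {
        std::lock_guard<std::mutex> lock(m_mutex);
        m_listeners[coordId] = std::move(listener);
    }

    // Intended to be called when a heartbeat fails. Every pending query is
    // removed from the table and its listener is told about the failure.
    // Returns the number of queries that were failed.
    std::size_t broadcastError() {
        std::map<std::uint64_t, ResultsListener> failed;
        {
            // Take ownership of the table under the lock, then call the
            // listeners without holding it so a listener cannot deadlock us.
            std::lock_guard<std::mutex> lock(m_mutex);
            failed.swap(m_listeners);
        }
        for (auto& entry : failed) {
            if (entry.second) {
                entry.second(entry.first, QueryStatus::COMM_ERROR);
            }
        }
        return failed.size();
    }

    std::size_t size() const {
        std::lock_guard<std::mutex> lock(m_mutex);
        return m_listeners.size();
    }

private:
    mutable std::mutex m_mutex;
    std::map<std::uint64_t, ResultsListener> m_listeners;
};
```

    In this sketch the heartbeat handler would call broadcastError() once on failure; a
second call finds an empty table and notifies nobody, so each listener sees the error at
most once.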


> C++ client is not propagating a connection failed error when a drillbit goes down
> ---------------------------------------------------------------------------------
>
>                 Key: DRILL-4647
>                 URL: https://issues.apache.org/jira/browse/DRILL-4647
>             Project: Apache Drill
>          Issue Type: Bug
>            Reporter: Parth Chandra
>
> When a drillbit goes down, there are two conditions under which the client does not propagate the error back to the application:
> 1) The application is in a submitQuery call: the ODBC driver expects the error to be reported through the query results listener, which has not yet been registered at the point the error is encountered.
> 2) A submitQuery call succeeded but never reached the drillbit because it was shut down. In this case the application has a handle to a query and is listening for results that will never arrive. The heartbeat mechanism detects the failure, but does not propagate the error to the query results listener.
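
The first condition in the description above can be sketched in isolation: register the
listener before sending, and if the send fails, report the failure through that listener
as well as through the return value. This is a single-threaded illustration only, not the
client's implementation; QuerySubmitter, SendFn, ConnStatus and QueryStatus are
hypothetical names, and the real client also takes locks around this sequence.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <map>
#include <utility>

// Hypothetical status types for this sketch.
enum class ConnStatus { SUCCESS, FAILURE };
enum class QueryStatus { OK, COMM_ERROR };

using ResultsListener = std::function<void(std::uint64_t, QueryStatus)>;
// Stand-in for a synchronous send to the server.
using SendFn = std::function<ConnStatus(std::uint64_t)>;

class QuerySubmitter {
public:
    explicit QuerySubmitter(SendFn send) : m_send(std::move(send)) {}

    ConnStatus submit(ResultsListener listener) {
        const std::uint64_t coordId = m_nextCoordId++;
        // Register first, so a failure during the send has a listener to reach.
        m_listeners[coordId] = std::move(listener);

        const ConnStatus status = m_send(coordId);
        if (status != ConnStatus::SUCCESS) {
            // The query never left the client: drop it and notify the listener,
            // in case the caller does not check the return value.
            ResultsListener failed = std::move(m_listeners[coordId]);
            m_listeners.erase(coordId);
            if (failed) {
                failed(coordId, QueryStatus::COMM_ERROR);
            }
        }
        return status;
    }

    // Number of queries still waiting for results.
    std::size_t pending() const { return m_listeners.size(); }

private:
    SendFn m_send;
    std::uint64_t m_nextCoordId = 0;
    std::map<std::uint64_t, ResultsListener> m_listeners;
};
```

With a send function that always fails, submit() returns ConnStatus::FAILURE and the
listener is invoked once with COMM_ERROR; with one that succeeds, the listener stays
registered and is not invoked by submit().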



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
