hawq-dev mailing list archives

From "Lin Wen (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HAWQ-979) Resource Broker Should Reconnect Hadoop Yarn When Failed to Get Cluster Report
Date Thu, 04 Aug 2016 08:39:20 GMT
Lin Wen created HAWQ-979:
----------------------------

             Summary: Resource Broker Should Reconnect Hadoop Yarn When Failed to Get Cluster Report
                 Key: HAWQ-979
                 URL: https://issues.apache.org/jira/browse/HAWQ-979
             Project: Apache HAWQ
          Issue Type: Bug
          Components: Resource Manager
            Reporter: Lin Wen
            Assignee: Lei Chang


While HAWQ is running in YARN mode, the heartbeat thread of libyarn can sometimes fail
(e.g. when the YARN RM restarts) and quit, which produces a warning like this:

2016-08-03 18:45:27.913838 PDT,,,p34645,th-1290610400,,,,0,con4,,seg-10000,,,,,"WARNING","01000","YARN
mode resource broker failed to get YARN queue report of queue default. LibYarnClient::getQueueInfo,
Catch the Exception:LibYarnClient::libyarn AM heartbeat thread has stopped.",,,,,,,0,,"resourcebroker_LIBYARN_proc.c",1840,

In this case the resource broker process should re-register HAWQ with YARN, but it does not.

The reason is:
In function handleRM2RB_GetClusterReport(), when RB2YARN_getQueueReport() fails, function
sendRBGetClusterReportErrorData() is called, but sendRBGetClusterReportErrorData() returns
OK (it should return RESBROK_ERROR_GRM), so the failure is not reported back to the caller.
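
A minimal sketch of the intended error propagation, not the actual HAWQ code. The function names are taken from the description above, but their signatures are simplified here, and the numeric values of RB_OK and RESBROK_ERROR_GRM, the heartbeat_alive flag, the reregister_count counter, and broker_loop_step() are all hypothetical stand-ins for illustration.

```c
#include <stdio.h>

/* Hypothetical status codes; the real values are defined in HAWQ's headers. */
#define RB_OK             0
#define RESBROK_ERROR_GRM 1

/* Hypothetical state: whether the libyarn heartbeat thread is still running,
 * and how many times the broker has re-registered with YARN. */
static int heartbeat_alive = 1;
static int reregister_count = 0;

/* Stand-in for RB2YARN_getQueueReport(): fails once the heartbeat has stopped. */
static int RB2YARN_getQueueReport(void)
{
    return heartbeat_alive ? RB_OK : RESBROK_ERROR_GRM;
}

/* Stand-in for sendRBGetClusterReportErrorData(): reports the error and then
 * returns an error code instead of OK, which is the change the issue asks for. */
static int sendRBGetClusterReportErrorData(int errcode)
{
    fprintf(stderr, "failed to get cluster report, error %d\n", errcode);
    return RESBROK_ERROR_GRM;
}

/* Stand-in for handleRM2RB_GetClusterReport(): passes the failure upward. */
static int handleRM2RB_GetClusterReport(void)
{
    int res = RB2YARN_getQueueReport();
    if (res != RB_OK)
    {
        return sendRBGetClusterReportErrorData(res);
    }
    return RB_OK;
}

/* Hypothetical main-loop step: a non-OK result triggers re-registration,
 * modeled here as bumping a counter and restoring the heartbeat flag. */
static void broker_loop_step(void)
{
    if (handleRM2RB_GetClusterReport() != RB_OK)
    {
        reregister_count++;
        heartbeat_alive = 1;
    }
}
```

With the error code returned, a caller such as the hypothetical broker_loop_step() can tell that the connection to YARN is broken and react. If sendRBGetClusterReportErrorData() returned OK instead, the same caller would have no way to notice the failure.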



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
