From: linwen
To: dev@hawq.incubator.apache.org
Reply-To: dev@hawq.incubator.apache.org
Subject: [GitHub] incubator-hawq pull request: add check after submit Hawq AM to Yar...
Content-Type: text/plain
Message-Id: <20151029091709.F2AAFE1091@git1-us-west.apache.org>
Date: Thu, 29 Oct 2015 09:17:09 +0000 (UTC)

Github user linwen commented on a diff in the pull request:

    https://github.com/apache/incubator-hawq/pull/58#discussion_r43363199

    --- Diff: depends/libyarn/src/libyarnclient/LibYarnClient.cpp ---
    @@ -214,6 +214,60 @@ int LibYarnClient::createJob(string &jobName, string &queue,string &jobId) {
         }
     }

    +int LibYarnClient::forceKillJob(string &jobId) {
    +
    +#ifndef MOCKTEST
    +    if ( keepRun ) {
    +        keepRun=false;
    +        void *thrc = NULL;
    +        int rc = pthread_join(heartbeatThread, &thrc);
    +        if ( rc != 0 ) {
    +            LOG(INFO, "LibYarnClient::foreceKillJob, fail to join heart-beat thread. "
    +                      "error code %d", rc);
    +            return FR_FAILED;
    +        }
    +    }
    +#endif
    +
    +    try{
    +        if (jobId != clientJobId) {
    +            throw std::invalid_argument("The jobId is wrong, please check the jobId argument");
    +        }
    +
    +        for (map::iterator it = jobIdContainers.begin(); it != jobIdContainers.end(); it++) {
    +            ostringstream key;
    +            Container *container = it->second;
    +            key << container->getNodeId().getHost() << ":" << container->getNodeId().getPort();
    +            Token nmToken = nmTokenCache[key.str()];
    +            ((ContainerManagement*)nmClient)->stopContainer((*container), nmToken);
    +            LOG(INFO,"LibYarnClient::foreceKillJob, container:%d are stopped",container->getId().getId());
    +        }
    +
    +        ((ApplicationClient*) appClient)->forceKillApplication(clientAppId);
    +        LOG(INFO, "LibYarnClient::foreceKillJob, forceKillApplication");
    +
    +        for (map::iterator it = jobIdContainers.begin(); it != jobIdContainers.end(); it++) {
    +            LOG(INFO,"LibYarnClient::foreceKillJob, container:%d in jobIdContainers are delete",it->second->getId().getId());
    +            delete it->second;
    +            it->second = NULL;
    +        }
    +        jobIdContainers.clear();
    +        activeFailContainerIds.clear();
    +        return FR_SUCCEEDED;
    +    } catch(std::exception& e){
    +        stringstream errorMsg;
    +        errorMsg << "LibYarnClient::forceKillJob, Catch the Exception:" << e.what();
    +        setErrorMessage(errorMsg.str());
    +        return FR_FAILED;
    +    } catch (...) {
    +        stringstream errorMsg;
    +        errorMsg << "LibYarnClient::forceKillJob, catch unexpected exception.";
    +        setErrorMessage(errorMsg.str());
    +        return FR_FAILED;
    +    }
    +}
    +
    --- End diff --

Yes. In this case no containers are returned to HAWQ. What I am thinking is that, in some future cases, forceKillApplication() may be called while HAWQ is working normally, and then the containers should be returned.
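To make the future case mentioned in the comment concrete, here is a minimal sketch of a kill routine that hands the released container ids back to the caller instead of discarding them. Everything in it is an assumption for illustration: `SketchContainer`, `SketchResult`, and `forceKillJobReturning` are hypothetical names, not part of libyarn, and the calls that stop containers and kill the YARN application are left out.

```cpp
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical stand-in for libyarn's Container; only the id is modeled.
struct SketchContainer {
    int64_t id;
};

// Hypothetical result codes, mirroring the FR_SUCCEEDED / FR_FAILED idea.
enum SketchResult { SKETCH_SUCCEEDED = 0, SKETCH_FAILED = 1 };

// Hypothetical variant of forceKillJob: it frees every tracked container
// and also reports the freed ids through `returnedIds`, so the caller can
// tell which containers were given up.
int forceKillJobReturning(std::map<int64_t, SketchContainer*> &tracked,
                          std::vector<int64_t> &returnedIds) {
    returnedIds.clear();
    for (std::map<int64_t, SketchContainer*>::iterator it = tracked.begin();
         it != tracked.end(); ++it) {
        if (it->second == nullptr) {
            continue;  // nothing to release for this entry
        }
        returnedIds.push_back(it->second->id);  // remember the id for the caller
        delete it->second;
        it->second = nullptr;
    }
    tracked.clear();
    return SKETCH_SUCCEEDED;
}
```

A caller would pass in its container map together with an empty vector and read the vector after the call. Whether HAWQ would want ids, full container objects, or a callback here is an open design choice that the comment above does not settle.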