From: GitBox
To: commits@airflow.apache.org
Subject: [GitHub] fenglu-g commented on a change in pull request #3744: [AIRFLOW-2893] fix stuck dataflow job due to name mismatch
Message-ID: <153428670036.31277.6976068719098411896.gitbox@gitbox.apache.org>
Date: Tue, 14 Aug 2018 22:45:00 -0000

fenglu-g commented on a change in pull request #3744: [AIRFLOW-2893] fix stuck dataflow job due to name mismatch
URL: https://github.com/apache/incubator-airflow/pull/3744#discussion_r210127353

 ##########
 File path: airflow/contrib/hooks/gcp_dataflow_hook.py
 ##########
 @@ -124,36 +127,38 @@ def __init__(self, cmd):

      def _line(self, fd):
          if fd == self._proc.stderr.fileno():
 -            lines = self._proc.stderr.readlines()
 -            for line in lines:
 -                self.log.warning(line[:-1])
 -            if lines:
 -                return lines[-1]
 +            return self._proc.stderr.readline()
          if fd == self._proc.stdout.fileno():
 -            line = self._proc.stdout.readline()
 -            return line
 +            return self._proc.stdout.readline()

      @staticmethod
      def _extract_job(line):
 -        if line is not None:
 -            if line.startswith("Submitted job: "):
 -                return line[15:-1]
 +        job_id_pattern = re.compile(
 +            '.*https://console.cloud.google.com/dataflow.*/jobs/([a-z|0-9|A-Z|\-|\_]+).*')
 +        matched_job = job_id_pattern.match(line or '')
 +        if matched_job:
 +            return matched_job.group(1)

      def wait_for_done(self):
          reads = [self._proc.stderr.fileno(), self._proc.stdout.fileno()]
          self.log.info("Start waiting for DataFlow process to complete.")
 -        while self._proc.poll() is None:
 +        job_id = None
 +        while True:
              ret = select.select(reads, [], [], 5)
              if ret is not None:
                  for fd in ret[0]:
                      line = self._line(fd)
                      if line:
 -                        self.log.debug(line[:-1])
 +                        self.log.info(line[:-1])

 Review comment:
   Good point, done.
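
For readers following the `_extract_job` change in the diff above, here is a minimal standalone sketch of the same idea: pull the job ID out of a Dataflow console URL found in a log line. It is not the code from the PR. The function name `extract_job_id` and the sample log line are hypothetical, and the pattern is a tidied variant that escapes the dots in the host name and drops the `|` characters from the character class (inside `[...]`, `|` is a literal pipe rather than alternation).

```python
import re

# Tidied variant of the pattern in the diff (an assumption, not the PR's exact regex):
# dots in the host name are escaped, and the character class lists only the
# characters a job ID is expected to contain.
JOB_ID_PATTERN = re.compile(
    r'.*https://console\.cloud\.google\.com/dataflow.*/jobs/([A-Za-z0-9_-]+).*')


def extract_job_id(line):
    """Return the job ID found in a console URL, or None if there is none."""
    # `line or ''` mirrors the diff: a None line is treated as an empty string.
    matched_job = JOB_ID_PATTERN.match(line or '')
    if matched_job:
        return matched_job.group(1)
    return None


# Hypothetical log line, made up for illustration only.
sample = ("INFO: To access the Dataflow monitoring console, please navigate to "
          "https://console.cloud.google.com/dataflow/jobsDetail/locations/"
          "us-central1/jobs/2018-08-14_15_44_12-1234567890?project=my-project")

print(extract_job_id(sample))
print(extract_job_id(None))
```

Matching on the console URL rather than on a fixed prefix such as "Submitted job: " means the ID can be picked up wherever the URL appears in a line, which appears to be the motivation for the change in the diff.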