From: Jarek Potiuk
Date: Tue, 23 Jul 2019 09:43:18 +0200
Subject: Re: Travis CI random failures
To: dev@airflow.apache.org

It's now pretty consistent and happens pretty much every time with the old
build system - for example here:
https://travis-ci.org/apache/airflow/builds/562435992.

I will cancel all PR builds and disable automated PR builds on Travis until
we solve the problem. Running them is pointless: new PRs will simply queue
and fail constantly.

I opened a critical infrastructure ticket:
https://issues.apache.org/jira/browse/INFRA-18787, and I am running some
additional tests: I am running the builds from the commit before the new CI
to see whether another change since then could have caused it.

J.

On Tue, Jul 23, 2019 at 8:55 AM Jarek Potiuk wrote:

> Update 2: I can confirm that the same memory/resource-related issues happen
> in my Travis CI forks with the changes reverted :(
> https://travis-ci.org/potiuk/airflow/builds/562430507 . I will escalate
> it to Travis/Apache infrastructure.
>
> On Tue, Jul 23, 2019 at 8:35 AM Jarek Potiuk wrote:
>
>> Update: it looks like it's Travis's problem: I reverted the CI changes
>> and we have the same CPU problem in the old build:
>> https://travis-ci.org/potiuk/airflow/jobs/562430517 .
>>
>> On Tue, Jul 23, 2019 at 8:32 AM Jarek Potiuk wrote:
>>
>>> Hello everyone,
>>>
>>> We've started to experience some random failures on Travis related to
>>> lack of resources: either Out of Memory errors or too few CPUs to run
>>> Kubernetes builds.
>>>
>>> I tried to rerun those, thinking it was an intermittent error. It
>>> started happening yesterday and I had not seen it before, so I rather
>>> doubt it is related to the latest changes.
>>>
>>> But I do not want to risk everyone being blocked, so I am now testing on
>>> my own fork whether reverting the latest CI changes helps. I will let you
>>> know, and I will revert if I find that the old CI works in a stable way.
>>>
>>> In the meantime, I will cancel all outstanding builds that are blocking
>>> our queue and test both the old CI and the new CI in our fork :(
>>> (the Travis queue limit is not helping).
>>>
>>> Can you please hold off on rebasing/pushing new PRs until I check it?
>>>
>>> Example failures:
>>>
>>> - OSError: [Errno 12] Cannot allocate memory
>>>   (https://travis-ci.org/apache/airflow/jobs/562395978)
>>> - [ERROR NumCPU]: the number of available CPUs 1 is less than the
>>>   required 2 (https://travis-ci.org/apache/airflow/jobs/562395978)
>>>
>>> J.
>>>
>>> --
>>>
>>> Jarek Potiuk
>>> Polidea | Principal Software Engineer
>>>
>>> M: +48 660 796 129