From: Jason Young
Date: Wed, 13 Mar 2019 17:57:57 -0500
Subject: Re: Failsafe: Killing self fork JVM. PING timeout elapsed.
To: Maven Users List

I upgraded failsafe and surefire to 3.0.0-M3 as advised; we encountered the
same exception. (Still using -Xmx5g, will switch to OpenJ9 soon in case that
helps.)

BTW I also asked on StackOverflow previously, for anyone interested:
https://stackoverflow.com/questions/54755846/killing-self-fork-jvm-ping-timeout-elapsed

On Tue, Feb 26, 2019 at 6:40 PM Jason Young wrote:

> Thanks again for the information.
>
> We had increased the RAM to 3g some time ago to prevent OOMEs. More
> recently, I increased the RAM again to 5g for extra headroom since we had
> more headroom available; the problem hasn't happened since, but it hasn't
> been very long.
>
> We use a more customized image based on Alpine 3.8.2. The JDK and Maven
> are obtained via apk.
>
> I will try upgrading failsafe (and surefire while I'm at it) sooner, and
> probably do some experimentation with JVMs another time (not pressing for
> me ATM).
>
> On Tue, Feb 26, 2019 at 12:20 PM Tibor Digana wrote:
>
>> >> I'll try to enable some logging about GC pauses to see what's up
>>
>> Please do not keep such a setting after tuning the GC, because it may
>> sometimes break the interprocess communication between the Maven process
>> and the surefire process.
>> It's worth writing GC information to a file and not to the console logs.
>> This can be configured, I guess.
>>
>> >> Do you think the value is simply too low?
>>
>> GCing many objects may take some time, and I remember we had a user who
>> had this problem a year or two ago.
>> As a fix, we check every third NOOP (which is 3 x 10 sec) instead of
>> every NOOP. So 30 seconds looked satisfactory.
>> I think you use an old version, 2.20 or something like that. Fixes for
>> Docker have been made since then, so please use the latest version,
>> 3.0.0-M3.
>> See this page
>> https://maven.apache.org/surefire/maven-surefire-plugin/docker.html, we
>> used maven:3.5.3-jdk-8-alpine in this test. Which base image did you use?
>>
>> Cheers
>> Tibor
>>
>> On Tue, Feb 26, 2019 at 5:24 PM Jason Young wrote:
>>
>> > Thanks for the information. It's good to see someone understands a
>> > little about this.
>> >
>> > Incidentally, we have been looking at other GCs and VMs for the
>> > application in production environments, so I'll look into how these
>> > affect tests as well. I'll try to enable some logging about GC pauses
>> > to see what's up.
>> >
>> > How would `-Xmx3g` cause long GC cycles? Do you think the value is
>> > simply too low?
>> >
>> > FWIW we're running the Maven build in an Alpine-based Docker container.
>> >
>> > On Sat, Feb 23, 2019 at 6:36 AM Tibor Digana wrote:
>> >
>> > > Hi Jason,
>> > >
>> > > We spoke about this issue on our chat in ASF Slack:
>> > > "I think his tests have been paused for long GC periods and timed
>> > > out after 3x PING period = 30 seconds. After this period the forked
>> > > JVM assumed the Maven process had been killed by Jenkins CI, and
>> > > therefore all surefire processes are killed as well and all file
>> > > handles and memory are freed."
>> > >
>> > > "But I have to say that `-Xmx3g` may cause long GC cycles, see
>> > > https://maven.apache.org/surefire/maven-surefire-plugin/examples/shutdown.html
>> > > "
>> > >
>> > > You are using java-1.8-openjdk. I guess you should use Shenandoah GC,
>> > > which is an experimental algorithm in JVM 1.8. This would
>> > > significantly shorten the GC cycles.
>> > >
>> > > We should of course provide a new configuration parameter to give
>> > > you a chance to prolong the PING.
>> > >
>> > > Cheers
>> > > Tibor
>> >
>> > --
>> > Jason Young
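
For readers following the thread, the "3 x 10 sec" check described in the quoted replies can be pictured roughly as below. This is only an illustrative sketch, not Surefire's actual code: the class `PingWatchdog` and all of its methods are hypothetical names, and the only values taken from the thread are the 10-second period and the factor of three.

```java
/**
 * Hypothetical sketch of the liveness check described in the thread.
 * NOT Surefire's implementation; all names here are assumptions.
 */
final class PingWatchdog {
    private final long pingPeriodMillis;   // 10 s in the thread
    private final int missedPingsAllowed;  // 3 in the thread
    private long lastPingMillis;

    PingWatchdog(long pingPeriodMillis, int missedPingsAllowed, long nowMillis) {
        this.pingPeriodMillis = pingPeriodMillis;
        this.missedPingsAllowed = missedPingsAllowed;
        this.lastPingMillis = nowMillis;
    }

    /** Record a ping (NOOP) received from the parent Maven process. */
    void onPing(long nowMillis) {
        lastPingMillis = nowMillis;
    }

    /** Total silence tolerated before giving up: period x allowed misses. */
    long timeoutMillis() {
        return pingPeriodMillis * missedPingsAllowed;
    }

    /** True once the silence since the last ping exceeds the timeout. */
    boolean shouldKillSelf(long nowMillis) {
        return nowMillis - lastPingMillis > timeoutMillis();
    }

    public static void main(String[] args) {
        PingWatchdog watchdog = new PingWatchdog(10_000L, 3, 0L);
        watchdog.onPing(5_000L);
        // 25 s of silence: still within the 30 s window.
        System.out.println(watchdog.shouldKillSelf(30_000L)); // prints "false"
        // 31 s of silence, e.g. after a very long GC pause: give up.
        System.out.println(watchdog.shouldKillSelf(36_000L)); // prints "true"
    }
}
```

Under these assumptions, any pause longer than 30 seconds between observed pings would look the same to the fork as a dead parent process. That matches the explanation in the thread of why long GC cycles could trigger the "PING timeout elapsed" shutdown.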