From: Chris Douglas
Date: Tue, 24 Oct 2017 14:55:13 -0700
Subject: Re: Apache Hadoop qbt Report: branch2+JDK7 on Linux/x86
To: Junping Du
Cc: Sean Busbey, Allen Wittenauer, Hadoop Common, Hdfs-dev, mapreduce-dev@hadoop.apache.org, yarn-dev@hadoop.apache.org

Sean/Junping-

Ignoring the epistemology, it's a problem. Let's figure out what's
causing memory to balloon, and then we can work out the appropriate
remedy. Is this reproducible outside the CI environment? To Junping's
point, would YETUS-561 provide more detailed information to aid
debugging? -C

On Tue, Oct 24, 2017 at 2:50 PM, Junping Du wrote:
> In general, "solid evidence" of a memory leak comes from analysis of a
> heap dump, jstack output, GC logs, etc. In many cases we can locate and
> conclude which piece of code is leaking memory from that analysis.
>
> Unfortunately, I cannot find any such conclusion in the previous
> comments; they don't even identify which daemons/components of HDFS
> consume unexpectedly high memory. That doesn't sound like a solid bug
> report to me.
>
> Thanks,
> Junping
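
For anyone picking this up, a minimal, illustrative sketch of collecting
the kind of evidence Junping describes (current heap usage plus an .hprof
dump for offline analysis) from inside a running JVM; the class name and
output path below are placeholders, and the same data can be gathered
externally with jmap, jstack, and GC logging flags:

    import com.sun.management.HotSpotDiagnosticMXBean;
    import java.lang.management.ManagementFactory;
    import java.lang.management.MemoryMXBean;
    import java.lang.management.MemoryUsage;

    // Illustrative only: print heap usage, then write an .hprof file that
    // can be opened in MAT or jhat to see what is actually being retained.
    public class HeapEvidence {
      public static void main(String[] args) throws Exception {
        MemoryMXBean mem = ManagementFactory.getMemoryMXBean();
        MemoryUsage heap = mem.getHeapMemoryUsage();
        System.out.printf("heap used=%d MB, committed=%d MB, max=%d MB%n",
            heap.getUsed() >> 20, heap.getCommitted() >> 20, heap.getMax() >> 20);

        HotSpotDiagnosticMXBean diag = ManagementFactory.newPlatformMXBeanProxy(
            ManagementFactory.getPlatformMBeanServer(),
            "com.sun.management:type=HotSpotDiagnostic",
            HotSpotDiagnosticMXBean.class);
        diag.dumpHeap("/tmp/suspect-heap.hprof", true); // true = live objects only
      }
    }

Comparing a few dumps taken over the course of a test run would show
whether retained heap is actually growing, or whether the forked JVMs are
simply configured with more -Xmx than the node can back.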
>
> ________________________________
> From: Sean Busbey
> Sent: Tuesday, October 24, 2017 2:20 PM
> To: Junping Du
> Cc: Allen Wittenauer; Hadoop Common; Hdfs-dev; mapreduce-dev@hadoop.apache.org; yarn-dev@hadoop.apache.org
> Subject: Re: Apache Hadoop qbt Report: branch2+JDK7 on Linux/x86
>
> Just curious, Junping: what would "solid evidence" look like? Is the
> supposition here that the memory leak is within HDFS test code rather
> than library runtime code? How would such a distinction be shown?
>
> On Tue, Oct 24, 2017 at 4:06 PM, Junping Du wrote:
> Allen,
>       Do we have any solid evidence showing that the HDFS unit tests
> going through the roof are due to a serious memory leak in HDFS?
> Normally, I don't expect memory leaks to be identified in our UTs -
> mostly, the test JVM going away is just because of test or deployment
> issues.
>       Unless there is concrete evidence, my concern about a serious
> memory leak in HDFS 2.8 is relatively low, given that some companies
> (Yahoo, Alibaba, etc.) have had 2.8 deployed in large production
> environments for months. Non-serious memory leaks (like forgetting to
> close a stream on a non-critical path) and other non-critical bugs
> always happen here and there, and we have to live with them.
>
> Thanks,
>
> Junping
>
> ________________________________________
> From: Allen Wittenauer
> Sent: Tuesday, October 24, 2017 8:27 AM
> To: Hadoop Common
> Cc: Hdfs-dev; mapreduce-dev@hadoop.apache.org; yarn-dev@hadoop.apache.org
> Subject: Re: Apache Hadoop qbt Report: branch2+JDK7 on Linux/x86
>
>> On Oct 23, 2017, at 12:50 PM, Allen Wittenauer wrote:
>>
>> With no other information or access to go on, my current hunch is that
>> one of the HDFS unit tests is ballooning in memory size. The easiest
>> way to kill a Linux machine is to eat all of its RAM, thanks to
>> overcommit, and that's what this "feels" like.
>>
>> Someone should verify whether 2.8.2 has the same issues before a
>> release goes out ...
>
> FWIW, I ran 2.8.2 last night and it has the same problems.
>
> Also: the node didn't die! Looking through the workspace (so the next
> run will destroy them), two sets of logs stand out:
>
> https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/ws/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
>
> and
>
> https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/ws/sourcedir/hadoop-hdfs-project/hadoop-hdfs/
>
> It looks like my hunch is correct: RAM usage in the HDFS unit tests is
> going through the roof. It's also interesting how MANY log files there
> are. Is surefire not picking up that jobs are dying? Maybe not, if
> memory is getting tight.
>
> Anyway, at this point, branch-2.8 and higher are probably fubar'd.
> Additionally, I've filed YETUS-561 so that Yetus-controlled Docker
> containers can have their RAM limits set, in order to prevent more
> nodes going catatonic.
>
> --
> busbey
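
To make the overcommit point above concrete, a deliberately trivial
sketch (the class name is made up) of the pattern a ballooning test would
exhibit: every allocation is retained, so the heap only grows. If the
forked JVM's -Xmx exceeds the RAM actually available to the node or
container, the machine can be driven into swap and the kernel OOM killer
before the JVM itself ever throws OutOfMemoryError, which is why capping
the Docker container's memory as proposed in YETUS-561 keeps one runaway
test from taking the whole build node down:

    import java.util.ArrayList;
    import java.util.List;

    // Illustrative only: unbounded retention, the pattern a leaking test
    // would exhibit. Each iteration allocates 64 MB and keeps a strong
    // reference, so nothing ever becomes eligible for garbage collection.
    public class BalloonDemo {
      private static final List<byte[]> RETAINED = new ArrayList<byte[]>();

      public static void main(String[] args) {
        while (true) {
          RETAINED.add(new byte[64 * 1024 * 1024]);
          System.out.println("retained approximately " + (RETAINED.size() * 64) + " MB");
        }
      }
    }

Run under a hard container memory limit, the same loop is killed quickly
and visibly instead of starving everything else on the machine.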