From: Chris Douglas
Date: Tue, 24 Oct 2017 14:55:13 -0700
Subject: Re: Apache Hadoop qbt Report: branch2+JDK7 on Linux/x86
To: Junping Du
Cc: Sean Busbey, Allen Wittenauer, Hadoop Common, Hdfs-dev, mapreduce-dev@hadoop.apache.org, yarn-dev@hadoop.apache.org

Sean/Junping-

Ignoring the epistemology, it's a problem. Let's figure out what's
causing memory to balloon, and then we can work out the appropriate
remedy. Is this reproducible outside the CI environment? To Junping's
point, would YETUS-561 provide more detailed information to aid
debugging? -C

On Tue, Oct 24, 2017 at 2:50 PM, Junping Du wrote:
> In general, "solid evidence" of a memory leak comes from analysis of a
> heap dump, jstack output, GC logs, etc. In many cases we can locate and
> conclude which piece of code is leaking memory from that analysis.
>
> Unfortunately, I cannot find any such conclusion in the previous
> comments; they don't even identify which daemons/components of HDFS
> consume unexpectedly high memory. That doesn't sound like a solid bug
> report to me.
>
> Thanks,
> Junping
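
For anyone picking this up, a minimal, illustrative sketch of collecting
the kind of evidence Junping describes (current heap usage plus an .hprof
dump for offline analysis) from inside a running JVM; the class name and
output path below are placeholders, and the same data can be gathered
externally with jmap, jstack, and GC logging flags:

    import com.sun.management.HotSpotDiagnosticMXBean;
    import java.lang.management.ManagementFactory;
    import java.lang.management.MemoryMXBean;
    import java.lang.management.MemoryUsage;

    // Illustrative only: print heap usage, then write an .hprof file that
    // can be opened in MAT or jhat to see what is actually being retained.
    public class HeapEvidence {
      public static void main(String[] args) throws Exception {
        MemoryMXBean mem = ManagementFactory.getMemoryMXBean();
        MemoryUsage heap = mem.getHeapMemoryUsage();
        System.out.printf("heap used=%d MB, committed=%d MB, max=%d MB%n",
            heap.getUsed() >> 20, heap.getCommitted() >> 20, heap.getMax() >> 20);

        HotSpotDiagnosticMXBean diag = ManagementFactory.newPlatformMXBeanProxy(
            ManagementFactory.getPlatformMBeanServer(),
            "com.sun.management:type=HotSpotDiagnostic",
            HotSpotDiagnosticMXBean.class);
        diag.dumpHeap("/tmp/suspect-heap.hprof", true); // true = live objects only
      }
    }

Comparing a few dumps taken over the course of a test run would show
whether retained heap is actually growing, or whether the forked JVMs are
simply configured with more -Xmx than the node can back.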
>
> ________________________________
> From: Sean Busbey
> Sent: Tuesday, October 24, 2017 2:20 PM
> To: Junping Du
> Cc: Allen Wittenauer; Hadoop Common; Hdfs-dev; mapreduce-dev@hadoop.apache.org; yarn-dev@hadoop.apache.org
> Subject: Re: Apache Hadoop qbt Report: branch2+JDK7 on Linux/x86
>
> Just curious, Junping: what would "solid evidence" look like? Is the
> supposition here that the memory leak is within HDFS test code rather
> than library runtime code? How would such a distinction be shown?
>
> On Tue, Oct 24, 2017 at 4:06 PM, Junping Du wrote:
> Allen,
>       Do we have any solid evidence showing that the HDFS unit tests
> going through the roof are due to a serious memory leak in HDFS?
> Normally, I don't expect memory leaks to be identified in our UTs -
> mostly, the test JVM going away is just because of test or deployment
> issues.
>       Unless there is concrete evidence, my concern about a serious
> memory leak in HDFS 2.8 is relatively low, given that some companies
> (Yahoo, Alibaba, etc.) have had 2.8 deployed in large production
> environments for months. Non-serious memory leaks (like forgetting to
> close a stream on a non-critical path) and other non-critical bugs
> always happen here and there, and we have to live with them.
>
> Thanks,
>
> Junping
>
> ________________________________________
> From: Allen Wittenauer
> Sent: Tuesday, October 24, 2017 8:27 AM
> To: Hadoop Common
> Cc: Hdfs-dev; mapreduce-dev@hadoop.apache.org; yarn-dev@hadoop.apache.org
> Subject: Re: Apache Hadoop qbt Report: branch2+JDK7 on Linux/x86
>
>> On Oct 23, 2017, at 12:50 PM, Allen Wittenauer wrote:
>>
>> With no other information or access to go on, my current hunch is that
>> one of the HDFS unit tests is ballooning in memory size. The easiest
>> way to kill a Linux machine is to eat all of its RAM, thanks to
>> overcommit, and that's what this "feels" like.
>>
>> Someone should verify whether 2.8.2 has the same issues before a
>> release goes out ...
>
> FWIW, I ran 2.8.2 last night and it has the same problems.
>
> Also: the node didn't die! Looking through the workspace (so the next
> run will destroy them), two sets of logs stand out:
>
> https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/ws/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
>
> and
>
> https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/ws/sourcedir/hadoop-hdfs-project/hadoop-hdfs/
>
> It looks like my hunch is correct: RAM usage in the HDFS unit tests is
> going through the roof. It's also interesting how MANY log files there
> are. Is surefire not picking up that jobs are dying? Maybe not, if
> memory is getting tight.
>
> Anyway, at this point, branch-2.8 and higher are probably fubar'd.
> Additionally, I've filed YETUS-561 so that Yetus-controlled Docker
> containers can have their RAM limits set, in order to prevent more
> nodes going catatonic.
>
> --
> busbey
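
To make the overcommit point above concrete, a deliberately trivial
sketch (the class name is made up) of the pattern a ballooning test would
exhibit: every allocation is retained, so the heap only grows. If the
forked JVM's -Xmx exceeds the RAM actually available to the node or
container, the machine can be driven into swap and the kernel OOM killer
before the JVM itself ever throws OutOfMemoryError, which is why capping
the Docker container's memory as proposed in YETUS-561 keeps one runaway
test from taking the whole build node down:

    import java.util.ArrayList;
    import java.util.List;

    // Illustrative only: unbounded retention, the pattern a leaking test
    // would exhibit. Each iteration allocates 64 MB and keeps a strong
    // reference, so nothing ever becomes eligible for garbage collection.
    public class BalloonDemo {
      private static final List<byte[]> RETAINED = new ArrayList<byte[]>();

      public static void main(String[] args) {
        while (true) {
          RETAINED.add(new byte[64 * 1024 * 1024]);
          System.out.println("retained approximately " + (RETAINED.size() * 64) + " MB");
        }
      }
    }

Run under a hard container memory limit, the same loop is killed quickly
and visibly instead of starving everything else on the machine.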