Subject: Re: Suspicious direct memory consumption when running queries concurrently
From: Abdel Hakim Deneche <adeneche@maprtech.com>
To: dev@drill.apache.org
Date: Fri, 31 Jul 2015 21:19:37 -0700

I tried getting a jmap dump multiple times without success; each time it
crashes the JVM with the following exception:

Dumping heap to /home/mapr/private-sql-hadoop-test/framework/myfile.hprof
> ...
> Exception in thread "main" java.io.IOException: Premature EOF
>     at sun.tools.attach.HotSpotVirtualMachine.readInt(HotSpotVirtualMachine.java:248)
>     at sun.tools.attach.LinuxVirtualMachine.execute(LinuxVirtualMachine.java:199)
>     at sun.tools.attach.HotSpotVirtualMachine.executeCommand(HotSpotVirtualMachine.java:217)
>     at sun.tools.attach.HotSpotVirtualMachine.dumpHeap(HotSpotVirtualMachine.java:180)
>     at sun.tools.jmap.JMap.dump(JMap.java:242)
>     at sun.tools.jmap.JMap.main(JMap.java:140)

On Mon, Jul 27, 2015 at 3:45 PM, Jacques Nadeau wrote:

> An allocate -> release cycle all on the same thread goes into a per-thread
> cache.
>
> A bunch of Netty arena settings are configurable. The big issue, I believe,
> is that the limits are soft limits implemented by the allocation-time
> release mechanism. As such, if you allocate a bunch of memory, then release
> it all, that won't necessarily trigger any actual chunk releases.
>
> --
> Jacques Nadeau
> CTO and Co-Founder, Dremio
>
> On Mon, Jul 27, 2015 at 12:47 PM, Abdel Hakim Deneche <
> adeneche@maprtech.com> wrote:
>
> > @Jacques, my understanding is that chunks are not owned by a specific
> > thread, but they are part of a specific memory arena which is in turn
> > only accessed by specific threads. Do you want me to find which threads
> > are associated with the same arena where we have the hanging chunks?
> >
> > On Mon, Jul 27, 2015 at 11:04 AM, Jacques Nadeau wrote:
> >
> > > It sounds like your statement is that we're caching too many unused
> > > chunks. Hanifi and I previously discussed implementing a separate
> > > flushing mechanism to release unallocated chunks that are hanging
> > > around. The main question is why so many chunks are hanging around
> > > and what threads they are associated with. A jmap dump and analysis
> > > should allow you to determine which thread owns the excess chunks.
> > > My guess would be the RPC pool, since those threads are long-lived
> > > (as opposed to the WorkManager pool, which is contracting).
> > >
> > > --
> > > Jacques Nadeau
> > > CTO and Co-Founder, Dremio
> > >
> > > On Mon, Jul 27, 2015 at 9:53 AM, Abdel Hakim Deneche <
> > > adeneche@maprtech.com> wrote:
> > >
> > > > When running a set of mostly window function queries concurrently
> > > > on a single drillbit with 8GB max direct memory, we are seeing a
> > > > continuous increase in direct memory allocation.
> > > >
> > > > We repeat the following steps multiple times:
> > > > - we launch an "iteration" of tests that runs all queries in a
> > > >   random order, 10 queries at a time
> > > > - after the iteration finishes, we wait for a couple of minutes to
> > > >   give Drill time to release the memory held by the finishing
> > > >   fragments
> > > >
> > > > Using Drill's memory logger ("drill.allocator") we were able to get
> > > > snapshots of how memory is used internally by Netty. We focused on
> > > > the number of allocated chunks; if we take this number and multiply
> > > > it by 16MB (Netty's chunk size) we get approximately the same value
> > > > reported as Drill's direct memory allocation.
> > > > Here is a graph that shows the evolution of the number of allocated
> > > > chunks over a 500-iteration run (I'm working on improving the
> > > > plots):
> > > >
> > > > http://bit.ly/1JL6Kp3
> > > >
> > > > In this specific case, after the first iteration Drill was
> > > > allocating ~2GB of direct memory, and this number kept rising after
> > > > each iteration, up to ~6GB. We suspect this caused one of our
> > > > previous runs to crash the JVM.
> > > >
> > > > If we only focus on the log lines between iterations (when Drill's
> > > > memory usage is below 10MB), then all allocated chunks are at most
> > > > 2% used.
> > > > At some point we end up with 288 nearly empty chunks, yet the next
> > > > iteration will cause more chunks to be allocated!
> > > >
> > > > Is this expected?
> > > >
> > > > PS: I am running more tests and will update this thread with more
> > > > information.
> > > >
> > > > --
> > > > Abdelhakim Deneche
> > > > Software Engineer
> > > >
> > > > Now Available - Free Hadoop On-Demand Training
> > > > <http://www.mapr.com/training?utm_source=Email&utm_medium=Signature&utm_campaign=Free%20available>
> >
> > --
> > Abdelhakim Deneche
> > Software Engineer

--
Abdelhakim Deneche
Software Engineer
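Since the failure above ("Premature EOF") happens inside the `sun.tools.attach` path that jmap relies on, one possible workaround is to trigger the dump from inside the target JVM via the `HotSpotDiagnosticMXBean`, which bypasses the attach mechanism entirely. A minimal sketch, assuming a HotSpot JDK (the bean is HotSpot-specific); the class name `HeapDumper` is just an illustration:

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.IOException;
import java.lang.management.ManagementFactory;

public class HeapDumper {
    // Triggers a heap dump from within the running JVM itself, avoiding the
    // sun.tools.attach code path where the Premature EOF occurred.
    public static void dump(String path, boolean liveObjectsOnly) throws IOException {
        HotSpotDiagnosticMXBean bean = ManagementFactory.newPlatformMXBeanProxy(
                ManagementFactory.getPlatformMBeanServer(),
                "com.sun.management:type=HotSpotDiagnostic",
                HotSpotDiagnosticMXBean.class);
        // liveObjectsOnly=true forces a full GC first and dumps only reachable
        // objects; note the dump fails if the target file already exists.
        bean.dumpHeap(path, liveObjectsOnly);
    }

    public static void main(String[] args) throws IOException {
        dump(args.length > 0 ? args[0] : "drillbit-heap.hprof", true);
    }
}
```

The resulting `.hprof` file can then be opened in MAT or VisualVM to see which threads hold references to the idle chunks.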
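Jacques's point about soft limits can be made concrete with a deliberately simplified model — this is a hypothetical sketch for illustration, not Netty's actual allocator code: released chunks park in a per-thread cache, and the cache is only trimmed on the *allocation* path, so releasing everything frees nothing by itself:

```java
import java.util.ArrayDeque;

// Hypothetical model of an allocation-time soft limit: release() never frees,
// it only caches; the limit is enforced lazily the next time allocate() runs
// on the same thread. A thread that stops allocating keeps its cache forever.
public class ThreadCachedPool {
    static final int CHUNK_SIZE = 16 * 1024 * 1024; // Netty's 16MB chunk, per the thread
    static final int MAX_CACHED = 4;                // soft limit, checked only on allocate

    private final ThreadLocal<ArrayDeque<byte[]>> cache =
            ThreadLocal.withInitial(ArrayDeque::new);

    public byte[] allocate() {
        ArrayDeque<byte[]> local = cache.get();
        trim(local);                       // the only place the limit is enforced
        byte[] cached = local.poll();
        return cached != null ? cached : new byte[CHUNK_SIZE];
    }

    public void release(byte[] chunk) {
        cache.get().push(chunk);           // no trimming here: "release it all" frees nothing
    }

    public int cachedChunks() {
        return cache.get().size();
    }

    private void trim(ArrayDeque<byte[]> local) {
        while (local.size() > MAX_CACHED) {
            local.poll();                  // drop the excess so GC can reclaim it
        }
    }
}
```

Under this model, a burst of allocate/release pairs leaves the cache arbitrarily far above the soft limit until the owning thread allocates again — which matches the observation that chunks linger between iterations even when reported usage is near zero.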
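The 16MB-per-chunk figure and the 288-idle-chunk observation can be sanity-checked with simple arithmetic. Assuming Netty 4's defaults of the time (pageSize = 8 KiB, maxOrder = 11, chunkSize = pageSize << maxOrder):

```java
public class ChunkMath {
    public static void main(String[] args) {
        // Netty's default chunk size: pageSize (8 KiB) << maxOrder (11) = 16 MiB
        int pageSize = 8192;
        int maxOrder = 11;
        int chunkSize = pageSize << maxOrder;
        System.out.println(chunkSize);              // 16777216 bytes = 16 MiB

        // 288 nearly empty chunks pin roughly 4.5 GiB of direct memory --
        // over half the 8 GiB limit -- even while Drill reports < 10 MiB in use.
        long pinned = 288L * chunkSize;
        System.out.println(pinned / (1024 * 1024)); // 4608 MiB = 4.5 GiB
    }
}
```

This is consistent with the graph: chunk count × 16 MiB tracks the drillbit's reported direct memory, so the growth from ~2GB to ~6GB corresponds to roughly 128 → 384 retained chunks.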