Mailing-List: contact dev-help@drill.apache.org; run by ezmlm
Precedence: bulk
Reply-To: dev@drill.apache.org
MIME-Version: 1.0
In-Reply-To: 
 <CAKTYAC-hkPCmNFTaipoJh2f1hB=0DFRZe-hj0tRE=7_s1LQ8DQ@mail.gmail.com>
References: 
 <CAKTYAC80EvjHJnoJ3n398VvMcFrP3Cq5dr8PSx7JbF_ebmwYJw@mail.gmail.com>
	<CAJrw0ORff5HYRxaSypivoQSN0ogzAgv1mszvvoxOk=J61hh1SA@mail.gmail.com>
	<CAKTYAC-hkPCmNFTaipoJh2f1hB=0DFRZe-hj0tRE=7_s1LQ8DQ@mail.gmail.com>
Date: Mon, 27 Jul 2015 15:45:32 -0700
Message-ID: 
 <CAJrw0OREcYB65zcMG8kHP_VmOO+LnsFzET8K6M5PCTpPj3u4jA@mail.gmail.com>
Subject: Re: Suspicious direct memory consumption when running queries
 concurrently
From: Jacques Nadeau <jacques@dremio.com>
To: dev@drill.apache.org
Content-Type: multipart/alternative; boundary=e89a8f642b06722b20051be31c48

--e89a8f642b06722b20051be31c48
Content-Type: text/plain; charset=UTF-8

An allocate -> release cycle that happens entirely on one thread goes into a
per-thread cache.

A number of Netty arena settings are configurable.  The big issue, I believe,
is that the limits are soft limits, enforced only by the release mechanism
that runs at allocation time.  As such, if you allocate a bunch of memory and
then release it all, that won't necessarily trigger any actual chunk releases.
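
For a rough sense of scale, here is a minimal sketch (Python, purely
illustrative; it does not call Netty) of how the chunk size and the memory
pinned by idle chunks work out.  It assumes the pooled allocator sizes a
chunk as pageSize << maxOrder, with defaults of 8192 and 11, which is
consistent with the 16MB chunk size quoted below.  The function names are
made up for this example.

```python
# Assumed defaults for Netty's pooled allocator (system properties
# io.netty.allocator.pageSize and io.netty.allocator.maxOrder).
PAGE_SIZE = 8192
MAX_ORDER = 11
MIB = 1024 * 1024


def chunk_size_bytes(page_size=PAGE_SIZE, max_order=MAX_ORDER):
    """Size of one arena chunk: pageSize << maxOrder."""
    return page_size << max_order


def retained_bytes(num_chunks, page_size=PAGE_SIZE, max_order=MAX_ORDER):
    """Direct memory held by num_chunks chunks, however empty they are."""
    return num_chunks * chunk_size_bytes(page_size, max_order)


if __name__ == "__main__":
    print(chunk_size_bytes() // MIB, "MiB per chunk")
    print(retained_bytes(288) // MIB, "MiB held by 288 chunks")
```

Under those assumptions, 288 nearly empty chunks still hold about 4.5GB of
direct memory, since a chunk is retained whole regardless of how little of
it is in use.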

--
Jacques Nadeau
CTO and Co-Founder, Dremio

On Mon, Jul 27, 2015 at 12:47 PM, Abdel Hakim Deneche
<adeneche@maprtech.com> wrote:

> @Jacques, my understanding is that chunks are not owned by a specific
> thread; they are part of a specific memory arena, which is in turn only
> accessed by specific threads. Do you want me to find which threads are
> associated with the same arena where we have hanging chunks?
>
>
> On Mon, Jul 27, 2015 at 11:04 AM, Jacques Nadeau <jacques@dremio.com>
> wrote:
>
> > It sounds like your statement is that we're caching too many unused
> > chunks.  Hanifi and I previously discussed implementing a separate
> > flushing mechanism to release unallocated chunks that are hanging
> > around.  The main question is: why are so many chunks hanging around,
> > and what threads are they associated with?  A jmap dump and analysis
> > should allow you to determine which thread owns the excess chunks.  My
> > guess would be the RPC pool, since those threads are long-lived (as
> > opposed to the WorkManager pool, which is contracting).
> >
> > --
> > Jacques Nadeau
> > CTO and Co-Founder, Dremio
> >
> > On Mon, Jul 27, 2015 at 9:53 AM, Abdel Hakim Deneche
> > <adeneche@maprtech.com> wrote:
> >
> > > When running a set of queries (mostly window functions) concurrently
> > > on a single drillbit with 8GB max direct memory, we are seeing a
> > > continuous increase in direct memory allocation.
> > >
> > > We repeat the following steps multiple times:
> > > - we launch an "iteration" of tests that runs all queries in a random
> > >   order, 10 queries at a time
> > > - after the iteration finishes, we wait a couple of minutes to give
> > >   Drill time to release the memory being held by the finishing
> > >   fragments
> > >
> > > Using Drill's memory logger ("drill.allocator") we were able to get
> > > snapshots of how memory was used internally by Netty. We focused only
> > > on the number of allocated chunks: if we take this number and multiply
> > > it by 16MB (Netty's chunk size), we get approximately the same value
> > > as Drill's reported direct memory allocation.
> > > Here is a graph that shows the evolution of the number of allocated
> > > chunks over a 500-iteration run (I'm working on improving the plots):
> > >
> > > http://bit.ly/1JL6Kp3
> > >
> > > In this specific case, after the first iteration Drill was allocating
> > > ~2GB of direct memory; this number kept rising after each iteration,
> > > up to ~6GB. We suspect this caused one of our previous runs to crash
> > > the JVM.
> > >
> > > If we only focus on the log lines between iterations (when Drill's
> > > memory usage is below 10MB), then all allocated chunks are at most 2%
> > > used. At some point we end up with 288 nearly empty chunks, yet the
> > > next iteration still causes more chunks to be allocated!
> > >
> > > Is this expected?
> > >
> > > PS: I am running more tests and will update this thread with more
> > > information.
> > >
> > > --
> > >
> > > Abdelhakim Deneche
> > >
> > > Software Engineer
> > >
> > >   <http://www.mapr.com/>
> > >
> > >
> >
>
>
>
> --
>
> Abdelhakim Deneche
>
> Software Engineer
>
>   <http://www.mapr.com/>
>
>

--e89a8f642b06722b20051be31c48--