drill-dev mailing list archives

From: Abdel Hakim Deneche <adene...@maprtech.com>
Subject: Re: Suspicious direct memory consumption when running queries concurrently
Date: Sun, 02 Aug 2015 16:23:26 GMT
I should be able to do that; Vicky was able to reproduce it with a single
query run concurrently. Let me try to find it.

On Sun, Aug 2, 2015 at 9:11 AM, Jacques Nadeau <jacques@dremio.com> wrote:

> If you give me 5 sample queries, a simple harness should be easy to create.
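>
> Something along these lines is what I have in mind: a rough sketch, with the
> JDBC URL and the query list as hypothetical placeholders to fill in, and
> error handling kept minimal.
>
> import java.sql.Connection;
> import java.sql.DriverManager;
> import java.sql.ResultSet;
> import java.sql.Statement;
> import java.util.ArrayList;
> import java.util.Arrays;
> import java.util.Collections;
> import java.util.List;
> import java.util.concurrent.ExecutorService;
> import java.util.concurrent.Executors;
> import java.util.concurrent.TimeUnit;
>
> public class ConcurrentQueryHarness {
>   // Hypothetical placeholders: the drillbit under test and the window
>   // function queries that reproduce the growth.
>   static final String URL = "jdbc:drill:drillbit=localhost:31010";
>   static final List<String> QUERIES = Arrays.asList(
>       "SELECT AVG(n) OVER (PARTITION BY k) FROM t" /* , 4 more queries */);
>
>   public static void main(String[] args) throws Exception {
>     for (int iter = 0; iter < 500; iter++) {
>       List<String> shuffled = new ArrayList<>(QUERIES);
>       Collections.shuffle(shuffled);                       // random order
>       ExecutorService pool = Executors.newFixedThreadPool(10); // 10 at a time
>       for (String sql : shuffled) {
>         pool.submit(() -> {
>           try (Connection c = DriverManager.getConnection(URL);
>                Statement s = c.createStatement();
>                ResultSet rs = s.executeQuery(sql)) {
>             while (rs.next()) { }                          // drain the results
>           } catch (Exception e) {
>             e.printStackTrace();
>           }
>         });
>       }
>       pool.shutdown();
>       pool.awaitTermination(1, TimeUnit.HOURS);
>       TimeUnit.MINUTES.sleep(2);  // let fragments release their memory
>     }
>   }
> }
>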
> On Aug 2, 2015 9:06 AM, "Abdel Hakim Deneche" <adeneche@maprtech.com> wrote:
>
> > @yulia,
> > Just checked the available disk space and there is more than enough for
> > the dump :(
> >
> > @Jacques,
> > You'll need the test framework to be able to reproduce this. I'm already
> > using a single node; it's just a matter of running a bunch of window
> > function queries concurrently and repeating this a lot.
> >
> > Looking at the memory growth, it seems to become stable after some time,
> > which may suggest it's not a memory leak. I still have 3 questions I will
> > try to find answers for (see the sketch after this list):
> > - why doesn't Netty release memory chunks when no queries are running (up
> > to 5GB if you run enough iterations)?
> > - are all those allocated chunks actually used when you run one more
> > iteration, or does Netty only use some of them and leave the rest
> > allocated for no reason? (I should be able to get this from the memory
> > logs I already have)
> > - is this "expected" behavior of Netty's allocator that we should just
> > learn to live with?
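> >
> > For the second question, this sketch shows the kind of walk I plan to do
> > over the allocator state. Note it assumes Netty 4.1's allocator metrics
> > API (PooledByteBufAllocator.metric()), which may be newer than what we
> > ship, so take it as an illustration rather than working Drill code:
> >
> > import io.netty.buffer.PoolArenaMetric;
> > import io.netty.buffer.PoolChunkListMetric;
> > import io.netty.buffer.PoolChunkMetric;
> > import io.netty.buffer.PooledByteBufAllocator;
> > import io.netty.buffer.PooledByteBufAllocatorMetric;
> >
> > public class ChunkUsageReport {
> >   public static void main(String[] args) {
> >     PooledByteBufAllocatorMetric m = PooledByteBufAllocator.DEFAULT.metric();
> >     int chunks = 0, nearlyEmpty = 0;
> >     for (PoolArenaMetric arena : m.directArenas()) {
> >       for (PoolChunkListMetric list : arena.chunkLists()) {
> >         for (PoolChunkMetric chunk : list) { // each chunk is 16MB by default
> >           chunks++;
> >           if (chunk.usage() <= 2) {          // usage() is a percentage
> >             nearlyEmpty++;
> >           }
> >         }
> >       }
> >     }
> >     System.out.printf("%d direct chunks (~%dMB), %d at <=2%% usage%n",
> >         chunks, chunks * 16, nearlyEmpty);
> >   }
> > }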
> >
> >
> > On Fri, Jul 31, 2015 at 10:40 PM, yuliya Feldman
> > <yufeldman@yahoo.com.invalid> wrote:
> >
> > > How much memory is your JVM taking?
> > > Do you even have enough disk space to dump it?
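> > >
> > > As a sanity check before dumping, something like this (standard JMX,
> > > Java 7+) prints the JVM's own view of its direct buffer usage; Netty's
> > > pooled chunks sit on top of direct ByteBuffers, so they should show up
> > > in the "direct" pool:
> > >
> > > import java.lang.management.BufferPoolMXBean;
> > > import java.lang.management.ManagementFactory;
> > >
> > > public class DirectMemoryCheck {
> > >   public static void main(String[] args) {
> > >     // expect a "direct" and a "mapped" pool
> > >     for (BufferPoolMXBean pool :
> > >          ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
> > >       System.out.printf("%s: used=%dMB capacity=%dMB%n",
> > >           pool.getName(),
> > >           pool.getMemoryUsed() / (1024 * 1024),
> > >           pool.getTotalCapacity() / (1024 * 1024));
> > >     }
> > >   }
> > > }
> > >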
> > > From: Abdel Hakim Deneche <adeneche@maprtech.com>
> > > To: "dev@drill.apache.org" <dev@drill.apache.org>
> > > Sent: Friday, July 31, 2015 9:19 PM
> > > Subject: Re: Suspicious direct memory consumption when running queries concurrently
> > >
> > > I tried getting a jmap dump multiple times without success; each time it
> > > crashes the JVM with the following exception:
> > >
> > > Dumping heap to /home/mapr/private-sql-hadoop-test/framework/myfile.hprof
> > > > ...
> > > > Exception in thread "main" java.io.IOException: Premature EOF
> > > >        at sun.tools.attach.HotSpotVirtualMachine.readInt(HotSpotVirtualMachine.java:248)
> > > >        at sun.tools.attach.LinuxVirtualMachine.execute(LinuxVirtualMachine.java:199)
> > > >        at sun.tools.attach.HotSpotVirtualMachine.executeCommand(HotSpotVirtualMachine.java:217)
> > > >        at sun.tools.attach.HotSpotVirtualMachine.dumpHeap(HotSpotVirtualMachine.java:180)
> > > >        at sun.tools.jmap.JMap.dump(JMap.java:242)
> > > >        at sun.tools.jmap.JMap.main(JMap.java:140)
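> > >
> > > If the attach mechanism keeps failing, one workaround I may try is
> > > asking the JVM to write the dump itself through the HotSpotDiagnostic
> > > MXBean (HotSpot only). The heap dump won't contain the off-heap chunks
> > > themselves, but it does contain the PoolChunk/PoolArena objects that
> > > track them:
> > >
> > > import java.lang.management.ManagementFactory;
> > > import com.sun.management.HotSpotDiagnosticMXBean;
> > >
> > > public class InProcessHeapDump {
> > >   public static void dump(String path) throws Exception {
> > >     // Same .hprof jmap would produce, but written by the JVM itself,
> > >     // avoiding the attach-protocol round trip that fails above.
> > >     HotSpotDiagnosticMXBean bean = ManagementFactory.newPlatformMXBeanProxy(
> > >         ManagementFactory.getPlatformMBeanServer(),
> > >         "com.sun.management:type=HotSpotDiagnostic",
> > >         HotSpotDiagnosticMXBean.class);
> > >     bean.dumpHeap(path, true);   // true = live objects only
> > >   }
> > > }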
> > >
> > >
> > > On Mon, Jul 27, 2015 at 3:45 PM, Jacques Nadeau <jacques@dremio.com> wrote:
> > >
> > > > An allocate -> release cycle all on the same thread goes into a
> > > > per-thread cache.
> > > >
> > > > A bunch of Netty arena settings are configurable. The big issue, I
> > > > believe, is that the limits are soft limits implemented by the
> > > > allocation-time release mechanism. As such, if you allocate a bunch of
> > > > memory, then release it all, that won't necessarily trigger any actual
> > > > chunk releases.
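> > > >
> > > > For reference, this is roughly the shape of those knobs (Netty 4.x;
> > > > the same values are reachable via -Dio.netty.allocator.* system
> > > > properties, and the values here are illustrative, not what Drill
> > > > ships):
> > > >
> > > > import io.netty.buffer.PooledByteBufAllocator;
> > > >
> > > > public class TunedAllocator {
> > > >   public static PooledByteBufAllocator build() {
> > > >     // chunk size = pageSize << maxOrder = 8KiB << 11 = 16MiB, which is
> > > >     // why chunk count * 16MB tracks Drill's reported direct memory
> > > >     return new PooledByteBufAllocator(
> > > >         true,  // preferDirect
> > > >         0,     // nHeapArena
> > > >         2,     // nDirectArena: fewer arenas, less cached per arena
> > > >         8192,  // pageSize
> > > >         11,    // maxOrder
> > > >         512,   // tinyCacheSize  -- the per-thread caches whose soft
> > > >         256,   // smallCacheSize -- limits only kick in at allocation
> > > >         64);   // normalCacheSize
> > > >   }
> > > > }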
> > > >
> > > > --
> > > > Jacques Nadeau
> > > > CTO and Co-Founder, Dremio
> > > >
> > > > On Mon, Jul 27, 2015 at 12:47 PM, Abdel Hakim Deneche
> > > > <adeneche@maprtech.com> wrote:
> > > >
> > > > > @Jacques, my understanding is that chunks are not owned by a specific
> > > > > thread; they are part of a specific memory arena, which is in turn
> > > > > only accessed by specific threads. Do you want me to find which
> > > > > threads are associated with the same arena where we have hanging
> > > > > chunks?
> > > > >
> > > > >
> > > > > On Mon, Jul 27, 2015 at 11:04 AM, Jacques Nadeau
> > > > > <jacques@dremio.com> wrote:
> > > > >
> > > > > > It sounds like your statement is that we're caching too many unused
> > > > > > chunks. Hanifi and I previously discussed implementing a separate
> > > > > > flushing mechanism to release unallocated chunks that are hanging
> > > > > > around. The main question is why so many chunks are hanging around,
> > > > > > and what threads they are associated with. A jmap dump and analysis
> > > > > > should allow you to determine which thread owns the excess chunks.
> > > > > > My guess would be the RPC pool, since those threads are long-lived
> > > > > > (as opposed to the WorkManager pool, which is contracting).
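> > > > > >
> > > > > > The flushing mechanism could be as simple as periodically trimming
> > > > > > each pooled thread's cache. Note that
> > > > > > PooledByteBufAllocator.trimCurrentThreadCache() only exists in
> > > > > > later Netty 4.1 releases, so treat this as a sketch of the idea
> > > > > > rather than something we can drop in today:
> > > > > >
> > > > > > import io.netty.buffer.PooledByteBufAllocator;
> > > > > > import java.util.concurrent.ScheduledExecutorService;
> > > > > > import java.util.concurrent.TimeUnit;
> > > > > >
> > > > > > public class ThreadCacheFlusher {
> > > > > >   // The cache is per-thread, so the trim has to run on each pooled
> > > > > >   // thread (e.g. scheduled onto the RPC event loops), not on a
> > > > > >   // single housekeeping thread.
> > > > > >   public static void scheduleOn(ScheduledExecutorService pooledThreads) {
> > > > > >     pooledThreads.scheduleAtFixedRate(
> > > > > >         () -> PooledByteBufAllocator.DEFAULT.trimCurrentThreadCache(),
> > > > > >         1, 1, TimeUnit.MINUTES);
> > > > > >   }
> > > > > > }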
> > > > > >
> > > > > > --
> > > > > > Jacques Nadeau
> > > > > > CTO and Co-Founder, Dremio
> > > > > >
> > > > > > On Mon, Jul 27, 2015 at 9:53 AM, Abdel Hakim Deneche
> > > > > > <adeneche@maprtech.com> wrote:
> > > > > >
> > > > > > > When running a set of (mostly window function) queries
> > > > > > > concurrently on a single drillbit with an 8GB max direct memory
> > > > > > > limit, we are seeing a continuous increase in direct memory
> > > > > > > allocation.
> > > > > > >
> > > > > > > We repeat the following steps multiple times:
> > > > > > > - we launch an "iteration" of tests that runs all queries in a
> > > > > > > random order, 10 queries at a time
> > > > > > > - after the iteration finishes, we wait for a couple of minutes
> > > > > > > to give Drill time to release the memory held by the finishing
> > > > > > > fragments
> > > > > > >
> > > > > > > Using Drill's memory logger ("drill.allocator") we were able to
> > > > > > > get snapshots of how memory is used internally by Netty. We only
> > > > > > > focused on the number of allocated chunks: if we take this
> > > > > > > number and multiply it by 16MB (Netty's chunk size), we get
> > > > > > > approximately the same value reported by Drill's direct memory
> > > > > > > allocation.
> > > > > > >
> > > > > > > Here is a graph that shows the evolution of the number of
> > > > > > > allocated chunks over a 500-iteration run (I'm working on
> > > > > > > improving the plots):
> > > > > > >
> > > > > > > http://bit.ly/1JL6Kp3
> > > > > > >
> > > > > > > In this specific case, Drill was allocating ~2GB of direct
> > > > > > > memory after the first iteration, and this number kept rising
> > > > > > > with each iteration up to ~6GB. We suspect this caused one of
> > > > > > > our previous runs to crash the JVM.
> > > > > > >
> > > > > > > If we only focus on the log lines between iterations (when
> > > > > > > Drill's memory usage is below 10MB), all allocated chunks are at
> > > > > > > most 2% used. At some point we end up with 288 nearly empty
> > > > > > > chunks (288 x 16MB = ~4.5GB held to serve less than 10MB of live
> > > > > > > data), yet the next iteration will cause even more chunks to be
> > > > > > > allocated!
> > > > > > >
> > > > > > > Is this expected?
> > > > > > >
> > > > > > > PS: I am running more tests and will update this thread with
> > > > > > > more information.

-- 

Abdelhakim Deneche

Software Engineer

  <http://www.mapr.com/>

