drill-dev mailing list archives

From Abdel Hakim Deneche <adene...@maprtech.com>
Subject Re: OOM : Direct buffer memory
Date Fri, 13 May 2016 23:35:52 GMT
1. You are right, the root allocator should prevent an allocation that
exceeds the total available memory, but I'm not sure all allocations in
the RPC layer go through Drill's accountor. Netty's internal
fragmentation could also cause this even while we are still below our own
memory limit (see the first sketch below).
2. Unfortunately, when a channel is closed we don't get back most of the
acknowledgements for the messages that were sent through that channel, and
we are forced to fail any query that is still waiting for an ack from that
channel (see the second sketch below). The more queries are running in
parallel, the higher the chance that a large number of them will be affected.
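
To make point 1 concrete, here is a minimal sketch of the kind of reservation
check a root allocator performs. The class and method names are made up for
illustration and are not Drill's actual accountor code; the only assumption is
a single shared counter checked against a hard direct-memory budget.

    import java.util.concurrent.atomic.AtomicLong;

    // Toy reservation-style accountor; illustrative only, not Drill's implementation.
    class DirectMemoryAccountor {
      private final long limit;                    // e.g. the -XX:MaxDirectMemorySize budget
      private final AtomicLong reserved = new AtomicLong();

      DirectMemoryAccountor(long limit) { this.limit = limit; }

      // Returns false instead of letting the caller run into an OutOfMemoryError.
      boolean tryReserve(long bytes) {
        while (true) {
          long current = reserved.get();
          if (current + bytes > limit) {
            return false;                          // caller should fail the fragment or spill
          }
          if (reserved.compareAndSet(current, current + bytes)) {
            return true;
          }
        }
      }

      void release(long bytes) { reserved.addAndGet(-bytes); }
    }

Anything the RPC layer allocates straight from Netty's pooled allocator never
passes through a check like this, and Netty's arenas can keep partially used
chunks around, so the process can hit the JVM's hard direct-memory limit while
a counter like the one above still shows headroom.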
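And for point 2, a rough sketch of why one closed channel fans out into many
failed fragments. It uses plain Java futures plus Netty's channel close future;
the AckTracker class and its method names are hypothetical, not Drill's RPC code.

    import io.netty.channel.Channel;
    import io.netty.channel.ChannelFutureListener;
    import java.util.Map;
    import java.util.concurrent.CompletableFuture;
    import java.util.concurrent.ConcurrentHashMap;

    // Illustrative pending-ack tracking keyed by message id; names are made up.
    class AckTracker {
      private final Map<Long, CompletableFuture<Void>> pendingAcks = new ConcurrentHashMap<>();

      CompletableFuture<Void> expectAck(long messageId) {
        return pendingAcks.computeIfAbsent(messageId, id -> new CompletableFuture<>());
      }

      void ackReceived(long messageId) {
        CompletableFuture<Void> f = pendingAcks.remove(messageId);
        if (f != null) {
          f.complete(null);
        }
      }

      // When the channel dies, every outstanding ack is failed at once, which in
      // turn fails every fragment (and therefore query) still waiting on one.
      void watch(Channel channel) {
        channel.closeFuture().addListener((ChannelFutureListener) future -> {
          pendingAcks.values().forEach(f ->
              f.completeExceptionally(new IllegalStateException("channel closed before ack")));
          pendingAcks.clear();
        });
      }
    }

Once the close listener fires, every fragment still waiting on one of those
futures sees the failure at roughly the same time, which is why the number of
affected queries grows with the level of concurrency.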

On Fri, May 13, 2016 at 4:22 PM, rahul challapalli <challapallirahul@gmail.com> wrote:

> 1. This looks like a bug with the allocator, unless there is a reason for
> not enforcing a limit (the total direct memory available) on the memory
> allocated to all the fragments.
> 2. This looks like a bigger problem, as we are unnecessarily failing all the
> other queries as a result of one fragment causing an OOM. It would make sense
> if the drillbit were unresponsive after a fragment hit an OOM, but I was able
> to connect to that specific drillbit after the failures and ran the same
> failing queries successfully.
>
> - Rahul
>
> On Fri, May 13, 2016 at 4:06 PM, Abdel Hakim Deneche <adeneche@maprtech.com> wrote:
>
> > 1. You are getting this error because the Drillbit is running out of direct
> > memory. It's thrown by Netty when it can't allocate a new chunk of direct
> > memory from the system. For each query, the allocator will enforce the
> > query's limit, but I'm not sure we actually compute those limits properly
> > so that they don't exceed the total direct memory limit.
> > 2. When we hit a channel closed exception, all fragments that were
> > transmitting on that channel will most likely fail even though they didn't
> > run out of memory. It's hard to tell where the memory went without more
> > information about the queries you were trying to run.
> >
> > On Fri, May 13, 2016 at 3:45 PM, rahul challapalli <challapallirahul@gmail.com> wrote:
> >
> > > Drillers,
> > >
> > > I was executing 20 queries using 10 concurrent clients on an 8 node
> > > cluster. The first 10 queries succeeded and the remaining 10 failed with
> > > "ChannelClosedException". The logs suggested that all the fragments
> > > running on one node hit a "java.lang.OutOfMemoryError: Direct buffer
> > > memory". Two questions here:
> > >    1. Can someone explain why we are even seeing this error? Shouldn't
> > > the allocator detect this condition upfront?
> > >    2. Why did all the fragments fail? Where did the memory go?
> > >
> > > - Rahul
> > >
> >
> >
> >



-- 

Abdelhakim Deneche

Software Engineer

  <http://www.mapr.com/>


Now Available - Free Hadoop On-Demand Training
<http://www.mapr.com/training?utm_source=Email&utm_medium=Signature&utm_campaign=Free%20available>
