drill-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Abdel Hakim Deneche <adene...@maprtech.com>
Subject Re: [DISCUSSION] should BufferAllocator.buffer() throw an OutOfMemoryException ?
Date Tue, 30 Jun 2015 23:24:09 GMT
What would be the impact of making buffer() throw an
OutOfMemoryRuntimeException ? we'll need to change a few places where it
really needs to be handled (like in allocateNewSafe()) but other than that,
just letting the unchecked exception propagate should be fine, and Drill
will still display the proper error message.

Of course if this happens in the RPC layer, we still need to handle it
properly (DRILL-3317 <https://issues.apache.org/jira/browse/DRILL-3317>)

On Tue, Jun 30, 2015 at 4:12 PM, Hanifi Gunes <hgunes@maprtech.com> wrote:

> -I like giving the developer the choice of behavior (thus the two options).
> It sounds like others are saying to remove this choice (because code is not
> managing it correctly).
> I might be misunderstanding the context above but my understanding is that
> the discussion is not particularly concerned about VVs nor adding, removing
> methods to/from them. It is the behavior of Allocator#buffer when we run
> OOM. Current behavior is to return null to communicate this. Given #buffer
> is used in very few classes, I think we can easily change this for good
> avoiding possible problems to come.
>
>
> - What I'm want to avoid is adding throws OutOfMemoryException in 1000
> places.
> Agreed.
>
>
>
> On Tue, Jun 30, 2015 at 3:54 PM, Abdel Hakim Deneche <
> adeneche@maprtech.com>
> wrote:
>
> > buffer(int) is mostly called from value vectors, but there are an
> > additional 10+ places where it's being used and most of them don't bother
> > checking if the returned buffer is null.
> >
> > On Tue, Jun 30, 2015 at 3:39 PM, Jacques Nadeau <jacques@apache.org>
> > wrote:
> >
> > > What is your sense of the use of this interface? I would hope new uses
> > > would be scarce. Do you think this will be a frequent occurrence that
> > most
> > > devs will experience?
> > > On Jun 30, 2015 3:35 PM, "Abdel Hakim Deneche" <adeneche@maprtech.com>
> > > wrote:
> > >
> > > > I started this discussion while fixing DRILL-3312
> > > > <https://issues.apache.org/jira/browse/DRILL-3312>, it took me quite
> > > some
> > > > time to debug and find the root of an error I was seeing. I first
> > thought
> > > > about fixing those few places that call buffer() without checking if
> > the
> > > > buffer is null or not, but this won't prevent new code from repeating
> > the
> > > > same mistakes.
> > > >
> > > >
> > > > On Tue, Jun 30, 2015 at 3:19 PM, Jacques Nadeau <jacques@apache.org>
> > > > wrote:
> > > >
> > > > > Lots of different comments on this thread.  Here are few of my
> > > thoughts:
> > > > >
> > > > > I think you need to differentiate the different tiers of
> interfaces a
> > > > > little bit more.
> > > > >
> > > > > BufferAllocator.buffer() was meant to be a low level advanced
> > > interface.
> > > > > It should be used rarely and when done so, only with extreme care.
> > > Most
> > > > > devs shouldn't use it.  Having an advanced interface seems fine
> here.
> > > > This
> > > > > was how the interface was envisioned.   It looks like there are
> > > currently
> > > > > 16 classes that reference this code not including vectors.  (There
> > > > probably
> > > > > should be less, it looks like some code duplication contributes to
> > > this.)
> > > > >
> > > > > All the vector classes also access this code.  Vectors have two
> > > methods:
> > > > > allocateNew and allocateNewSafe.  allocateNew has an exception
> > behavior
> > > > and
> > > > > allocateNewSafe has a return value behavior.  These are main APIs
> > that
> > > > are
> > > > > used extensively throughout the code and the ones we should be most
> > > > focused
> > > > > on.  I like giving the developer the choice of behavior (thus the
> two
> > > > > options).  It sounds like others are saying to remove this choice
> > > > (because
> > > > > code is not managing it correctly).
> > > > >
> > > > > If there are bugs in the 16 classes on how they use the advanced
> > > > interface,
> > > > > this doesn't seem like a big challenge to correct rather than bulk
> > > > > modifying the API.  If people are using the wrong method on the
> > vector
> > > > > classes (or the vector classes aren't correctly behaving), that
> > doesn't
> > > > > seem like something we should address at the allocator API level.
> > > > >
> > > > > Hakim, can you be more specific about your concern?  It doesn't
> seem
> > > like
> > > > > we're talking about thousands of places here.
> > > > >
> > > > > Regarding checked versus unchecked...  I'm strongly in support of
> > using
> > > > > checked exceptions for most things.  However, I'm against it for
an
> > out
> > > > of
> > > > > memory condition.  (Note how the JVM also treats this as an
> unchecked
> > > > > exception.) It causes early catching when, in most cases, we really
> > > want
> > > > to
> > > > > bubble way up the stack and fail the query.  I know of very few
> > places
> > > > > where we can actually locally manage an out of memory condition so
> > why
> > > > > write huge amounts of code the bubble the exception all the way to
> > > where
> > > > we
> > > > > can actually handle it.
> > > > >
> > > > > A quick sidebar to Jason's "Hash agg out of memory" comment
> > > > > Two thoughts here.  I'm not sure this is a meaningful message.
> Where
> > > we
> > > > > run out of memory may have nothing to do with where we are using
> most
> > > > > memory.  And if we did to decide this was useful, this can be done
> > just
> > > > as
> > > > > easily with unchecked exceptions as checked exception.  What I'm
> want
> > > to
> > > > > avoid is adding throws OutOfMemoryException in 1000 places.  The
> > > reality
> > > > is
> > > > > everybody should assume that an OutOfMemory can occur pretty much
> > > > anytime.
> > > > >
> > > > > TL;DR
> > > > >
> > > > > In general, I think this discussion is too abstract and thus has
> the
> > > > danger
> > > > > of trending towards a religious discussion (I already see a bit of
> > that
> > > > > fervor in some of the responses.).  The number of places we are
> > talking
> > > > > about are not that large and are easy to find.  Let's solve this
by
> > > > looking
> > > > > at specifics.  For example, if people think checked would be
> better,
> > > > > someone should go through and make an example changeset for
> everyone
> > to
> > > > > review.  My guess is, the only that it works will be if very
> quickly,
> > > the
> > > > > handling code converts the checked exception into an unchecked
> > > > > UserException.  But I'm more than happy to be proven wrong.
> > > > >
> > > > > On Tue, Jun 30, 2015 at 2:16 PM, Daniel Barclay <
> > dbarclay@maprtech.com
> > > >
> > > > > wrote:
> > > > >
> > > > > > Hanifi GUNES wrote:
> > > > > >
> > > > > >> - We would end up having to add it to almost everything
> everywhere
> > > > > >> Why would one propagate the checked exception for no reason?
And
> > why
> > > > > would
> > > > > >> one not propagate the exception for a good reason like
> > robustness? I
> > > > > agree
> > > > > >> that one has to avoid propagating the checked exception
for no
> > > reason
> > > > > >> however I support propagating it for a good reason.
> > > > > >>
> > > > > >> The added benefit of raising a checked exception is reminding
as
> > > well
> > > > as
> > > > > >> enforcing devs to handle it and be more cautious about this
> > > particular
> > > > > >> event. I find this compile-time safety check invaluable
for
> > > > robustness.
> > > > > >>
> > > > > >
> > > > > > +(1 times <some large number>)
> > > > > >
> > > > > > Daniel
> > > > > >
> > > > > >
> > > > > >
> > > > > >>
> > > > > >> - Or constantly wrapping it with RuntimeException to get
around
> > that
> > > > > >> If it has to be done, I would recommend relying on a helper
to
> do
> > > so.
> > > > > >> There
> > > > > >> is not much of man-work involved here.
> > > > > >>
> > > > > >>
> > > > > >> 2015-06-30 13:53 GMT-07:00 Abdel Hakim Deneche <
> > > adeneche@maprtech.com
> > > > >:
> > > > > >>
> > > > > >>  +1 to Hanifi's
> > > > > >>>
> > > > > >>> On Tue, Jun 30, 2015 at 1:38 PM, Jason Altekruse <
> > > > > >>> altekrusejason@gmail.com
> > > > > >>>
> > > > > >>>>
> > > > > >>>>  wrote:
> > > > > >>>
> > > > > >>>  +1 to Hanifi's comments, I think it makes much more
sense to
> > have
> > > a
> > > > > >>>>
> > > > > >>> number
> > > > > >>>
> > > > > >>>> of sites where the operators are explicitly catching
a checked
> > OOM
> > > > > >>>> exception and either decide to handle it or produce
a message
> > like
> > > > > "Hash
> > > > > >>>> Aggregate does not support our of memory conditions".
This
> would
> > > be
> > > > > >>>> particularly useful for debugging queries, as the
user
> exception
> > > can
> > > > > >>>> provide context information about the current operation.
This
> > way
> > > > > users
> > > > > >>>>
> > > > > >>> can
> > > > > >>>
> > > > > >>>> have some idea about the part of their query that
might be
> > causing
> > > > an
> > > > > >>>> excessive strain on system resources. I understand
that there
> > are
> > > > also
> > > > > >>>> cases where operators competing for memory can make
it a toss
> up
> > > to
> > > > > >>>> which
> > > > > >>>> will actually fail, but this would at least be a
step to give
> > more
> > > > > >>>>
> > > > > >>> detailed
> > > > > >>>
> > > > > >>>> information to users.
> > > > > >>>>
> > > > > >>>> On Tue, Jun 30, 2015 at 1:28 PM, Hanifi GUNES <hg@apache.org>
> > > > wrote:
> > > > > >>>>
> > > > > >>>>  I would propose throwing a checked exception encouraging
> > explicit
> > > > and
> > > > > >>>>> consistent handling of this event. Each sub-system
has
> liberty
> > to
> > > > > >>>>>
> > > > > >>>> decide
> > > > > >>>
> > > > > >>>> if
> > > > > >>>>
> > > > > >>>>> an OOM failure is fatal or non-fatal depending
on its
> > > capabilities.
> > > > > >>>>>
> > > > > >>>> Also
> > > > > >>>
> > > > > >>>> if
> > > > > >>>>
> > > > > >>>>> at some point a sub-system needs to communicate
with its
> > callers
> > > > via
> > > > > a
> > > > > >>>>> different mechanism such like using flags (boolean,
enum etc)
> > or
> > > > > >>>>>
> > > > > >>>> raising
> > > > > >>>
> > > > > >>>> an
> > > > > >>>>
> > > > > >>>>> unchecked exception that's still fine, just
handle the
> > exception.
> > > > If
> > > > > >>>>>
> > > > > >>>> there
> > > > > >>>>
> > > > > >>>>> is a need to suppress the checked exception
that's fine too,
> > just
> > > > > use a
> > > > > >>>>> helper method.
> > > > > >>>>>
> > > > > >>>>> Either way, returning *null* sounds problematic
in many ways
> i)
> > > it
> > > > is
> > > > > >>>>> implicit ii) unsafe iii) its handling logic
is repetitive iv)
> > it
> > > is
> > > > > >>>>> semantically unclean to make null mean something
- even worse
> > > > > something
> > > > > >>>>> context specific.
> > > > > >>>>>
> > > > > >>>>>
> > > > > >>>>> -Hanifi
> > > > > >>>>>
> > > > > >>>>> 2015-06-30 12:23 GMT-07:00 Abdel Hakim Deneche
<
> > > > > adeneche@maprtech.com
> > > > > >>>>>
> > > > > >>>> :
> > > > > >>>>
> > > > > >>>>>
> > > > > >>>>>  I guess that would fix the issue too. But we
may still run
> > into
> > > > > >>>>>>
> > > > > >>>>> situations
> > > > > >>>>>
> > > > > >>>>>> where the caller will pass a flag to "mute"
the exception
> and
> > > not
> > > > > >>>>>>
> > > > > >>>>> handle
> > > > > >>>>
> > > > > >>>>> the case anyway.
> > > > > >>>>>>
> > > > > >>>>>> If .buffer() unconditionally throws an exception,
can't the
> > > > caller,
> > > > > >>>>>>
> > > > > >>>>> who
> > > > > >>>
> > > > > >>>> wants to, just catch that and handle it properly
?
> > > > > >>>>>>
> > > > > >>>>>> On Tue, Jun 30, 2015 at 12:13 PM, Chris
Westin <
> > > > > >>>>>>
> > > > > >>>>> chriswestin42@gmail.com>
> > > > > >>>>
> > > > > >>>>> wrote:
> > > > > >>>>>>
> > > > > >>>>>>  No, but we should do something close to
that.
> > > > > >>>>>>>
> > > > > >>>>>>> There are cases where the caller can
handle the inability
> to
> > > get
> > > > > >>>>>>>
> > > > > >>>>>> more
> > > > > >>>
> > > > > >>>> memory, and may be able to go to disk. However,
you are
> correct
> > > > > >>>>>>>
> > > > > >>>>>> that
> > > > > >>>
> > > > > >>>> there
> > > > > >>>>>>
> > > > > >>>>>>> are many that can't handle an OOM, and
that fail to check.
> > > > > >>>>>>>
> > > > > >>>>>>> Instead of unconditionally throwing
> > > OutOfMemoryRuntimeException,
> > > > I
> > > > > >>>>>>>
> > > > > >>>>>> would
> > > > > >>>>>
> > > > > >>>>>> suggest that the buffer() call take a flag
that indicates
> > > whether
> > > > > >>>>>>>
> > > > > >>>>>> or
> > > > > >>>
> > > > > >>>> not
> > > > > >>>>>
> > > > > >>>>>> it
> > > > > >>>>>>
> > > > > >>>>>>> should throw if it is unable to fulfill
the request. This
> > way,
> > > > the
> > > > > >>>>>>>
> > > > > >>>>>> call
> > > > > >>>>
> > > > > >>>>> sites that can handle an OOM can pass in the
flag to return
> > null,
> > > > > >>>>>>>
> > > > > >>>>>> and
> > > > > >>>
> > > > > >>>> the
> > > > > >>>>>
> > > > > >>>>>> rest can pass in the flag value to throw,
and not have to
> have
> > > any
> > > > > >>>>>>>
> > > > > >>>>>> checking
> > > > > >>>>>>
> > > > > >>>>>>> code.
> > > > > >>>>>>>
> > > > > >>>>>>>
> > > > > >>>>>>> On Tue, Jun 30, 2015 at 12:06 PM, Abdel
Hakim Deneche <
> > > > > >>>>>>> adeneche@maprtech.com
> > > > > >>>>>>>
> > > > > >>>>>>>> wrote:
> > > > > >>>>>>>>
> > > > > >>>>>>>
> > > > > >>>>>>>  our current implementations of BufferAllocator.buffer(int,
> > > int)
> > > > > >>>>>>>>
> > > > > >>>>>>> returns
> > > > > >>>>>
> > > > > >>>>>> null when it cannot allocate the buffer.
> > > > > >>>>>>>>
> > > > > >>>>>>>> But looking through the code, there
are many places that
> > don't
> > > > > >>>>>>>>
> > > > > >>>>>>> check
> > > > > >>>>
> > > > > >>>>> if
> > > > > >>>>>
> > > > > >>>>>> the
> > > > > >>>>>>>
> > > > > >>>>>>>> allocated buffer is null before
trying to access it which
> > will
> > > > > >>>>>>>>
> > > > > >>>>>>> throw
> > > > > >>>>
> > > > > >>>>> a
> > > > > >>>>>
> > > > > >>>>>> NullPointerException.
> > > > > >>>>>>>>
> > > > > >>>>>>>> ValueVectors' allocateNewSafe()
seem to be the only place
> > that
> > > > > >>>>>>>>
> > > > > >>>>>>> handle
> > > > > >>>>
> > > > > >>>>> the
> > > > > >>>>>>
> > > > > >>>>>>> null in a specific manner.
> > > > > >>>>>>>>
> > > > > >>>>>>>> Should we update the allocators'
implementation to throw
> an
> > > > > >>>>>>>> OutOfMemoryRuntimeException instead
of returning null ?
> this
> > > has
> > > > > >>>>>>>>
> > > > > >>>>>>> the
> > > > > >>>>
> > > > > >>>>> added
> > > > > >>>>>>>
> > > > > >>>>>>>> benefit of displaying a proper out
of memory error message
> > to
> > > > the
> > > > > >>>>>>>>
> > > > > >>>>>>> user.
> > > > > >>>>>
> > > > > >>>>>>
> > > > > >>>>>>>> Thanks!
> > > > > >>>>>>>>
> > > > > >>>>>>>> --
> > > > > >>>>>>>>
> > > > > >>>>>>>> Abdelhakim Deneche
> > > > > >>>>>>>>
> > > > > >>>>>>>> Software Engineer
> > > > > >>>>>>>>
> > > > > >>>>>>>>    <http://www.mapr.com/>
> > > > > >>>>>>>>
> > > > > >>>>>>>>
> > > > > >>>>>>>> Now Available - Free Hadoop On-Demand
Training
> > > > > >>>>>>>> <
> > > > > >>>>>>>>
> > > > > >>>>>>>>
> > > > > >>>>>>>
> > > > > >>>>>>
> > > > > >>>>>
> > > > > >>>>
> > > > > >>>
> > > > >
> > > >
> > >
> >
> http://www.mapr.com/training?utm_source=Email&utm_medium=Signature&utm_campaign=Free%20available
> > > > > >>>
> > > > > >>>>
> > > > > >>>>>>>>>
> > > > > >>>>>>>>
> > > > > >>>>>>>
> > > > > >>>>>>
> > > > > >>>>>>
> > > > > >>>>>> --
> > > > > >>>>>>
> > > > > >>>>>> Abdelhakim Deneche
> > > > > >>>>>>
> > > > > >>>>>> Software Engineer
> > > > > >>>>>>
> > > > > >>>>>>    <http://www.mapr.com/>
> > > > > >>>>>>
> > > > > >>>>>>
> > > > > >>>>>> Now Available - Free Hadoop On-Demand Training
> > > > > >>>>>> <
> > > > > >>>>>>
> > > > > >>>>>>
> > > > > >>>>>
> > > > > >>>>
> > > > > >>>
> > > > >
> > > >
> > >
> >
> http://www.mapr.com/training?utm_source=Email&utm_medium=Signature&utm_campaign=Free%20available
> > > > > >>>
> > > > > >>>>
> > > > > >>>>>>>
> > > > > >>>>>>
> > > > > >>>>>
> > > > > >>>>
> > > > > >>>
> > > > > >>>
> > > > > >>> --
> > > > > >>>
> > > > > >>> Abdelhakim Deneche
> > > > > >>>
> > > > > >>> Software Engineer
> > > > > >>>
> > > > > >>>    <http://www.mapr.com/>
> > > > > >>>
> > > > > >>>
> > > > > >>> Now Available - Free Hadoop On-Demand Training
> > > > > >>> <
> > > > > >>>
> > > > > >>>
> > > > >
> > > >
> > >
> >
> http://www.mapr.com/training?utm_source=Email&utm_medium=Signature&utm_campaign=Free%20available
> > > > > >>>
> > > > > >>>>
> > > > > >>>>
> > > > > >>>
> > > > > >>
> > > > > >
> > > > > > --
> > > > > > Daniel Barclay
> > > > > > MapR Technologies
> > > > > >
> > > > >
> > > >
> > > >
> > > >
> > > > --
> > > >
> > > > Abdelhakim Deneche
> > > >
> > > > Software Engineer
> > > >
> > > >   <http://www.mapr.com/>
> > > >
> > > >
> > > > Now Available - Free Hadoop On-Demand Training
> > > > <
> > > >
> > >
> >
> http://www.mapr.com/training?utm_source=Email&utm_medium=Signature&utm_campaign=Free%20available
> > > > >
> > > >
> > >
> >
> >
> >
> > --
> >
> > Abdelhakim Deneche
> >
> > Software Engineer
> >
> >   <http://www.mapr.com/>
> >
> >
> > Now Available - Free Hadoop On-Demand Training
> > <
> >
> http://www.mapr.com/training?utm_source=Email&utm_medium=Signature&utm_campaign=Free%20available
> > >
> >
>



-- 

Abdelhakim Deneche

Software Engineer

  <http://www.mapr.com/>


Now Available - Free Hadoop On-Demand Training
<http://www.mapr.com/training?utm_source=Email&utm_medium=Signature&utm_campaign=Free%20available>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message