corinthia-dev mailing list archives

From Peter Kelly <pmke...@apache.org>
Subject Re: Checking malloc success and adding perror()
Date Tue, 24 Feb 2015 17:35:31 GMT
Here’s a fun fact: On Linux and OS X, malloc never actually returns NULL.

Out of curiosity, I thought I’d try the following program on Linux:

#include <stdio.h>
#include <stdlib.h>

int main()
{
    int count = 0;
    while (1) {
        void *p = malloc(1024*1024);
        if (p == NULL) {
            printf("malloc returned NULL");
            return 1;
        }
        else {
            count++;
            printf("count = %d\n",count);
        }
    }
}

On a 64-bit Ubuntu VM with 2 GB of memory it allowed me to “allocate” a total of 569 GB of
memory. When allocation eventually failed, malloc didn’t return anything - the kernel simply
terminated the process.

Then onto OS X. My laptop has 16 GB of memory, but it was quite happy to hand out more than
150 GB. It showed no sign of reaching a limit (though allocations were getting slower), so I
stopped it.

What the program actually gets are *pages* of address space, not physical memory. The process
ends up with a large amount of address space available, but memory is only *actually* allocated
when we try to write to it.

When I modified the code to actually write to all of the allocated memory, the Linux VM got
to 3.3 GB, noticeably slowing down around the 2 GB mark when it started hitting swap. After
3.3 GB, the kernel once again killed the process. On my Mac it kept going a lot longer, but I
could see my available disk space dropping, as it was clearly swapping to disk.

Windows did the right thing, eventually returning NULL after 1.8 GB (surprisingly, this was
on a VM with 4 GB allocated in total; I would have expected a bit more).

So as far as Linux and OS X are concerned, any attempt we make to check for and deal with a
NULL return value from malloc is entirely pointless, as the process will be killed before it
even reaches the error-handling code.

—
Dr Peter M. Kelly
pmkelly@apache.org

PGP key: http://www.kellypmk.net/pgp-key
(fingerprint 5435 6718 59F0 DD1F BFA0 5E46 2523 BAA1 44AE 2966)

> On 24 Feb 2015, at 11:04 pm, Dennis E. Hamilton <dennis.hamilton@acm.org> wrote:
> 
> I find this an interesting discussion although I think it would be useful to separate
it, and the architectural issues being raised, from the proposal that Gabriela has made.
> 
> The current situation is that there is no (or not enough) checking for failed mallocs.
 There is a proposal to deal with that in the short-term by simply replacing those calls with
code that does check and uses a mindless error-reporting method for all of them.  That work
is now going ahead.
> 
> It is clear that this is not a production-quality fix.  It replaces a NULL-pointer usage
crasher with something that at least explains what the crash is at a place closer to where
the NULL-pointer arises.  And there is nothing like a graceful failure being provided.
> 
> This is a stop-gap, an useful one for what it accomplishes.  Providing a production-quality
result for software that end-users will be employing is going to be quite a different matter,
and we are probably looking at it at too much of a micro-level.  
> 
> It would be good to stand back and look at exactly what behavior we do want.  What do
we want software that uses the libraries being offered to be able to do in then event that
there is a resource-exhaustion situation detected in underlying code and what do we want to
assure about the resulting state that accompanies the reporting of such a situation.  (I.e.,
is there anything recoverable by the software that uses the library, are there likely memory
leaks, etc.)
> 
> Another question is whether or not we are willing to make a release that employs the
stop-gap, and how do we make that known to potential adopters of the code.  Will we declare
it alpha- or beta-level quality, or what?
> 
> - Dennis
> 
> -----Original Message-----
> From: Edward Zimmermann [mailto:Edward.Zimmermann@cib.de] 
> Sent: Tuesday, February 24, 2015 04:08
> To: dev@corinthia.incubator.apache.org
> Subject: RE: Checking malloc success and adding perror()
> 
> Answers mixed in...
> 
>> Von: jan i [mailto:jani@apache.org]
>> Gesendet: Montag, 23. Februar 2015 20:25
>> An: dev@corinthia.incubator.apache.org
>> Betreff: Re: Checking malloc success and adding perror()
>> 
>> On 23 February 2015 at 12:47, Edward Zimmermann
>> <Edward.Zimmermann@cib.de>
>> wrote:
>> 
>>> Hi,
>>> 
>>> Been sort-of out the the discussion-- was on vacation last week-- so
>>> excuse me, in advance, if I bring up a point already made.
>>> 
>> hi again, nice to hear from you again.
>> 
>> 
>>> 
>>> First of all.. Corthinia is supposed to be C++? If so we don't want
>> to
>>> use malloc. If its plain C, of course, malloc is probably our first
>>> choice for memory allocation.
>>> 
>> No DocFormats (the library part) is strictly C99. The application of
>> top can be other languages.
>> 
> 
> Sure. SWIG....
> 
>> 
>>> 
>>> With the issue of x= malloc(y). This gets more complicated. Linux,
>>> https://access.redhat.com/documentation/en-
>> US/Red_Hat_Enterprise_Linux
>>> /6/html/Performance_Tuning_Guide/s-memory-captun.html
>>> 
>> we could, but I do not think we want to inerfere wih kernel parameters.
>> 
> 
> I was not suggesting that we muck with the kernel params... but need to accept the behavior.
> Android, for example, sets, I think, by standard the value to 1. That means that malloc
will return a pointer even when there is absolutely no RAM available-- and if the pointer
space is depleted (recall most Androids are 32-bit kernels) unleashes immediately the OOM
killer.
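> 
> For what it's worth, the overcommit policy in question can be checked at runtime. A minimal
> sketch (plain Linux, nothing Corinthia-specific):
> 
> #include <stdio.h>
> 
> int main(void)
> {
>     /* 0 = heuristic overcommit, 1 = always overcommit, 2 = strict accounting */
>     FILE *f = fopen("/proc/sys/vm/overcommit_memory","r");
>     int mode;
>     if (f != NULL && fscanf(f,"%d",&mode) == 1)
>         printf("vm.overcommit_memory = %d\n",mode);
>     if (f != NULL)
>         fclose(f);
>     return 0;
> }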
> 
> 
> 
>> 
>>> 
>>> Can we configure the OOM Killer to be a little nicer? Yes but really
>>> only process by process.. And when a process gets killed it's done so
>>> in a very silent way.. so if someone is "not in the know" to spot the
>>> clues.. it can get quite mysterious why some programs stop working...
>>> 
>>> But let us even pretend that we get a NULL... What is the watermark
>> to
>>> have gotten it? Can we recover? Any buffers we can quickly dispose
>> of?
>>> Recovery is not that easy!
>>> 
>> You are some steps ahead, right now we replace malloc with xmalloc to
>> have a central place for error handling.
>> 
> 
> I'm arguing that a central place is the wrong place. Android, for example, won't get
there. 
> 
> What is wrong with having different allocate functions for the allocation of different
kinds of objects?
> What is wrong with having a bit of checking business logic in these functions?
> Business logic? A function, for example, that wants to create a scratch buffer might
be smart and only create a buffer suitable to the amount of free RAM around. Another object
creator may want to have a pool.. and another still just call malloc. 
> When calling malloc is code such as
> 
> if (((Ptr = (t *)malloc(size)) == NULL) {
>  /* do error handling? */
> }
> 
> Really polluted?
> Don't we want to be able to distinguish between recoverable and non-recoverable errors?
> 
> 
> 
>> 
>>> 
>>> Should we pretend that we can get a NULL? Of course. It's good
>>> programming practice. Should we wrap malloc with an xmalloc for such
>>> testing? No. On systems where malloc might  return a NULL we should
>>> have for different objects alternative strategies for dealing with an
>>> allocation failure. A routine, for example, that wants to create a
>>> scratch buffer of x length but could work, albeit slower, with less
>> we might make smaller. Etc etc. etc.
>>> 
>> of course we need to check for that, try on a Android or IoS system to
>> load a huge documents, and you will most surely reach the limit.
> 
> Sure. Lacking virtual memory--- but having a virtual memory arch....
> That is why we use mmap.
> 
>> 
>> in windows malloc only works well when you allocate in chunks of 4k
>> (NTFS size). But both in windows and linux calling both malloc and free
>> will cause a context switch.
> 
> The Microsoft Low Fragmentation Heap allocator works really well with sub 4k objects---
it is limited to chunks under 16k.
> 
> http://illmatics.com/Understanding_the_LFH.pdf
> 
>> 
>> IoS and Andriod work with preemptive context switching so here it is
>> even more expensive.
>> 
> ??? Android is Linux. When we run a native C code it's call libc (BIONIC, a BSD-derived
variant). For malloc they use Doug Lea's malloc.
> 
> 
> 
>> 
>>> 
>>> I'd suggest we keep to malloc and IF NEEDED-- and only if and when
>>> needed--- we use a drop-in replacement (and chances are that we'll
>>> NEVER need it much less want one).
>>> 
>> Right now we want to replace the malloc calls with xmalloc, so we do
>> not need tons of "if NULL" distributed in the code. Replacing malloc is
>> a second discussion.
>> 
> 
> What is wrong with if NULL? 
> 
>> 
>>> 
>>> Part of the problem is that we might have different "best" approaches
>>> for different operating systems. IOS, Android, Linux, BSD, ... making
>> "best"
>>> not really the "best" goal..
>>> 
>> Well with my kernel knowledge, all of them benefit from us allocating a
>> chunk during startup, eating more if needed, and freeing it all when we
>> are finished. Please remember the typical use will be open a docment,
>> do something, save a document and stop the application.
>> 
> 
> This is an old discussion that has waged for decades. Designing software that does not
free memory I think would be a mistake especially when speed is not the ultimate issue. 
> 
> 
>> 
>>> 
>>> Perror?  No.  Calling directly a function that is intended to write
>> to
>>> a console is, in general, a bad thing.
>>> 
>> Why do you see that as bad ? I thought it wrote to stderr which can be
>> redirected, but anyhow what do you suggest instead.
> 
> Because it is not nice writing to a console. Sure you can redirect stderr but how do
you know if a 3rd party lib that gets used alongside the code might too think about redirecting
stderr to yet another place or even closing it.. I'm seen all too often things go really wrong.
It's just not best practice-- I'd even call it "bad practice" in a shared library. Libraries
should return error codes and perhaps have a message sub-system-- or an interface to one.
Stderr is perhaps a tad less ugly than stdout.. BUT we really really should NOT be using either..
> 
> 
> 
>> 
>> rgds
>> jan i.
>> 
>>> 
>>> 
>>> 
>>> -----Ursprüngliche Nachricht-----
>>> Von: Peter Kelly [mailto:pmkelly@apache.org]
>>> Gesendet: Donnerstag, 19. Februar 2015 13:41
>>> An: dev@corinthia.incubator.apache.org
>>> Betreff: Re: Checking malloc success and adding perror()
>>> 
>>>> On 19 Feb 2015, at 7:06 pm, Dennis E. Hamilton
>>>> <dennis.hamilton@acm.org>
>>> wrote:
>>>> 
>>>> +1 about a cheap check and common abort procedure for starters.
>>>> 
>>>> I think figuring out what to do about cleanup and exception
>>>> unwinding,
>>> and even what exception handling to use (if any) is a further
>>> platform-development issue that could be masked with simple
>>> still-inlineable code, but needs much more architectural thought.
>>> 
>>> I’m fine with us using wrapper functions for these which do the
>> checks
>>> - though please let’s use xmalloc, xcalloc, xrealloc, and xstrdup
>>> instead of
>>> DFPlatform* (it’s nothing to do with platform abstraction, and these
>>> names are easier to type). (as a side note we can probably cut down
>> on
>>> prefix usage a lot as long as we don’t export symbols; this was just
>>> to avoid name clashes with other libraries)
>>> 
>>> In my previous mail I really just wanted to point out that by itself,
>>> this doesn’t really solve anything - the issue is in reality far more
>>> complicated than a simple NULL pointer check.
>>> 
>>> I can think of two ways we could deal with the issue of graceful
>> handling:
>>> 
>>> 1) Allow the application to supply a callback, as Jan suggested
>>> 
>>> 2) Adopt a “memory pool” type strategy where we create an memory pool
>>> object at the start of conversion which tracks all allocations that
>>> occur between the beginning and end of a top-level API call like
>>> DFGet, and put setjmp/longjmp-style exception handling in these API
>> calls.
>>> 
>>> The second approach is in fact already used to a limited extent with
>>> the DOM API. Every document maintains its own memory pool for storing
>>> Node objects (and the text values of nodes)… this is freed when the
>>> document’s retainCount drops to zero. I did this because it was much
>>> faster than traversing through the tree and releasing nodes
>>> individually (at least in comparison to have nodes as Objective C
>>> objects - the ObjC runtime was undoubtedly part of that overhead).
>>> 
>>> —
>>> Dr Peter M. Kelly
>>> pmkelly@apache.org
>>> 
>>> PGP key: http://www.kellypmk.net/pgp-key
>>> <http://www.kellypmk.net/pgp-key> (fingerprint 5435 6718 59F0 DD1F
>>> BFA0 5E46 2523 BAA1 44AE 2966)
>>> 
>>> 
> 

