mesos-dev mailing list archives

From Benno Evers <bev...@mesosphere.com>
Subject Re: [Proposal] Use jemalloc as default memory allocator for Mesos
Date Thu, 21 Sep 2017 00:49:56 GMT
Looks like we can agree that it's a useful feature to add, which is great!
:)

I've written up the first version of a Design Document, please feel free to
take a look and comment:


https://docs.google.com/document/d/1qCVK40nOKDWlLKYrYUpLDwkH654zCoatIzi4OnsW8ac/edit?usp=sharing

Thanks,
Benno

On Tue, Sep 5, 2017 at 6:24 PM, Jeff Coffler <Jeff.Coffler@microsoft.com.invalid> wrote:

> We're talking with the jemalloc folks.
>
> They did have a plan to move to CMake, but they wanted to move to CMake
> for all platforms simultaneously (a tall order). My thought is to ask
> them to move to CMake for Windows only (we'll even do the work here if
> they'll take 'em). This is manageable: they would be maintaining two build
> systems (something Mesos devs know all about), but jemalloc already
> does that, since they've checked in some Visual Studio solutions.
>
> In the past, I've just asked the company to buy what I need (i.e.
> SmartHeap), and it's worked splendidly. But that's not an option for Mesos,
> as you pointed out.
>
> I'll talk with JohnK this morning (he's managing this), and will see where
> he is in discussions with jemalloc authors. Last week, we were talking with
> the guy that did a lot of the cmake porting, but we need to talk with the
> jemalloc author directly.
>
> Given the situation, I guess we should just move forward with jemalloc. If worst
> comes to worst, we'll maintain our own CMake build process for jemalloc
> against a specific branch. We already maintain local changes for some other
> 3rd party dependencies.
>
> /Jeff
>
> -----Original Message-----
> From: Benno Evers [mailto:bevers@mesosphere.com]
> Sent: Tuesday, September 5, 2017 4:02 AM
> To: dev@mesos.apache.org
> Subject: Re: [Proposal] Use jemalloc as default memory allocator for Mesos
>
> Hi Jeff,
>
> do you have a particular alternative in mind? Certainly SmartHeap is a
> non-starter because it is proprietary. I actually did look for alternatives
> before sending out this proposal, but from what I've seen, there isn't
> exactly an abundance of well-tested, widely used and stable malloc
> implementations with heap profiling features, i.e. I'm not aware of any
> besides tcmalloc and jemalloc.
>
> I also don't think having a native Windows build is as important as being
> well-tested. Either the build just works (perfect), or it doesn't and the
> feature will be disabled on Windows, in which case people are in exactly
> the same position as they were before; in particular, they can still use
> Windows-native heap profiling solutions if they want. On the other hand,
> if we decide on some obscure malloc, there is a much higher chance of
> accidentally introducing subtle bugs for the people who enable the feature.
>
> Best regards,
> Benno
>
>
> On Thu, Aug 31, 2017 at 5:41 PM, Jeff Coffler <Jeff.Coffler@microsoft.com.invalid> wrote:
>
> > The fact that Firefox works with jemalloc isn't necessarily
> > indicative. I, for one, would like to avoid dependencies on Cygwin for
> > Mesos. We don't need it today, and we're building an awful lot.
> >
> > (Interesting that you brought up SASL-based auth - that's currently in
> > the process of being ported - natively - to Windows.)
> >
> > There are many options for memory allocators that run on both Linux
> > and Windows. For example, I've used SmartHeap in the past, and that
> > works well on UNIX, Linux, Windows, and more. (That's commercial; I'm
> > not sure whether it's free for open source products.) I'm not
> > necessarily suggesting SmartHeap; I'm just pointing out that there are
> > native options for both Linux and Windows that are well ported and work
> > everywhere.
> >
> > Before we decide on using jemalloc, I'd like to see someone look at
> > memory allocators and see if there's one we can use (i.e. free) that's
> > natively supported on both Linux and Windows without jumping through
> > hoops (like Cygwin for builds). If there are native choices, we should
> > look at those much more aggressively than at an option that doesn't work
> > well on Windows.
> >
> > /Jeff
> >
> > -----Original Message-----
> > From: Till Toenshoff [mailto:toenshoff@me.com]
> > Sent: Wednesday, August 30, 2017 6:27 PM
> > To: dev@mesos.apache.org
> > Subject: Re: [Proposal] Use jemalloc as default memory allocator for
> > Mesos
> >
> > It appears that jemalloc does support Windows (64-bit).
> > See: https://github.com/jemalloc/jemalloc/blob/dev/msvc/ReadMe.txt
> >
> > tcmalloc, on the other hand, appears to only offer a minimal variant on
> > Windows.
> > See: https://github.com/gperftools/gperftools - grep for "COMPILING ON NON-LINUX SYSTEMS"
> > See: https://github.com/gperftools/gperftools/blob/master/INSTALL - grep for "Windows  (MSVC, Cygwin, and MinGW)"
> >
> > So both options rely on Cygwin or MinGW for building, a requirement
> > that the Mesos build itself does not have. That proves your point about
> > stuff not really “just working”, at least when it comes to the build
> > step of those packages.
> >
> > It seems that without trying it, we won’t find out whether jemalloc works
> > as hoped on Windows for us; the Firefox project's results, however, are
> > encouraging. On the other hand, if it doesn’t work, we could simply
> > decide to disable it on Windows, just like some other Mesos features
> > will remain disabled on that platform unless someone decides to port
> > them (e.g. SASL-based authn).
> >
> > > On Aug 25, 2017, at 3:28 PM, Benno Evers <bevers@mesosphere.com> wrote:
> > >
> > > Hi Jeff,
> > >
> > > from looking around on the internet, it seems like Firefox builds
> > > with jemalloc on all platforms, and I've also seen reports of people
> > > successfully using tcmalloc heap profiling on Windows. I'm afraid I
> > > don't currently have a Windows machine with a development environment
> > > set up, so I can't offer first-hand experience.
> > >
> > > In the worst case, we would have to disable jemalloc support on
> > > Windows, but I hope that won't be necessary.
> > >
> > > Since you probably have more experience with memory management on
> > > Windows, is there a reason to suspect that it should or shouldn't work?
> > >
> > > Best regards,
> > > Benno
> > >
> > > On Wed, Aug 23, 2017 at 6:16 PM, Jeff Coffler <Jeff.Coffler@microsoft.com.invalid> wrote:
> > >
> > >> Hi Benno,
> > >>
> > >> What's the availability of both jemalloc and tcmalloc on the
> > >> Windows platform? Do the products work there properly?
> > >>
> > >> There are solutions that I know work on Windows (from past work
> > >> I've done). I'm unsure about either jemalloc or tcmalloc, however.
> > >>
> > >> Thanks,
> > >>
> > >> /Jeff
> > >>
> > >> -----Original Message-----
> > >> From: Benno Evers [mailto:bevers@mesosphere.com]
> > >> Sent: Tuesday, August 22, 2017 3:16 AM
> > >> To: dev@mesos.apache.org
> > >> Subject: Re: [Proposal] Use jemalloc as default memory allocator
> > >> for Mesos
> > >>
> > >> Hi Alexander,
> > >>
> > >> in general, jemalloc and tcmalloc are very similar, and seem to be
> > >> taking ideas from each other (in fact, the jeprof executable started
> > >> as a copy of pprof, and there are still references to the pprof
> > >> documentation in some comments).
> > >>
> > >> From what I've seen, the main difference is that jemalloc's profiling
> > >> seems better suited to multi-threaded programs; in particular, the
> > >> profile file format includes per-thread memory statistics, and the
> > >> profiling features can be turned on and off individually per
> > >> thread. From an API perspective, all jemalloc settings can be accessed
> > >> through the mallctl() function, while tcmalloc seems to require some
> > >> options to be set by environment variable
> > >> (https://gperftools.github.io/gperftools/heapprofile.html). Finally, I
> > >> also found jemalloc's documentation to be more thorough.
> > >>
> > >> But again, the two are very similar, so I think the main decision
> > >> here isn't whether to choose jemalloc or tcmalloc, but whether to
> > >> switch to a custom memory allocator that has support for profiling
> > >> heap memory usage.
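To make the mallctl() point concrete, here is a minimal C sketch of switching heap profiling on and off at runtime, either for the whole process (`prof.active`) or for the calling thread only (`thread.prof.active`). It assumes jemalloc was built with `--enable-prof` and without a symbol prefix (the usual Linux setup), and that the process was started with profiling enabled (e.g. `MALLOC_CONF=prof:true`). The helper names are hypothetical and are not Mesos code. The weak declaration (GCC/Clang on ELF platforms) lets the sketch build and run even when jemalloc is not linked in, in which case the helpers simply report that.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Weak declaration of jemalloc's control interface: it resolves to NULL
 * when jemalloc is not linked in (GCC/Clang on ELF platforms).
 * Assumes an unprefixed jemalloc build. */
extern int mallctl(const char *name, void *oldp, size_t *oldlenp,
                   void *newp, size_t newlen) __attribute__((weak));

/* Writes a boolean mallctl option. Returns 0 on success, -1 if jemalloc
 * is not available, otherwise the error code returned by mallctl()
 * (e.g. ENOENT if jemalloc was built without --enable-prof). */
static int set_bool_option(const char *name, bool value)
{
    assert(name != NULL);
    if (mallctl == NULL) {
        return -1;
    }
    return mallctl(name, NULL, NULL, &value, sizeof(value));
}

/* Hypothetical helper: toggle heap profiling for the whole process. */
int set_heap_profiling(bool active)
{
    return set_bool_option("prof.active", active);
}

/* Hypothetical helper: toggle heap profiling for the calling thread only. */
int set_thread_heap_profiling(bool active)
{
    return set_bool_option("thread.prof.active", active);
}
```

Because these are plain named settings rather than environment variables, something like `set_heap_profiling(true)` could be called long after the process has started.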
> > >>
> > >>
> > >> On Mon, Aug 21, 2017 at 4:26 PM, Alexander Rojas <alexander@mesosphere.io> wrote:
> > >>
> > >>> Hi Benno,
> > >>>
> > >>> This does sound like a great addition to Mesos. Can you, however,
> > >>> explain how jemalloc is better than tcmalloc? I think that for
> > >>> such an important change, we probably need some more information.
> > >>>
> > >>> Your comment in MESOS-7876 mentions that we already have tcmalloc
> > >>> since it is part of gperftools, so I would like to have a whole
> > >>> picture of the advantages and disadvantages of both options.
> > >>>
> > >>> Alexander Rojas
> > >>> alexander@mesosphere.io
> > >>>
> > >>>
> > >>>
> > >>>
> > >>>> On 18. Aug 2017, at 12:49, Benno Evers <bevers@mesosphere.com> wrote:
> > >>>>
> > >>>> Hi all,
> > >>>>
> > >>>> I would like to propose bundling jemalloc as a new dependency
> > >>>> under `3rdparty/`, and to link Mesos against this new memory
> > >>>> allocator by default.
> > >>>>
> > >>>>
> > >>>> # Motivation
> > >>>>
> > >>>> The Mesos master and agent binaries are, ideally, very
> > >>>> long-running processes. This makes them susceptible to memory
> > >>>> issues, because even small leaks have a chance to build up over
> > >>>> time to the point where they become problematic.
> > >>>>
> > >>>> We have seen several such issues on our internal Mesos
> > >>>> installations, for example
> > >>>> https://issues.apache.org/jira/browse/MESOS-7748
> > >>>> or https://issues.apache.org/jira/browse/MESOS-7800.
> > >>>>
> > >>>> I imagine any organization running Mesos for an extended period
> > >>>> of time has had its share of similar issues, so I expect this
> > >>>> proposal to be useful for the whole community.
> > >>>>
> > >>>>
> > >>>> # Why jemalloc?
> > >>>>
> > >>>> Given that memory issues tend to be most visible after a given
> > >>>> process has been running for a long time, it would be great to
> > >>>> have the option to enable heap tracking and profiling at runtime,
> > >>>> without having to restart the process. (This ability could then
> > >>>> be connected to a Mesos endpoint, similar to how we can adjust
> > >>>> the log level at runtime.)
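A hypothetical handler behind such an endpoint could ask jemalloc to write a heap profile on demand. The sketch below is only an illustration: the function name and file path are made up, and it assumes an unprefixed jemalloc built with `--enable-prof` and a process started with `MALLOC_CONF=prof:true,prof_active:false`, so that sampling can be activated later via `prof.active`. It first checks the read-only `opt.prof` setting and then writes the profile through `prof.dump`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Weak declaration: NULL when jemalloc is not linked in (GCC/Clang, ELF).
 * Assumes an unprefixed jemalloc build. */
extern int mallctl(const char *name, void *oldp, size_t *oldlenp,
                   void *newp, size_t newlen) __attribute__((weak));

/* Hypothetical helper, e.g. called from a debug endpoint.
 * Returns 0 on success, -1 if jemalloc is not linked in,
 * -2 if profiling was not enabled at startup (opt.prof is false or
 * unavailable), otherwise the error code returned by mallctl(). */
int dump_heap_profile(const char *path)
{
    assert(path != NULL);
    if (mallctl == NULL) {
        return -1;
    }

    /* opt.prof is read-only and reflects MALLOC_CONF at startup. */
    bool prof_enabled = false;
    size_t len = sizeof(prof_enabled);
    if (mallctl("opt.prof", &prof_enabled, &len, NULL, 0) != 0 ||
        !prof_enabled) {
        return -2;
    }

    /* prof.dump expects a pointer to the file name (a const char *). */
    return mallctl("prof.dump", NULL, NULL, &path, sizeof(path));
}
```

The resulting file can then be inspected offline with jeprof, for example `jeprof --text <binary> <profile>`.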
> > >>>>
> > >>>> The two production-quality memory allocators that have this
> > >>>> ability currently seem to be tcmalloc and jemalloc. Of these,
> > >>>> jemalloc, in our experience, produces better and more detailed
> > >>>> statistics.
> > >>>>
> > >>>>
> > >>>> # What is the impact on users who do not need this feature?
> > >>>>
> > >>>> Naturally, not every single user of Mesos will have a need for
> > >>>> this feature. To ensure these users would not experience serious
> > >>>> performance regressions as a result of this change, we conducted
> > >>>> a preliminary set of benchmarks whose results are collected under
> > >>>> https://issues.apache.org/jira/browse/MESOS-7876
> > >>>>
> > >>>> It turns out that we could probably even expect a small speedup
> > >>>> (1-5%) as a nice side effect of this change.
> > >>>>
> > >>>> Users who compile Mesos themselves would of course have the
> > >>>> option to disable jemalloc at configuration time or replace it
> > >>>> with their memory allocator of choice.
> > >>>>
> > >>>>
> > >>>>
> > >>>> I'm looking forward to hearing any thoughts and comments.
> > >>>>
> > >>>>
> > >>>> Thanks,
> > >>>> --
> > >>>> Benno Evers
> > >>>> Software Engineer, Mesosphere
> > >>>
> > >>>
> > >>
> > >>
> > >> --
> > >> Benno Evers
> > >> Software Engineer, Mesosphere
> > >>
> > >
> > >
> > >
> > > --
> > > Benno Evers
> > > Software Engineer, Mesosphere
> >
> >
>
>
> --
> Benno Evers
> Software Engineer, Mesosphere
>



-- 
Benno Evers
Software Engineer, Mesosphere
