mesos-dev mailing list archives

From Alex Clemmer <clemmer.alexan...@gmail.com>
Subject Re: Proposal for Mesos Build Improvements
Date Tue, 14 Feb 2017 19:39:33 GMT
Just to add a bit of context: the history of the build-time issue is
tracked in MESOS-1582[1], with the most recent discussion in [2].

Speaking personally, I'm excited about _any_ progress in this area,
because (1) the Windows build times are completely unbearable, and (2)
because getting the build times down benefits the whole community.

When it was basically just me working on the Windows code paths, this
issue was tolerable, but now that we have multiple people working
full-time, it is really important to start fixing the issue.

[1] https://issues.apache.org/jira/browse/MESOS-1582
[2]
https://issues.apache.org/jira/browse/MESOS-1582?focusedCommentId=15828645&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15828645


__
Transcribed by my voice-enabled refrigerator, please pardon chilly messages.

On Tue, 14 Feb 2017, Jeff Coffler wrote:

> Proposal For Build Improvements
>
> The Mesos build process is in dire need of some build infrastructure
> improvements. These improvements will speed up and ease work in
> individual components, and dramatically reduce overall build time,
> especially in the Windows environment, but likely in the Linux
> environment as well.
>
>
> Background:
>
> It is currently recommended to use the ccache project with the Mesos
> build process. This makes the Linux build process more tolerable in
> terms of speed, but unfortunately such software is not available on
> Windows. Ultimately, though, the caching software is covering up two
> fundamental flaws in the overall build process:
>
> 1. Lack of use of libraries
> 2. Lack of precompiled headers
>
> Because the build does not use intermediate libraries, the overall
> build process is often much longer than it needs to be, particularly
> when a lot of work is being done in a single component. With proper
> modularization, only that component's library need be rebuilt (and the
> overall image relinked). Currently, since there is no such
> modularization, all source files must be considered at build time.
> Interestingly enough, the source code layout already has this
> modularization; it just isn't utilized at the compiler level.
>
> Precompiled headers exist on both Windows and Linux. For Linux, you
> can refer to
> https://gcc.gnu.org/onlinedocs/gcc/Precompiled-Headers.html. Straight
> from the GNU CC documentation: "The time the compiler takes to process
> these header files over and over again can account for nearly all of
> the time required to build the project."
>
> In my prior use of precompiled headers, each C or C++ file generally
> took about 4 seconds to compile. After switching to precompiled
> headers, the precompiled header creation took about 4 seconds, but
> each C/C++ file then took only about 200 milliseconds to compile. The
> overall build time was thus dramatically reduced. (At those rates, a
> hypothetical 1,000-file build would drop from roughly 4,000 seconds of
> compilation to about 204 seconds.)
>
>
> Scope of Changes:
>
> These changes are only being proposed for the CMake system. Going
> forward, the CMake system is the easiest way to maintain some level of
> portability between the Linux and Windows platforms.
>
>
> Details for Modularization:
>
> For the modularization, the intent is simply to compile each source
> directory of files, if functionally separate, into an archive (.a)
> file. These archive files will then be linked together to form the
> actual executables. These changes will primarily be in the CMake
> system, and should have limited effect on any actual source code.
>
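As a concrete illustration, the directory-per-archive approach could look
roughly like this in CMake (the target names, directories, and source
files below are hypothetical, not Mesos's actual layout):

```cmake
# Hypothetical sketch: one static library (.a) per source directory.
add_library(mesos_master STATIC
  master/master.cpp
  master/allocator.cpp)

add_library(mesos_slave STATIC
  slave/slave.cpp
  slave/containerizer.cpp)

# The executable links the archives together. A change confined to
# slave/ now rebuilds only libmesos_slave.a and relinks, rather than
# recompiling every source file in the project.
add_executable(mesos main.cpp)
target_link_libraries(mesos mesos_master mesos_slave)
```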
> At a later date, if it makes sense, we can look at building shared
> library (.so) files. However, that only makes sense if the code is
> truly shared between different executable files. If that's not the
> case, then it likely makes sense just to stick with .a files.
> Regardless, generation of .so files is out of scope for this change.
>
>
> Details for Precompiled Header Changes:
>
> Precompiled headers will make the use of stout (a very large,
> header-only library) essentially "free" from a compile-time overhead
> point of view. Basically, a precompiled header takes a list of header
> files (including very long ones, like "windows.h") and generates the
> compiler's in-memory structures for their representation.
>
> During precompiled header generation, these memory structures are
> flushed to disk. Then, when components are built, the memory
> structures are reloaded from disk, which is dramatically faster than
> actually parsing the tens of thousands of lines of header files and
> rebuilding the memory structures from scratch.
>
> For precompiled headers to be useful, a relatively "consistent" set of
> headers must be included by all of the C/C++ files. So, for example,
> consider the following C file:
>
> #if defined(windows)
> #include <windows.h>
> #endif
>
> #include <header-a>
> #include <header-b>
> #include <header-c>
>
> < - Remainder of module - >
>
> To make a precompiled header for this module, all of the #include
> files would be moved into a new file, mesos_common.h. The C file would
> then be changed as follows:
>
> #include "mesos_common.h"
>
> < - Remainder of module - >
>
> Structurally, the code is identical, and it need not be built with
> precompiled headers. However, building with precompiled headers will
> make compilation dramatically faster.
>
> Note that other include files can be included after the precompiled
> header if appropriate. For example, the following is valid:
>
> #include "mesos_common.h"
> #include <header-d>
>
> < - Remainder of module - >
>
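To make the mechanics concrete, here is a sketch of how generating the
precompiled header for mesos_common.h might be wired into CMake with a
custom command (the paths and flags here are assumptions; CMake 3.16 and
later also provides a built-in target_precompile_headers() command):

```cmake
# Hypothetical sketch: produce a GCC precompiled header (.gch) from
# mesos_common.h. The compile flags must match those used for the
# actual source files, or GCC will silently ignore the .gch.
add_custom_command(
  OUTPUT ${CMAKE_BINARY_DIR}/mesos_common.h.gch
  COMMAND ${CMAKE_CXX_COMPILER} -x c++-header
          ${CMAKE_SOURCE_DIR}/src/mesos_common.h
          -o ${CMAKE_BINARY_DIR}/mesos_common.h.gch
  DEPENDS ${CMAKE_SOURCE_DIR}/src/mesos_common.h
  COMMENT "Generating precompiled header")
```

On Windows, the analogous MSVC mechanism is the /Yc flag (create a .pch)
paired with /Yu (use it) on the remaining source files.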
> For efficiency purposes, if a header file is included by 50% or more
> of the source files, it should be included in the precompiled header.
> If a header is included in fewer than 50% of the source files, then it
> can be separately included (and thus would not benefit from
> precompiled headers). Note that this is a guideline: even if a header
> is used by fewer than 50% of source files, if it's very large, we may
> still decide to put it in the precompiled header.
>
> Note that, for the use of precompiled headers, there will be a great
> deal of code churn (almost exclusively in the #include lists of source
> files). This will mean a lot of code merges, but ultimately no changes
> to code logic. If merges are not done in a timely fashion, this can
> easily result in needless hand-merging of changes: this kind of work
> is easily invalidated when the include list is changed by another
> developer, forcing the patch to be redone. Due to these issues, we
> will need a dedicated shepherd to integrate the patches quickly. [Note
> that Joseph has stepped up to the plate for this; thanks, Joseph!]
>
>
> This is the end of my proposal; feedback would be appreciated.
>
