corinthia-dev mailing list archives

From: Peter Kelly <pmke...@apache.org>
Subject: Re: C99 versus C++ (limited)
Date: Thu, 13 Aug 2015 20:31:25 GMT
I think C++ would make working with the DocFormats library, at least in its current form, significantly
easier. In particular, the explicit support for classes, and the ability to use smart pointers
(thus avoiding manual reference counting) would be a big win in terms of reducing complexity.
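To make the smart-pointer point concrete, here is a minimal C++ sketch contrasting explicit Retain/Release calls with std::shared_ptr. The types and functions (CNode, CNodeNew, CNodeRetain, CNodeRelease, Node, Holder) are hypothetical stand-ins chosen for illustration, not the actual DocFormats API.

```cpp
#include <cassert>
#include <cstdlib>
#include <memory>

// Manual reference counting, written C-style. CNode, CNodeNew, CNodeRetain
// and CNodeRelease are hypothetical names, not the DocFormats API.
struct CNode {
    int retainCount;
    int value;
};

CNode *CNodeNew(int value) {
    CNode *node = static_cast<CNode *>(std::malloc(sizeof(CNode)));
    if (node == nullptr)
        return nullptr;
    node->retainCount = 1;  // the creator owns the first reference
    node->value = value;
    return node;
}

CNode *CNodeRetain(CNode *node) {
    if (node != nullptr) {
        assert(node->retainCount > 0);  // only retain a live node
        node->retainCount++;
    }
    return node;
}

void CNodeRelease(CNode *node) {
    // Each owner must call this exactly once: forgetting it leaks the
    // node, calling it twice frees memory that may still be in use.
    if (node != nullptr && --node->retainCount == 0)
        std::free(node);
}

// The same shared ownership expressed with std::shared_ptr: the count is
// incremented when a Holder is copied and decremented when one is
// destroyed, so there are no Retain/Release calls to get wrong.
struct Node {
    int value;
    explicit Node(int v) : value(v) {}
};

struct Holder {
    std::shared_ptr<Node> node;
};
```

With shared_ptr, an owner that goes out of scope drops its reference automatically, whereas the manual version depends on every code path (including error paths) calling CNodeRelease exactly once.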

As background on why the library is in C and not C++:

The reason is that originally DocFormats was written in Objective C (since I was targeting
only iOS at the time). Objective C is a superset of C, so when I decided I wanted to open
source the code and enable it to be used on non-Apple platforms, I methodically went through
the source tree converting all the Objective C classes and reference counting statements into
their C equivalents. Objective C has automatic reference counting now, but at the time I was
not using it, so this meant the translation was relatively straightforward.

While it *is* possible to mix Objective C and C++, doing so adds an extra layer
of complexity, which I wanted to avoid (you end up with two ways of defining classes, etc.). The conversion
to C was simpler than I expect a conversion to C++ would have been. However, now that all the code
is in C and completely free of any Apple-specific dependencies, I think it would be reasonable
to move to C++ to more concisely express many of the things that are currently done explicitly
(memory management being the most significant). The resulting code would also be more readable.

I won’t volunteer to do the conversion myself, since it’s a lot of work. However, for anyone
willing to take on the task, this would be an excellent way of becoming intimately familiar
with the library, which would be of great use in developing ODF and other filters.

I kind of have a natural aversion to C++ because of its complexity, and the sheer number
of features which, if all of them (or even a significant portion of them) are used, can lead to
very complicated code. I think we should agree on fairly strict guidelines on the subset of
the language we use, to avoid things “getting out of hand” with the codebase, so to speak.

There are some nice properties of C I like, such as the ability to grep for a function name
throughout the whole source tree to find out all the places it’s used, which is handy for
refactoring. Xcode also has some refactoring tools which work for Objective C and most of
C, which I used a lot doing the original conversion, but these do not work with C++ (of course
this is a limitation of Xcode, not a problem with C++ per se).

There are some specific areas we’ll need to be careful about in terms of performance. Actually
the first pure C code I had in DocFormats, long before I converted the whole library, was
the DFNode and DFDocument structures, which use a specialised memory allocator that simply
allocates a slab of memory and frees it all in one go after conversion has finished. Prior
to that, every node was a separate Objective C object, and freeing a whole document took an
inordinate amount of time, due to the large number of release messages sent to free individual
nodes, and the fact that Objective C’s dispatch mechanism is not efficient for compute-intensive
code. This had a very noticeable impact on load times of large documents, which was greatly
improved by switching to a customised, efficient memory allocation strategy. We should maintain
this when moving to C++.
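A rough sketch of how the slab approach could carry over to C++, assuming tree nodes that are trivially destructible and are only ever freed together with their document. Arena and Node are hypothetical names; this is not the existing DFNode/DFDocument allocator, only an illustration of the general technique (a bump allocator that hands out memory from large blocks and releases all blocks at once).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <memory>
#include <new>
#include <type_traits>
#include <utility>
#include <vector>

// Bump ("slab") allocator: hands out memory from large blocks and frees
// every block at once when the arena is destroyed. Hypothetical sketch,
// not the existing DFDocument allocator.
class Arena {
public:
    explicit Arena(std::size_t blockSize = 64 * 1024) : blockSize_(blockSize) {}
    Arena(const Arena &) = delete;
    Arena &operator=(const Arena &) = delete;

    // Return `size` bytes aligned to `align` (a power of two).
    void *allocate(std::size_t size, std::size_t align) {
        if (!blocks_.empty()) {
            void *p = blocks_.back().get() + used_;
            std::size_t space = currentSize_ - used_;
            // std::align bumps p up to the alignment and shrinks space
            // by the padding, or returns nullptr if the block is full.
            if (std::align(align, size, p, space) != nullptr) {
                used_ = currentSize_ - space + size;
                return p;
            }
        }
        // Start a new block, big enough for this request plus padding.
        // Any space left in the previous block is simply abandoned.
        currentSize_ = std::max(blockSize_, size + align);
        blocks_.push_back(std::make_unique<unsigned char[]>(currentSize_));
        void *p = blocks_.back().get();
        std::size_t space = currentSize_;
        p = std::align(align, size, p, space);
        assert(p != nullptr);
        used_ = currentSize_ - space + size;
        return p;
    }

    // Construct a T in arena memory. Destructors are never run, because
    // the arena releases raw memory in bulk, so T must not need one.
    template <typename T, typename... Args>
    T *create(Args &&...args) {
        static_assert(std::is_trivially_destructible<T>::value,
                      "arena objects must be trivially destructible");
        void *mem = allocate(sizeof(T), alignof(T));
        return new (mem) T(std::forward<Args>(args)...);
    }

    std::size_t blockCount() const { return blocks_.size(); }

private:
    std::size_t blockSize_;
    std::size_t currentSize_ = 0;  // size of the newest block
    std::size_t used_ = 0;         // bytes consumed in the newest block
    std::vector<std::unique_ptr<unsigned char[]>> blocks_;
};

// Hypothetical tree node; its pointers refer to nodes in the same arena.
struct Node {
    int tag;
    Node *parent;
    Node *first;
    Node *next;
    Node(int t, Node *p) : tag(t), parent(p), first(nullptr), next(nullptr) {}
};
```

Allocating a node costs a few pointer adjustments, and destroying the arena performs one deallocation per block rather than one per node. The trade-off is that individual nodes cannot be freed early and their destructors never run, which is why the sketch rejects types that are not trivially destructible.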

Regarding Flat, I’d like to keep that in C at least for now, because my plan is to build
a virtual machine for executing Flat programs, for which I’ll implement a garbage collector,
which necessarily requires intimate knowledge of the memory layout of objects. While this
is possible to do in C++, it’s easier in C as there are fewer abstractions in the way. Flat
is also about to get its own type system, which will be different in many respects from
that of C++ (and more tailored towards the task of transformation). I’ll post more on this
in due course.

But for the bulk of the DocFormats code, I think it makes sense to move to C++. We’ll
benefit from the improved maintainability, and it will make it easier for new committers coming into
the project to understand the structure of the code.

—
Dr Peter M. Kelly
pmkelly@apache.org

PGP key: http://www.kellypmk.net/pgp-key
(fingerprint 5435 6718 59F0 DD1F BFA0 5E46 2523 BAA1 44AE 2966)

> On 10 Aug 2015, at 6:26 pm, jan i <jani@apache.org> wrote:
> 
> Hi
> 
> Peter and I talked the other day about, among other things, the benefits of
> using C++ instead of sticking to C99.
> 
> This would be a major change in the project (less in the code, more in the
> "how to"), and it is not something we should "just" do.
> 
> I favor C++, but not without limits. I see 3 places where C++ can give us more
> stable code:
> - Interfaces.
> Using classes to group our functions (e.g. platform, core, filters/odf
> etc.) would make it very clear where each function originates. It would also
> allow us to group global variables and keep them private from the rest of the world.
> I would not use real interface classes for our internal grouping; that is
> not needed. But e.g. the DocFormats API should be a real interface class.
> - Automatic memory management.
> At the moment we have a lot of code managing construction/destruction
> that could be totally automated by use of C++ smart pointers.
> - Object model (filters, flat and core).
> This would be more logically represented as objects, and copying etc.
> would suddenly be a lot easier.
> 
> I would not like to see deep inheritance hierarchies (and especially not
> multiple inheritance).
> 
> I fail to see what we lose by making the change, but please give your
> opinion.
> 
> rgds
> jan i.
> 
> Ps. This is in no way a vote thread, but simply a way to gather opinions.