Mailing-List: contact dev-help@stdcxx.apache.org; run by ezmlm
Precedence: bulk
Reply-To: dev@stdcxx.apache.org
Received-SPF: pass (athena.apache.org: domain of stefan.teleman@gmail.com
 designates 209.85.212.54 as permitted sender)
MIME-Version: 1.0
In-Reply-To: <505A6877.7090507@hates.ms>
References: 
 <CALdE9OD1q8y6yo7FhZLwdSA0w5R1Jy0gTFY39GjGbgT+cHYhOQ@mail.gmail.com>
	<505A6877.7090507@hates.ms>
Date: Wed, 19 Sep 2012 22:02:59 -0400
Message-ID: 
 <CALdE9ODV=msOJgWiOVRcTDe1F=sfFQpnMXaZpoexG5QP3mg5+Q@mail.gmail.com>
Subject: Re: STDCXX-1056 : numpunct fix
From: Stefan Teleman <stefan.teleman@gmail.com>
To: dev@stdcxx.apache.org
Content-Type: text/plain; charset=UTF-8

On Wed, Sep 19, 2012 at 8:51 PM, Liviu Nicoara <nikkoara@hates.ms> wrote:

> I think you are referring to `live' cache objects and the code which
> specifically adjusts the size of the buffer according to the number of
> `live' locales and/or facets in it. In that respect I would not call that
> eviction because locales and facets with non-zero reference counters are
> never evicted.
>
> But anyhoo, this is semantics. Bottom line is the locale/facet buffer
> management code follows a principle of economy.

Yes, it does. But we have to choose between economy and efficiency. To
clarify: the overhead of having unused pointers in the cache is
sizeof(void*) times the number of unused "slots". This is 2012. Even
an entry-level Android cell phone comes with 1GB of system memory. If
we want to talk about embedded systems, where memory constraints are
more stringent than on cell phones, then we're not talking about Apache
stdcxx anymore, or about any other open source implementation of the
C++ Standard Library. These types of systems use C++ for embedded
systems, which is a different animal altogether: no exception support,
no RTTI. For example, see Green Hills: http://www.ghs.com/ec++.html.
And even they have become more relaxed about memory constraints. They
use Boost.

Bottom line: so what if 16 of the pointers in this 32-slot cache
never get used? The maximum amount of "wasted memory" for these 16
pointers is 128 bytes on a 64-bit machine with 8-byte pointers.
Can we live with that in 2012, a year when a $500 laptop comes with
4GB RAM out of the box? I would pick 128 bytes of allocated but unused
memory over random and entirely avoidable memory churn any day.
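
To make the arithmetic concrete, here is a minimal sketch. The function
name wasted_bytes is made up for illustration and is not part of stdcxx;
it assumes a C++11 compiler for constexpr.

```cpp
#include <cstddef>

// Hypothetical helper (not a stdcxx function): bytes held by cache
// slots that never receive a pointer. slot_size defaults to the
// pointer size of the machine compiling this code.
constexpr std::size_t wasted_bytes(std::size_t unused_slots,
                                   std::size_t slot_size = sizeof(void*))
{
    return unused_slots * slot_size;
}

// 16 unused slots with 8-byte pointers (64-bit machine).
static_assert(wasted_bytes(16, 8) == 128, "64-bit case");

// 16 unused slots with 4-byte pointers (32-bit machine).
static_assert(wasted_bytes(16, 4) == 64, "32-bit case");
```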

> The optimal number is subject to debate. Probably Martin can give an insight
> into the reasons for that number. Why did you pick 32 (or is it 64 in your
> patch) and not any other? Is it something based on your experience as a user
> or programmer?

Based on two things:

1. There are, apparently, 30 "top" languages spoken on this planet:

http://www.vistawide.com/languages/top_30_languages.htm

2. I've written locale-aware software back in my days on Wall Street.
The maximum number of locales I had to support was 14.

So max(14, 30) would be 30. So I made it 32 because it's a power of 2.

> A negligible overhead, IMO. The benefits of maintaining a small memory
> footprint may be important for some environments. As useful as principles
> may be, see above.

Small and negligible in theory. In practice, when the cache starts
resizing itself by allocating new memory, copying, delete[]'ing and -
I forgot to mention this in my initial post - finishing it all up with
a call to qsort(3C), it's not that negligible anymore. It doesn't just
happen once. It happens every time the cache gets "anxious" (for
reasons mentioned in my previous email) and wants to resize itself.
Which triggers the following question in my mind: why are we even
causing all this memory churn in the first place? Because we saved 128
bytes (or 64 bytes on a 32-bit machine, which is what most cell
phones/tablets are these days)?

My goal: I would be very happy if any application using Apache stdcxx
would reach its peak instantiation level of localization (read: max
number of locales and facets instantiated and cached, for the
application's particular use case), and would then stabilize at that
level *without* having to resize and re-sort the cache, *ever*. That
is a locale cache I can love. I love binary searches on sorted
containers. Wrecking the container with insertions or deletions, and
then having to re-sort it again, not so much. Especially when I can't
figure out why we're doing it in the first place.
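
As an illustration only, and not the code in my patch or in stdcxx, a
cache that never resizes and never needs a re-sort could look roughly
like the sketch below. Entry, FixedCache and the integer id key are
all made-up names, and the sketch assumes C++11. The slot array lives
inside the object, lookups are binary searches, and an insertion shifts
the tail by one slot instead of calling qsort.

```cpp
#include <algorithm>
#include <cstddef>

// Hypothetical stand-in for a cached locale or facet; not a stdcxx type.
struct Entry { int id; };

// Fixed-capacity cache of Entry pointers kept sorted by id.
// The slot array is a member, so it is never reallocated.
class FixedCache {
public:
    static const std::size_t capacity = 32;

    FixedCache() : size_(0) {}

    // Binary search; returns a null pointer when the id is not cached.
    Entry* find(int id) const {
        Entry* const* end = slots_ + size_;
        Entry* const* pos = std::lower_bound(slots_, end, id, less_id);
        return (pos != end && (*pos)->id == id) ? *pos : nullptr;
    }

    // Inserts at the sorted position by shifting the tail right one slot.
    // Returns false when the cache is full or the id is already present.
    bool insert(Entry* e) {
        if (size_ == capacity) return false;
        Entry** end = slots_ + size_;
        Entry** pos = std::lower_bound(slots_, end, e->id, less_id);
        if (pos != end && (*pos)->id == e->id) return false;
        std::copy_backward(pos, end, end + 1);
        *pos = e;
        ++size_;
        return true;
    }

    std::size_t size() const { return size_; }

private:
    // Ordering used by lower_bound: compares a cached entry against a key.
    static bool less_id(const Entry* lhs, int id) { return lhs->id < id; }

    Entry*      slots_[capacity];
    std::size_t size_;
};
```

What to do when all 32 slots are taken is a policy question that this
sketch deliberately leaves open; it just refuses the insertion.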

> In this respect you could call every memory allocation and de-allocation
> an overhead. Please keep in mind that this resembles the operations
> performed for any sequence containers; how likely is it for a program to
> have more locale/facet creation/destruction than strings or vectors
> mutations?

There's one fundamental difference: the non-sorted STL containers give
the developer the opportunity to construct them with an initial size
larger than the implementation-specific default size. Any application
developer worth their salt would perform some initial size
optimization for these types of containers. If I know that my
std::vector will end up containing 5000 "things", I would never
leave it at its default capacity; I would reserve the space up front.
Skip that, and you'll get flamed at code review. As for the sorted associative
containers, that's one of the major gripes against them: whenever they
have to grow, or rebalance, they get expensive. But we're not using a
sorted associative container here. It's just a plain ol' C array.
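
A rough way to see the effect of sizing a container up front, using
std::vector because it exposes capacity() and reserve(). The function
name count_reallocations is made up for this sketch; the number of
reallocations in the unreserved case depends on the implementation's
growth policy.

```cpp
#include <cstddef>
#include <vector>

// Appends `count` ints and reports how many times the vector reallocated.
// A change in capacity() after push_back signals a reallocation.
std::size_t count_reallocations(std::size_t count, bool reserve_first)
{
    std::vector<int> v;
    if (reserve_first)
        v.reserve(count);   // one allocation sized for the known peak

    std::size_t reallocations = 0;
    for (std::size_t i = 0; i < count; ++i) {
        const std::size_t before = v.capacity();
        v.push_back(static_cast<int>(i));
        if (v.capacity() != before)
            ++reallocations;
    }
    return reallocations;
}
```

With reserve_first set, 5000 insertions cause no reallocation at all.
Without it, the vector allocates, copies and frees its buffer every
time it outgrows its capacity.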

> Could you please elaborate a bit on this? Is this your opinion based on your
> user and/or programmer experience?

See above about the top 30 languages spoken in the world.

> Hey Stefan, are the above also timing the changes?

Nah, I didn't bother with the timings - yet - for a very simple
reason: in order to use instrumentation, both with SunPro and with
Intel compilers, optimization of any kind must be disabled. On SunPro
you have to pass -xkeepframe=%all (which disables tail-call
optimization as well), in addition to passing -xO0 and -g. So the
timings for these unoptimized experiments would have been completely
irrelevant.

--Stefan

-- 
Stefan Teleman
KDE e.V.
stefan.teleman@gmail.com