From: Daniel Lescohier <daniel.lescohier@cbsinteractive.com>
To: dev@httpd.apache.org
Date: Tue, 3 Dec 2013 19:14:22 -0500
Subject: Re: time caching in util_time.c and mod_log_config.c

I took a look at apr's configure.in, and its default for all architectures
except for i486, i586, and i686 is to use the real atomic ops, but for those
three architectures the default is to use the "generic" atomic ops.  Any idea
why there is a special rule for those three architectures?  There's nothing
wrong with the atomic operations on those three architectures: otherwise, how
have we had semaphores and mutexes for all these years on those CPUs?  I guess
that is a question for the APR dev mailing list.

I see that some distros override that default.  E.g., the libapr1.spec for
openSUSE has:

%ifarch %ix86
        --enable-nonportable-atomics=yes \
%endif

and in /usr/lib/rpm/macros:

On Tue, Dec 3, 2013 at 12:54 PM, Yann Ylavic <ylavic.dev@gmail.com> wrote:

> I personally like this solution better (IMHO) since it does not rely on
> apr_thread_mutex_trylock() to be wait-free/userspace (e.g. natively
> implements the "compare and swap").
>
> On the other hand, apr_atomic_cas32() may itself be implemented using
> apr_thread_mutex_lock() when USE_ATOMICS_GENERIC is defined (explicitly,
> or with --enable-nonportable-atomics=no, or else forcibly with
> "gcc -std=c89" or Intel CPUs <= i686).
>
> Hence with USE_ATOMICS_GENERIC, apr_thread_mutex_trylock() may be a better
> solution than apr_thread_mutex_lock()...
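
[A minimal sketch of the trylock-based read path discussed above, for readers
following along.  All names here (demo_cache, demo_cache_mutex,
demo_child_init, demo_trylock_explode_lt) are hypothetical illustrations, not
code from this thread; the mutex must be created per process (e.g. in a
child_init hook), and write-back of a freshly computed value is omitted for
brevity.]

/* Sketch only: hypothetical names, not part of any proposed patch. */
#include <string.h>
#include <apr_errno.h>
#include <apr_pools.h>
#include <apr_thread_mutex.h>
#include <apr_time.h>

static apr_thread_mutex_t *demo_cache_mutex;
static struct { apr_int64_t key; apr_time_exp_t value; } demo_cache;

/* Would be called once per process, e.g. from a child_init hook. */
static apr_status_t demo_child_init(apr_pool_t *p)
{
    return apr_thread_mutex_create(&demo_cache_mutex,
                                   APR_THREAD_MUTEX_DEFAULT, p);
}

static apr_status_t demo_trylock_explode_lt(apr_time_exp_t *value, apr_time_t t)
{
    const apr_int64_t seconds = apr_time_sec(t);

    if (apr_thread_mutex_trylock(demo_cache_mutex) == APR_SUCCESS) {
        int hit = (seconds != 0 && seconds == demo_cache.key);
        if (hit) {
            memcpy(value, &demo_cache.value, sizeof(*value));
        }
        apr_thread_mutex_unlock(demo_cache_mutex);
        if (hit) {
            value->tm_usec = (apr_int32_t)apr_time_usec(t);
            return APR_SUCCESS;
        }
    }
    /* Lock busy (APR_EBUSY), trylock not implemented, or cache miss:
     * just compute the exploded time directly rather than blocking. */
    return apr_time_exp_lt(value, t);
}

Note that even a cache hit takes a process-wide mutex here; the per-element
CAS proposal quoted below avoids that shared lock.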



> On Tue, Dec 3, 2013 at 6:01 PM, Daniel Lescohier
> <daniel.lescohier@cbsi.com> wrote:
>
>> If the developers list is OK using apr_atomic in the server core, there
>> would be lots of advantages over trylock:
>>
>>   1. No need for child init.
>>   2. No need for function pointers.
>>   3. Could have a lock per cache element (I deemed it too expensive
>>      memory-wise to have a large mutex structure per cache element).
>>   4. It would avoid the problem of trylock not being implemented on all
>>      platforms.
>>   5. Fewer parameters to the function macro.
>>
>> The code would be like this:
>>
>> #define TIME_CACHE_FUNCTION(VALUE_SIZE, CACHE_T, CACHE_PTR, CACHE_SIZE_POWER,\
>>     CALC_FUNC, AFTER_READ_WORK\
>> )\
>>     const apr_int64_t seconds = apr_time_sec(t);\
>>     apr_status_t status;\
>>     CACHE_T * const cache_element = \
>>         &(CACHE_PTR[seconds & ((1<<CACHE_SIZE_POWER)-1)]);\
>>     /* seconds==0 can be confused with uninitialized cache; don't use cache */\
>>     if (seconds==0) return CALC_FUNC(value, t);\
>>     if (apr_atomic_cas32(&cache_element->lock, 1, 0)==0) {\
>>         if (seconds == cache_element->key) {\
>>             memcpy(value, &cache_element->value, VALUE_SIZE);\
>>             apr_atomic_dec32(&cache_element->lock);\
>>             AFTER_READ_WORK;\
>>             return APR_SUCCESS;\
>>         }\
>>         if (seconds < cache_element->key) {\
>>             apr_atomic_dec32(&cache_element->lock);\
>>             return CALC_FUNC(value, t);\
>>         }\
>>         apr_atomic_dec32(&cache_element->lock);\
>>     }\
>>     status = CALC_FUNC(value, t);\
>>     if (status == APR_SUCCESS) {\
>>         if (apr_atomic_cas32(&cache_element->lock, 1, 0)==0) {\
>>             if (seconds > cache_element->key) {\
>>                 cache_element->key = seconds;\
>>                 memcpy(&cache_element->value, value, VALUE_SIZE);\
>>             }\
>>             apr_atomic_dec32(&cache_element->lock);\
>>         }\
>>     }\
>>     return status;
>>
>> --------------------------------------------------
>>
>> typedef struct {
>>     apr_int64_t key;
>>     apr_uint32_t lock;
>>     apr_time_exp_t value;
>> } explode_time_cache_t;
>>
>> TIME_CACHE(explode_time_cache_t, explode_time_lt_cache,
>>            TIME_CACHE_SIZE_POWER)
>>
>> AP_DECLARE(apr_status_t) ap_explode_recent_localtime(
>>     apr_time_exp_t * value, apr_time_t t)
>> {
>>     TIME_CACHE_FUNCTION(
>>     sizeof(apr_time_exp_t), explode_time_cache_t, explode_time_lt_cache,
>>     TIME_CACHE_SIZE_POWER, apr_time_exp_lt,
>>     value->tm_usec = (apr_int32_t) apr_time_usec(t))
>> }
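
[To make the locking protocol above easier to follow outside the macro:
apr_atomic_cas32() returns the previous value of its target, so
apr_atomic_cas32(&cache_element->lock, 1, 0) == 0 means the lock word was 0
(free) and has just been atomically set to 1, and apr_atomic_dec32() releases
it back to 0.  Below is a condensed, single-element sketch of the same
protocol; the names (demo_entry, demo_cas_explode_lt) are hypothetical, and
the early return for older cached seconds is folded into the write-back
check.]

/* Hypothetical, condensed illustration of the CAS-guarded cache above. */
#include <string.h>
#include <stdio.h>
#include <apr_general.h>
#include <apr_atomic.h>
#include <apr_time.h>

static struct {
    apr_int64_t key;        /* cached whole second; 0 means "uninitialized" */
    apr_uint32_t lock;      /* 0 = free, 1 = held */
    apr_time_exp_t value;   /* cached exploded time for that second */
} demo_entry;

static apr_status_t demo_cas_explode_lt(apr_time_exp_t *value, apr_time_t t)
{
    const apr_int64_t seconds = apr_time_sec(t);
    apr_status_t status;

    /* Acquire: CAS 0 -> 1 succeeds only if the returned old value was 0. */
    if (seconds != 0 && apr_atomic_cas32(&demo_entry.lock, 1, 0) == 0) {
        if (seconds == demo_entry.key) {                  /* cache hit */
            memcpy(value, &demo_entry.value, sizeof(*value));
            apr_atomic_dec32(&demo_entry.lock);           /* release: 1 -> 0 */
            value->tm_usec = (apr_int32_t)apr_time_usec(t);
            return APR_SUCCESS;
        }
        apr_atomic_dec32(&demo_entry.lock);               /* release, slow path */
    }
    status = apr_time_exp_lt(value, t);                   /* compute */
    if (status == APR_SUCCESS
        && apr_atomic_cas32(&demo_entry.lock, 1, 0) == 0) {
        if (seconds > demo_entry.key) {                   /* only move forward */
            demo_entry.key = seconds;
            memcpy(&demo_entry.value, value, sizeof(demo_entry.value));
        }
        apr_atomic_dec32(&demo_entry.lock);
    }
    return status;
}

int main(void)
{
    apr_time_exp_t xt;
    apr_initialize();
    if (demo_cas_explode_lt(&xt, apr_time_now()) == APR_SUCCESS) {
        printf("local time: %02d:%02d:%02d\n", xt.tm_hour, xt.tm_min, xt.tm_sec);
    }
    apr_terminate();
    return 0;
}

On the hit path this touches only the element's own lock word, which is the
memory-footprint argument behind keeping one small lock per cache element
rather than a full mutex structure per element.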
