apr-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Joshua Marantz <jmara...@google.com>
Subject Re: apr_memcache operation timeouts
Date Fri, 22 Mar 2013 20:04:22 GMT
Thanks very much for doing this, Jeff.

Unfortunately scanning your diff I think you wound up taking a snapshot of
a patch I'd uploaded to the bug, but not updated after several subsequent
bug fixes.

The golden copies are in
https://code.google.com/p/modpagespeed/source/browse/#svn%2Ftrunk%2Fsrc%2Fthird_party%2Faprutil:

https://code.google.com/p/modpagespeed/source/browse/trunk/src/third_party/aprutil/apr_memcache2.c
https://code.google.com/p/modpagespeed/source/browse/trunk/src/third_party/aprutil/apr_memcache2.h

After we released our module we had a few small tweaks and bug-fixes based
on external feedback for our module.  Those versions have now been stable
for 3.5 months, with multiple users reporting success with our memcached
integration, so I'm hopeful that they have settled into a stable state.

What's the best way to go from here?

-Josh


On Thu, Mar 14, 2013 at 10:45 AM, Jeff Trawick <trawick@gmail.com> wrote:

> On Tue, Mar 12, 2013 at 1:56 PM, Jeff Trawick <trawick@gmail.com> wrote:
> > On Mon, Mar 11, 2013 at 3:50 PM, Joshua Marantz <jmarantz@google.com>
> wrote:
> >>
> >> ping!
> >>
> >> Please don't hesitate to push back and tell me if I can supply the patch
> >> or
> >> update in some easier-to-digest form.  In particular, while I have
> >> rigorously stress-tested this change using mod_pagespeed's unit test,
> >> system-test, and load-test framework, I don't really understand what the
> >> testing flow is for APR.  I'd be happy to add unit-tests for that if
> >> someone points me to a change-list or patch-file that does it properly.
> >>
> >> -Josh
> >
> >
> > I'll try hard to work on this in the next couple of days.  It would be
> great
> > to have fixes in APR-Util 1.5.x, which we hope to work on later this
> week.
>
> Attached is a first pass at getting your patch to apr trunk.  Can you
> confirm that I didn't break your tests?  (It passes the apr tests, but
> that may not mean much.)
>
>
>
> >
> >>
> >>
> >> On Thu, Nov 1, 2012 at 8:04 AM, Joshua Marantz <jmarantz@google.com>
> >> wrote:
> >>
> >> > I have completed a solution to this problem, which can be a drop-in
> >> > update
> >> > for the existing apr_memcache.c.  It is now checked in for my module
> as
> >> >
> >> >
> http://code.google.com/p/modpagespeed/source/browse/trunk/src/third_party/aprutil/apr_memcache2.c
> >> > .
> >> >
> >> > It differs from the solution in
> >> > https://issues.apache.org/bugzilla/show_bug.cgi?id=51065 in that:
> >> >
> >> >    - It doesn't require an API change; it but it enforces the 50ms
> >> >    timeout that already exists for apr_multgetp for all operations.
> >> >    - It works under my load test (which I found is not true of the
> patch
> >> >    in 51065).
> >> >
> >> > For my own purposes, I will be shipping my module with apr_memcache2
> so
> >> > I
> >> > get the behavior I want regardless of what version of Apache is
> >> > installed.
> >> >  But I'd like to propose my patch for apr_memcache.c.  The patch is
> >> > attached, and I've also submitted it as an alternative patch to bug
> >> > 51065.
> >> >
> >> > If you agree with the strategy I used to solve this problem, then
> please
> >> > let me know if I can help with any changes required to get this into
> the
> >> > main distribution,
> >> >
> >> >
> >> > On Mon, Oct 22, 2012 at 5:21 PM, Joshua Marantz
> >> > <jmarantz@google.com>wrote:
> >> >
> >> >> I've had some preliminary success with my own variant of
> apr_memcache.c
> >> >> (creatively called apr_memcache2.c).  Rather than setting the socket
> >> >> timeout, I've been mimicing the timeout strategy I saw in
> >> >> apr_memcache_multgetp, by adding a new helper method:
> >> >>
> >> >> static apr_status_t wait_for_server_or_timeout(apr_pool_t* temp_pool,
> >> >>                                                apr_memcache2_conn_t*
> >> >> conn) {
> >> >>     apr_pollset_t* pollset;
> >> >>     apr_status_t rv = apr_pollset_create(&pollset, 1, temp_pool,
0);
> >> >>     if (rv == APR_SUCCESS) {
> >> >>         apr_pollfd_t pollfd;
> >> >>         pollfd.desc_type = APR_POLL_SOCKET;
> >> >>         pollfd.reqevents = APR_POLLIN;
> >> >>         pollfd.p = temp_pool;
> >> >>         pollfd.desc.s = conn->sock;
> >> >>         pollfd.client_data = NULL;
> >> >>         apr_pollset_add(pollset, &pollfd);
> >> >>         apr_int32_t queries_recvd;
> >> >>         const apr_pollfd_t* activefds;
> >> >>         rv = apr_pollset_poll(pollset, MULT_GET_TIMEOUT,
> >> >> &queries_recvd,
> >> >>                               &activefds);
> >> >>         if (rv == APR_SUCCESS) {
> >> >>             assert(queries_recvd == 1);
> >> >>             assert(activefds->desc.s == conn->sock);
> >> >>             assert(activefds->client_data == NULL);
> >> >>         }
> >> >>     }
> >> >>     return rv;
> >> >> }
> >> >>
> >> >> And calling that before many of the existing calls to get_server_line
> >> >> as:
> >> >>
> >> >>     rv = wait_for_server_or_timeout_no_pool(conn);
> >> >>     if (rv != APR_SUCCESS) {
> >> >>         ms_release_conn(ms, conn);
> >> >>         return rv;
> >> >>     }
> >> >>
> >> >> This is just an experiment; I think I can streamline this by
> >> >> pre-populating the pollfd structure as part of the
> apr_memcache_conn_t
> >> >> (actually now apr_memcache2_conn_t).
> >> >>
> >> >> I have two questions about this:
> >> >>
> >> >> 1. I noticed the version of apr_memcache.c that ships with Apache 2.4
> >> >> is
> >> >> somewhat different from the one that ships with Apache 2.2.  In
> >> >> particular
> >> >> the 2.4 version cannot be compiled against the headers that come
> with a
> >> >> 2.2
> >> >> distribution.  Is there any downside to taking my hacked 2.2
> >> >> apr_memcache.c
> >> >> and running it in Apache 2.4?  Or should I maintain two hacks?
> >> >>
> >> >> 2. This seems wasteful in terms of system calls.  I am making an
> extra
> >> >> call to poll, rather than relying on the socket timeout.  The socket
> >> >> timeout didn't work as well as this though.  Does anyone have any
> >> >> theories
> >> >> as to why, or what could be done to the patch in
> >> >> https://issues.apache.org/bugzilla/show_bug.cgi?id=51065 to work?
> >> >>
> >> >> -Josh
> >> >>
> >> >>
> >> >> On Fri, Oct 19, 2012 at 9:25 AM, Joshua Marantz
> >> >> <jmarantz@google.com>wrote:
> >> >>
> >> >>> Following up: I tried doing what I suggested above: patching that
> >> >>> change
> >> >>> into my own copy of apr_memcache.c  It was first of all a bad idea
> to
> >> >>> pull
> >> >>> in only part of apr_memcache.c because that file changed slightly
> >> >>> between
> >> >>> 2.2 and 2.4 and our module works in both.
> >> >>>
> >> >>> I was successful making my own version of apr_memcache (renaming
> >> >>> entry-points apr_memcache2*) that I could hack.  But if I changed
> the
> >> >>> socket timeout from -1 to 10 seconds, then the system behaved very
> >> >>> poorly
> >> >>> under load test (though it worked fine in our unit-tests and
> >> >>> system-tests).
> >> >>>  In other words, I think the proposed patch that Jeff pointed to
> above
> >> >>> is
> >> >>> not really working (as he predicted).  This test was done without
> >> >>> SIGSTOPing the memcached; it would timeout under our load anyway
and
> >> >>> thereafter behave poorly.
> >> >>>
> >> >>> I'm going to follow up on that bugzilla entry, but for now I'm
going
> >> >>> to
> >> >>> pursue my own complicated mechanism of timing out the calls from
my
> >> >>> side.
> >> >>>
> >> >>> -Josh
> >> >>>
> >> >>>
> >> >>> On Thu, Oct 18, 2012 at 10:46 AM, Joshua Marantz
> >> >>> <jmarantz@google.com>wrote:
> >> >>>
> >> >>>> Thanks Jeff, that is very helpful.  We are considering a course
of
> >> >>>> action and before doing any work toward this, I'd like to
> understand
> >> >>>> the
> >> >>>> pitfalls from people that understand Apache better than us.
> >> >>>>
> >> >>>> Here's our reality: we believe we need to incorporate memcached
for
> >> >>>> mod_pagespeed <http://modpagespeed.com> to scale effectively
for
> very
> >> >>>> large sites & hosting providers.  We are fairly close (we
think) to
> >> >>>> releasing this functionality as beta.  However, in such large
> sites,
> >> >>>> stuff
> >> >>>> goes wrong: machines crash, power failure, fiber cut, etc.
 When it
> >> >>>> does we
> >> >>>> want to fall back to serving partially unoptimized sites rather
> than
> >> >>>> hanging the servers.
> >> >>>>
> >> >>>> I understand the realities of backward-compatible APIs.  My
> >> >>>> expectation
> >> >>>> is that this would take years to make it into an APR distribution
> we
> >> >>>> could
> >> >>>> depend on.  We want to deploy this functionality in weeks.
 The
> >> >>>> workarounds
> >> >>>> we have tried backgrounding the apr_memcache calls in a thread
and
> >> >>>> timing
> >> >>>> out in mainline are complex and even once they work 100% will
be
> very
> >> >>>> unsatisfactory (resource leaks; Apache refusing to exit cleanly
on
> >> >>>> 'apachectl stop') if this happens more than (say) once a month.
> >> >>>>
> >> >>>> Our plan is to copy the patched implementation of
> >> >>>> apr_memcache_server_connect and the static methods it calls
into a
> >> >>>> new .c
> >> >>>> file we will link into our module, naming the new entry-point
> >> >>>> something
> >> >>>> else (apr_memcache_server_connect_with_timeout seems good).
 From a
> >> >>>> CS/SE perspective this is offensive and we admit it, but from
a
> >> >>>> product
> >> >>>> quality perspective we believe this beats freezes and
> >> >>>> complicated/imperfect
> >> >>>> workarounds with threads.
> >> >>>>
> >> >>>> So I have two questions for the Apache community:
> >> >>>>
> >> >>>>    1. What are the practical problems with this approach? 
Note
> that
> >> >>>>    in any case a new APR rev would require editing/ifdefing
our
> code
> >> >>>> anyway,
> >> >>>>    so I think immunity from APR updates such as this patch
being
> >> >>>> applied is
> >> >>>>    not a distinguishing drawback.
> >> >>>>    2. Is there an example of the correct solution to the technical
> >> >>>>    problem Jeff highlighted: "it is apparently missing a call
to
> >> >>>>    adjust the socket timeout and to discard the connection
if the
> >> >>>>    timeout is reached".  That sounds like a pattern that might
be
> >> >>>>    found elsewhere in the Apache HTTPD code base.
> >> >>>>
> >> >>>> Thanks in advance for your help!
> >> >>>> -Josh
> >> >>>>
> >> >>>>
> >> >>>> On Wed, Oct 17, 2012 at 8:16 PM, Jeff Trawick
> >> >>>> <trawick@gmail.com>wrote:
> >> >>>>
> >> >>>>> On Wed, Oct 17, 2012 at 3:36 PM, Joshua Marantz
> >> >>>>> <jmarantz@google.com>
> >> >>>>> wrote:
> >> >>>>> > Is there a mechanism to time out individual operations?
> >> >>>>>
> >> >>>>> No, the socket connect timeout is hard-coded at 1 second
and the
> >> >>>>> socket I/O timeout is disabled.
> >> >>>>>
> >> >>>>> Bugzilla bug
> >> >>>>> https://issues.apache.org/bugzilla/show_bug.cgi?id=51065
> >> >>>>> has a patch, though it is apparently missing a call to
adjust the
> >> >>>>> socket timeout and to discard the connection if the timeout
is
> >> >>>>> reached.  More importantly, the API can only be changed
in future
> >> >>>>> APR
> >> >>>>> 2.0; alternate, backwards-compatible API(s) could be added
in
> future
> >> >>>>> APR-Util 1.6.
> >> >>>>>
> >> >>>>> >
> >> >>>>> > If memcached freezes, then it appears my calls to
'get' will
> block
> >> >>>>> until
> >> >>>>> > memcached wakes up.  Is there any way to set a timeout
for that
> >> >>>>> > call?
> >> >>>>> >
> >> >>>>> > I can repro this in my unit tests by sending a SIGSTOP
to
> >> >>>>> > memcached
> >> >>>>> before
> >> >>>>> > doing a 'get'?
> >> >>>>> >
> >> >>>>> > Here are my observations:
> >> >>>>> >
> >> >>>>> > apr_memcache_multgetp seems to time out in bounded
time if I
> >> >>>>> > SIGSTOP
> >> >>>>> the
> >> >>>>> > memcached process. Yes!
> >> >>>>> >
> >> >>>>> > apr_memcache_getp seems to hang indefinitely if I
SIGSTOP the
> >> >>>>> memcached
> >> >>>>> > process.
> >> >>>>> >
> >> >>>>> > apr_memcache_set seems to hang indefinitely if I SIGSTOP
the
> >> >>>>> memcached
> >> >>>>> > process.
> >> >>>>> >
> >> >>>>> > apr_memcache_delete seems to hang indefinitely if
I SIGSTOP the
> >> >>>>> memcached
> >> >>>>> > process.
> >> >>>>> >
> >> >>>>> > apr_memcache_stats seems to hang indefinitely if I
SIGSTOP the
> >> >>>>> memcached
> >> >>>>> > process.
> >> >>>>> >
> >> >>>>> > That last one really sucks as I am using that to print
the
> status
> >> >>>>> > of
> >> >>>>> all my
> >> >>>>> > cache shards to the log file if I detected a problem
:(
> >> >>>>> >
> >> >>>>> >
> >> >>>>> > Why does apr_memcache_multgetp do what I want and
not the
> others?
> >> >>>>>  Can I
> >> >>>>> > induce the others to have reasonable timeout behavior?
> >> >>>>> >
> >> >>>>> > When I SIGSTOP memcached this makes it hard to even
restart
> >> >>>>> > Apache,
> >> >>>>> at
> >> >>>>> > least with graceful-stop.
> >> >>>>> >
> >> >>>>> >
> >> >>>>> > On a related note, the apr_memcache
> >> >>>>> > documentation<
> >> >>>>>
> >> >>>>>
> http://apr.apache.org/docs/apr-util/1.4/group___a_p_r___util___m_c.html
> >> >>>>> >is
> >> >>>>> > very thin.  I'd be happy to augment it with my observations
on
> its
> >> >>>>> > usage
> >> >>>>> > and the meaning of some of the arguments if that was
desired.
>  How
> >> >>>>> would I
> >> >>>>> > go about that?
> >> >>>>>
> >> >>>>> Check out APR trunk from Subversion, adjust the doxygen
docs in
> the
> >> >>>>> include files, build (make dox) and inspect the results,
submit a
> >> >>>>> patch to dev@apr.apache.org.
> >> >>>>>
> >> >>>>> >
> >> >>>>> > -Josh
> >> >>>>>
> >> >>>>>
> >> >>>>>
> >> >>>>> --
> >> >>>>> Born in Roswell... married an alien...
> >> >>>>> http://emptyhammock.com/
> >> >>>>>
> >> >>>>
> >> >>>>
> >> >>>
> >> >>
> >> >
> >
> >
> >
> >
> > --
> > Born in Roswell... married an alien...
> > http://emptyhammock.com/
>
>
>
> --
> Born in Roswell... married an alien...
> http://emptyhammock.com/
>

Mime
View raw message