apr-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Joshua Marantz <jmara...@google.com>
Subject Re: apr_memcache operation timeouts
Date Fri, 29 Mar 2013 12:10:21 GMT
I think it makes perfect sense to proceed with the patch you have then, and
we can propose one or two more incremental improvements from that.  It's
good that the patch you provided for me helps me gauge a reasonable size of
delta.

So the ball is in my court; I plan to start with a mod_pagespeed snapshot
from the time I made that patch, unpatch my patch, re-patch your patch &
run the tests I had at the time.

-Josh




On Fri, Mar 29, 2013 at 7:53 AM, Jeff Trawick <trawick@gmail.com> wrote:

> (dropping modules-dev)
>
> On Fri, Mar 22, 2013 at 4:04 PM, Joshua Marantz <jmarantz@google.com>
> wrote:
> > Thanks very much for doing this, Jeff.
> >
> > Unfortunately scanning your diff I think you wound up taking a snapshot
> of
> > a patch I'd uploaded to the bug, but not updated after several subsequent
> > bug fixes.
> >
> > The golden copies are in
> >
> https://code.google.com/p/modpagespeed/source/browse/#svn%2Ftrunk%2Fsrc%2Fthird_party%2Faprutil
> :
> >
> >
> https://code.google.com/p/modpagespeed/source/browse/trunk/src/third_party/aprutil/apr_memcache2.c
> >
> https://code.google.com/p/modpagespeed/source/browse/trunk/src/third_party/aprutil/apr_memcache2.h
> >
> > After we released our module we had a few small tweaks and bug-fixes
> based
> > on external feedback for our module.  Those versions have now been stable
> > for 3.5 months, with multiple users reporting success with our memcached
> > integration, so I'm hopeful that they have settled into a stable state.
> >
> > What's the best way to go from here?
>
> The patch I started with, apparently with fewer changes than what
> you're using now, was a bit beyond the number of changes I feel
> comfortable with reviewing/testing/committing in one cycle.
>
> Some possibilities:
>
> * somebody in a Rambo mood decides to commit your current stuff
> largely as-is to trunk (not me; I don't want to commit anything I
> don't understand line by line, and I don't have the time to go through
> even more)
> * you or somebody else split your code into a series of easy to
> understand patches that solve one problem and which can be backported
> back to 1.5.x without breaking anything
> * ???
>
> Before anyone commits, some of the tweaks in my patch need to be
> carried forward, including zapping asserts and the related change to
> the public structure.  (ISTR adding an error check for a pollset
> operation that we have seen fail in httpd-land.)
>
> >
> > -Josh
> >
> >
> > On Thu, Mar 14, 2013 at 10:45 AM, Jeff Trawick <trawick@gmail.com>
> wrote:
> >
> >> On Tue, Mar 12, 2013 at 1:56 PM, Jeff Trawick <trawick@gmail.com>
> wrote:
> >> > On Mon, Mar 11, 2013 at 3:50 PM, Joshua Marantz <jmarantz@google.com>
> >> wrote:
> >> >>
> >> >> ping!
> >> >>
> >> >> Please don't hesitate to push back and tell me if I can supply the
> patch
> >> >> or
> >> >> update in some easier-to-digest form.  In particular, while I have
> >> >> rigorously stress-tested this change using mod_pagespeed's unit test,
> >> >> system-test, and load-test framework, I don't really understand what
> the
> >> >> testing flow is for APR.  I'd be happy to add unit-tests for that if
> >> >> someone points me to a change-list or patch-file that does it
> properly.
> >> >>
> >> >> -Josh
> >> >
> >> >
> >> > I'll try hard to work on this in the next couple of days.  It would be
> >> great
> >> > to have fixes in APR-Util 1.5.x, which we hope to work on later this
> >> week.
> >>
> >> Attached is a first pass at getting your patch to apr trunk.  Can you
> >> confirm that I didn't break your tests?  (It passes the apr tests, but
> >> that may not mean much.)
> >>
> >>
> >>
> >> >
> >> >>
> >> >>
> >> >> On Thu, Nov 1, 2012 at 8:04 AM, Joshua Marantz <jmarantz@google.com>
> >> >> wrote:
> >> >>
> >> >> > I have completed a solution to this problem, which can be a drop-in
> >> >> > update
> >> >> > for the existing apr_memcache.c.  It is now checked in for my
> module
> >> as
> >> >> >
> >> >> >
> >>
> http://code.google.com/p/modpagespeed/source/browse/trunk/src/third_party/aprutil/apr_memcache2.c
> >> >> > .
> >> >> >
> >> >> > It differs from the solution in
> >> >> > https://issues.apache.org/bugzilla/show_bug.cgi?id=51065 in that:
> >> >> >
> >> >> >    - It doesn't require an API change; it but it enforces the
50ms
> >> >> >    timeout that already exists for apr_multgetp for all operations.
> >> >> >    - It works under my load test (which I found is not true of
the
> >> patch
> >> >> >    in 51065).
> >> >> >
> >> >> > For my own purposes, I will be shipping my module with
> apr_memcache2
> >> so
> >> >> > I
> >> >> > get the behavior I want regardless of what version of Apache is
> >> >> > installed.
> >> >> >  But I'd like to propose my patch for apr_memcache.c.  The patch
is
> >> >> > attached, and I've also submitted it as an alternative patch to
bug
> >> >> > 51065.
> >> >> >
> >> >> > If you agree with the strategy I used to solve this problem, then
> >> please
> >> >> > let me know if I can help with any changes required to get this
> into
> >> the
> >> >> > main distribution,
> >> >> >
> >> >> >
> >> >> > On Mon, Oct 22, 2012 at 5:21 PM, Joshua Marantz
> >> >> > <jmarantz@google.com>wrote:
> >> >> >
> >> >> >> I've had some preliminary success with my own variant of
> >> apr_memcache.c
> >> >> >> (creatively called apr_memcache2.c).  Rather than setting
the
> socket
> >> >> >> timeout, I've been mimicing the timeout strategy I saw in
> >> >> >> apr_memcache_multgetp, by adding a new helper method:
> >> >> >>
> >> >> >> static apr_status_t wait_for_server_or_timeout(apr_pool_t*
> temp_pool,
> >> >> >>
>  apr_memcache2_conn_t*
> >> >> >> conn) {
> >> >> >>     apr_pollset_t* pollset;
> >> >> >>     apr_status_t rv = apr_pollset_create(&pollset, 1,
temp_pool,
> 0);
> >> >> >>     if (rv == APR_SUCCESS) {
> >> >> >>         apr_pollfd_t pollfd;
> >> >> >>         pollfd.desc_type = APR_POLL_SOCKET;
> >> >> >>         pollfd.reqevents = APR_POLLIN;
> >> >> >>         pollfd.p = temp_pool;
> >> >> >>         pollfd.desc.s = conn->sock;
> >> >> >>         pollfd.client_data = NULL;
> >> >> >>         apr_pollset_add(pollset, &pollfd);
> >> >> >>         apr_int32_t queries_recvd;
> >> >> >>         const apr_pollfd_t* activefds;
> >> >> >>         rv = apr_pollset_poll(pollset, MULT_GET_TIMEOUT,
> >> >> >> &queries_recvd,
> >> >> >>                               &activefds);
> >> >> >>         if (rv == APR_SUCCESS) {
> >> >> >>             assert(queries_recvd == 1);
> >> >> >>             assert(activefds->desc.s == conn->sock);
> >> >> >>             assert(activefds->client_data == NULL);
> >> >> >>         }
> >> >> >>     }
> >> >> >>     return rv;
> >> >> >> }
> >> >> >>
> >> >> >> And calling that before many of the existing calls to
> get_server_line
> >> >> >> as:
> >> >> >>
> >> >> >>     rv = wait_for_server_or_timeout_no_pool(conn);
> >> >> >>     if (rv != APR_SUCCESS) {
> >> >> >>         ms_release_conn(ms, conn);
> >> >> >>         return rv;
> >> >> >>     }
> >> >> >>
> >> >> >> This is just an experiment; I think I can streamline this
by
> >> >> >> pre-populating the pollfd structure as part of the
> >> apr_memcache_conn_t
> >> >> >> (actually now apr_memcache2_conn_t).
> >> >> >>
> >> >> >> I have two questions about this:
> >> >> >>
> >> >> >> 1. I noticed the version of apr_memcache.c that ships with
Apache
> 2.4
> >> >> >> is
> >> >> >> somewhat different from the one that ships with Apache 2.2.
 In
> >> >> >> particular
> >> >> >> the 2.4 version cannot be compiled against the headers that
come
> >> with a
> >> >> >> 2.2
> >> >> >> distribution.  Is there any downside to taking my hacked 2.2
> >> >> >> apr_memcache.c
> >> >> >> and running it in Apache 2.4?  Or should I maintain two hacks?
> >> >> >>
> >> >> >> 2. This seems wasteful in terms of system calls.  I am making
an
> >> extra
> >> >> >> call to poll, rather than relying on the socket timeout. 
The
> socket
> >> >> >> timeout didn't work as well as this though.  Does anyone have
any
> >> >> >> theories
> >> >> >> as to why, or what could be done to the patch in
> >> >> >> https://issues.apache.org/bugzilla/show_bug.cgi?id=51065 to
work?
> >> >> >>
> >> >> >> -Josh
> >> >> >>
> >> >> >>
> >> >> >> On Fri, Oct 19, 2012 at 9:25 AM, Joshua Marantz
> >> >> >> <jmarantz@google.com>wrote:
> >> >> >>
> >> >> >>> Following up: I tried doing what I suggested above: patching
that
> >> >> >>> change
> >> >> >>> into my own copy of apr_memcache.c  It was first of all
a bad
> idea
> >> to
> >> >> >>> pull
> >> >> >>> in only part of apr_memcache.c because that file changed
slightly
> >> >> >>> between
> >> >> >>> 2.2 and 2.4 and our module works in both.
> >> >> >>>
> >> >> >>> I was successful making my own version of apr_memcache
(renaming
> >> >> >>> entry-points apr_memcache2*) that I could hack.  But if
I changed
> >> the
> >> >> >>> socket timeout from -1 to 10 seconds, then the system
behaved
> very
> >> >> >>> poorly
> >> >> >>> under load test (though it worked fine in our unit-tests
and
> >> >> >>> system-tests).
> >> >> >>>  In other words, I think the proposed patch that Jeff
pointed to
> >> above
> >> >> >>> is
> >> >> >>> not really working (as he predicted).  This test was done
without
> >> >> >>> SIGSTOPing the memcached; it would timeout under our load
anyway
> and
> >> >> >>> thereafter behave poorly.
> >> >> >>>
> >> >> >>> I'm going to follow up on that bugzilla entry, but for
now I'm
> going
> >> >> >>> to
> >> >> >>> pursue my own complicated mechanism of timing out the
calls from
> my
> >> >> >>> side.
> >> >> >>>
> >> >> >>> -Josh
> >> >> >>>
> >> >> >>>
> >> >> >>> On Thu, Oct 18, 2012 at 10:46 AM, Joshua Marantz
> >> >> >>> <jmarantz@google.com>wrote:
> >> >> >>>
> >> >> >>>> Thanks Jeff, that is very helpful.  We are considering
a course
> of
> >> >> >>>> action and before doing any work toward this, I'd
like to
> >> understand
> >> >> >>>> the
> >> >> >>>> pitfalls from people that understand Apache better
than us.
> >> >> >>>>
> >> >> >>>> Here's our reality: we believe we need to incorporate
memcached
> for
> >> >> >>>> mod_pagespeed <http://modpagespeed.com> to scale
effectively
> for
> >> very
> >> >> >>>> large sites & hosting providers.  We are fairly
close (we
> think) to
> >> >> >>>> releasing this functionality as beta.  However, in
such large
> >> sites,
> >> >> >>>> stuff
> >> >> >>>> goes wrong: machines crash, power failure, fiber cut,
etc.
>  When it
> >> >> >>>> does we
> >> >> >>>> want to fall back to serving partially unoptimized
sites rather
> >> than
> >> >> >>>> hanging the servers.
> >> >> >>>>
> >> >> >>>> I understand the realities of backward-compatible
APIs.  My
> >> >> >>>> expectation
> >> >> >>>> is that this would take years to make it into an APR
> distribution
> >> we
> >> >> >>>> could
> >> >> >>>> depend on.  We want to deploy this functionality in
weeks.  The
> >> >> >>>> workarounds
> >> >> >>>> we have tried backgrounding the apr_memcache calls
in a thread
> and
> >> >> >>>> timing
> >> >> >>>> out in mainline are complex and even once they work
100% will be
> >> very
> >> >> >>>> unsatisfactory (resource leaks; Apache refusing to
exit cleanly
> on
> >> >> >>>> 'apachectl stop') if this happens more than (say)
once a month.
> >> >> >>>>
> >> >> >>>> Our plan is to copy the patched implementation of
> >> >> >>>> apr_memcache_server_connect and the static methods
it calls
> into a
> >> >> >>>> new .c
> >> >> >>>> file we will link into our module, naming the new
entry-point
> >> >> >>>> something
> >> >> >>>> else (apr_memcache_server_connect_with_timeout seems
good).
>  From a
> >> >> >>>> CS/SE perspective this is offensive and we admit it,
but from a
> >> >> >>>> product
> >> >> >>>> quality perspective we believe this beats freezes
and
> >> >> >>>> complicated/imperfect
> >> >> >>>> workarounds with threads.
> >> >> >>>>
> >> >> >>>> So I have two questions for the Apache community:
> >> >> >>>>
> >> >> >>>>    1. What are the practical problems with this approach?
 Note
> >> that
> >> >> >>>>    in any case a new APR rev would require editing/ifdefing
our
> >> code
> >> >> >>>> anyway,
> >> >> >>>>    so I think immunity from APR updates such as this
patch being
> >> >> >>>> applied is
> >> >> >>>>    not a distinguishing drawback.
> >> >> >>>>    2. Is there an example of the correct solution
to the
> technical
> >> >> >>>>    problem Jeff highlighted: "it is apparently missing
a call to
> >> >> >>>>    adjust the socket timeout and to discard the connection
if
> the
> >> >> >>>>    timeout is reached".  That sounds like a pattern
that might
> be
> >> >> >>>>    found elsewhere in the Apache HTTPD code base.
> >> >> >>>>
> >> >> >>>> Thanks in advance for your help!
> >> >> >>>> -Josh
> >> >> >>>>
> >> >> >>>>
> >> >> >>>> On Wed, Oct 17, 2012 at 8:16 PM, Jeff Trawick
> >> >> >>>> <trawick@gmail.com>wrote:
> >> >> >>>>
> >> >> >>>>> On Wed, Oct 17, 2012 at 3:36 PM, Joshua Marantz
> >> >> >>>>> <jmarantz@google.com>
> >> >> >>>>> wrote:
> >> >> >>>>> > Is there a mechanism to time out individual
operations?
> >> >> >>>>>
> >> >> >>>>> No, the socket connect timeout is hard-coded at
1 second and
> the
> >> >> >>>>> socket I/O timeout is disabled.
> >> >> >>>>>
> >> >> >>>>> Bugzilla bug
> >> >> >>>>> https://issues.apache.org/bugzilla/show_bug.cgi?id=51065
> >> >> >>>>> has a patch, though it is apparently missing a
call to adjust
> the
> >> >> >>>>> socket timeout and to discard the connection if
the timeout is
> >> >> >>>>> reached.  More importantly, the API can only be
changed in
> future
> >> >> >>>>> APR
> >> >> >>>>> 2.0; alternate, backwards-compatible API(s) could
be added in
> >> future
> >> >> >>>>> APR-Util 1.6.
> >> >> >>>>>
> >> >> >>>>> >
> >> >> >>>>> > If memcached freezes, then it appears my
calls to 'get' will
> >> block
> >> >> >>>>> until
> >> >> >>>>> > memcached wakes up.  Is there any way to
set a timeout for
> that
> >> >> >>>>> > call?
> >> >> >>>>> >
> >> >> >>>>> > I can repro this in my unit tests by sending
a SIGSTOP to
> >> >> >>>>> > memcached
> >> >> >>>>> before
> >> >> >>>>> > doing a 'get'?
> >> >> >>>>> >
> >> >> >>>>> > Here are my observations:
> >> >> >>>>> >
> >> >> >>>>> > apr_memcache_multgetp seems to time out in
bounded time if I
> >> >> >>>>> > SIGSTOP
> >> >> >>>>> the
> >> >> >>>>> > memcached process. Yes!
> >> >> >>>>> >
> >> >> >>>>> > apr_memcache_getp seems to hang indefinitely
if I SIGSTOP the
> >> >> >>>>> memcached
> >> >> >>>>> > process.
> >> >> >>>>> >
> >> >> >>>>> > apr_memcache_set seems to hang indefinitely
if I SIGSTOP the
> >> >> >>>>> memcached
> >> >> >>>>> > process.
> >> >> >>>>> >
> >> >> >>>>> > apr_memcache_delete seems to hang indefinitely
if I SIGSTOP
> the
> >> >> >>>>> memcached
> >> >> >>>>> > process.
> >> >> >>>>> >
> >> >> >>>>> > apr_memcache_stats seems to hang indefinitely
if I SIGSTOP
> the
> >> >> >>>>> memcached
> >> >> >>>>> > process.
> >> >> >>>>> >
> >> >> >>>>> > That last one really sucks as I am using
that to print the
> >> status
> >> >> >>>>> > of
> >> >> >>>>> all my
> >> >> >>>>> > cache shards to the log file if I detected
a problem :(
> >> >> >>>>> >
> >> >> >>>>> >
> >> >> >>>>> > Why does apr_memcache_multgetp do what I
want and not the
> >> others?
> >> >> >>>>>  Can I
> >> >> >>>>> > induce the others to have reasonable timeout
behavior?
> >> >> >>>>> >
> >> >> >>>>> > When I SIGSTOP memcached this makes it hard
to even restart
> >> >> >>>>> > Apache,
> >> >> >>>>> at
> >> >> >>>>> > least with graceful-stop.
> >> >> >>>>> >
> >> >> >>>>> >
> >> >> >>>>> > On a related note, the apr_memcache
> >> >> >>>>> > documentation<
> >> >> >>>>>
> >> >> >>>>>
> >> http://apr.apache.org/docs/apr-util/1.4/group___a_p_r___util___m_c.html
> >> >> >>>>> >is
> >> >> >>>>> > very thin.  I'd be happy to augment it with
my observations
> on
> >> its
> >> >> >>>>> > usage
> >> >> >>>>> > and the meaning of some of the arguments
if that was desired.
> >>  How
> >> >> >>>>> would I
> >> >> >>>>> > go about that?
> >> >> >>>>>
> >> >> >>>>> Check out APR trunk from Subversion, adjust the
doxygen docs in
> >> the
> >> >> >>>>> include files, build (make dox) and inspect the
results,
> submit a
> >> >> >>>>> patch to dev@apr.apache.org.
> >> >> >>>>>
> >> >> >>>>> >
> >> >> >>>>> > -Josh
> >> >> >>>>>
> >> >> >>>>>
> >> >> >>>>>
> >> >> >>>>> --
> >> >> >>>>> Born in Roswell... married an alien...
> >> >> >>>>> http://emptyhammock.com/
> >> >> >>>>>
> >> >> >>>>
> >> >> >>>>
> >> >> >>>
> >> >> >>
> >> >> >
> >> >
> >> >
> >> >
> >> >
> >> > --
> >> > Born in Roswell... married an alien...
> >> > http://emptyhammock.com/
> >>
> >>
> >>
> >> --
> >> Born in Roswell... married an alien...
> >> http://emptyhammock.com/
> >>
>
>
>
> --
> Born in Roswell... married an alien...
> http://emptyhammock.com/
>

Mime
View raw message