apr-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Yann Ylavic <ylavic....@gmail.com>
Subject Re: Issues/questions with apr_memcache_multigetp
Date Sat, 31 Oct 2015 17:56:48 GMT
Thanks Jeffrey for testing.

Committed to trunk in http://svn.apache.org/r1711657, will let it
there for a few times (review) and then backport it to 1.6.x and 1.5.x
branches.

Regards,
Yann.


On Wed, Oct 28, 2015 at 3:41 PM, Jeffrey Crowell <jcrowell@google.com> wrote:
> That patch works, and users have confirmed that the error message added has
> appeared in logs.
>
> (applied to our copy here)
>
> https://github.com/pagespeed/mod_pagespeed/commit/84a9deaf9c4df13ae707f44d06f577321de46e8c
>
> Thanks,
> Jeff
>
> On Thu, Sep 24, 2015 at 1:28 PM, Jeffrey Crowell <jcrowell@google.com>
> wrote:
>>
>> Hi Yann,
>>
>> This patch looks like it should fix the hang we've seen based on the
>> instruction trace provided from a bug report (
>> http://i.imgur.com/RI3TKrU.png ) where essentially, queries_sent (
>> https://github.com/pagespeed/mod_pagespeed/blob/master/third_party/aprutil/apr_memcache2.c#L1414
>> ) is never decremented.
>>
>> The only reports of the bug have had HAProxy along with memcached, so I'm
>> not sure if something funky is going on there closing a connection in an odd
>> way.
>>
>> One quick question on the parse_size function though.
>>
>> Based on the protocol.txt for memcached, it seems like it should also be
>> valid if endptr[0] == ' ', endptr[1] == '\r', entptr[2] == '\n'
>> or even if just endptr[0] == ' ', no?
>>
>> https://github.com/memcached/memcached/blob/master/doc/protocol.txt#L227
>>
>> I'm not really quite sure if the protol description is correct.
>>
>> Should it in fact read
>>
>> VALUE <key> <flags> <bytes>[ <cas unique>]\r\n
>>
>> instead of
>>
>> VALUE <key> <flags> <bytes> [<cas unique>]\r\n
>>
>> And if the optional cas_unique token is present, wont this fail?
>>
>> Jeff
>>
>>
>>
>> On Thu, Sep 24, 2015 at 10:33 AM, Yann Ylavic <ylavic.dev@gmail.com>
>> wrote:
>>>
>>> Hi Jeffrey,
>>>
>>> On Wed, Sep 23, 2015 at 9:23 PM, Jeffrey Crowell <jcrowell@google.com>
>>> wrote:
>>> >
>>> > We have some patches which were created in an attempt to fix some
>>> > signed
>>> > bugs in the original code that I think may be causing our issue.
>>> >
>>> > Namely here:
>>> >
>>> > https://github.com/pagespeed/mod_pagespeed/blob/master/third_party/aprutil/apr_memcache2.c#L1453
>>> > vs here
>>> >
>>> > https://gist.github.com/crowell/59bfa1bb9f0cda30c48a#file-multiget-c-L192
>>> > The original code uses an atoi(), which has undefined behavior when
>>> > called
>>> > on non integer strings, leaving len to be 0.
>>> >
>>> > Second, the check here
>>> >
>>> > https://github.com/pagespeed/mod_pagespeed/blob/master/third_party/aprutil/apr_memcache2.c#L1457
>>> > vs
>>> > https://gist.github.com/crowell/59bfa1bb9f0cda30c48a#file-multiget-c-L199
>>> >
>>> > In the original apr code, this check will never fail. len is an
>>> > apr_size_t,
>>> > and will never be < 0.
>>>
>>> The code you refer has changed with [1] (i.e. APR-1.4.3, the current
>>> version is 1.5.2), but still does not seem to fix the invalid/unknown
>>> length/value/type issue, which are both unexpected errors in
>>> apr_memcache, and as such should probably terminate the connection...
>>>
>>> >
>>> > The issue here now is that the check in the "server sent back a key
>>> > that i
>>> > didn't ask for"
>>> >
>>> > https://github.com/pagespeed/mod_pagespeed/blob/master/third_party/aprutil/apr_memcache2.c#L1507
>>> > (same in both forked and upstream), apr_pollset_remove, is never
>>> > called, and
>>> > the number of queries sent is never changed.
>>> >
>>> > Does it seem possible that this is causing the server to spin here
>>>
>>> Yes, in the unexpected cases mentioned above, the socket is not
>>> "consumed" (i.e. apr_brigade_partition() is not called), which let
>>> some immediatly poll()able data for the next loops.
>>>
>>> > would apr be interested in accepting a patch to fix the signedness
>>> > issues?
>>>
>>> Of course! I included parse_size() in the attached patch which tries
>>> to address the parsing/unexpected issues by aborting the connection
>>> and returning an error.
>>> Does it work for you (it is based on trunk, so you may need to adapt
>>> it to your version, which preferably should be the latest...)?
>>>
>>> I'm not sure about the handling of empty values (the real len == 0
>>> case, not the parsing error).
>>> The previous code did not expect the trailing CRLF, while this patch
>>> does...
>>>
>>> Regards,
>>> Yann.
>>>
>>> [1] http://svn.apache.org/r982408
>>
>>
>

Mime
View raw message