tomcat-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Christopher Schultz <ch...@christopherschultz.net>
Subject Re: Recent tcnative null-dereference with 8.0.0-RC3 and 7.0.45 [tcnative-1.dll+0x7e23]
Date Mon, 30 Sep 2013 15:33:15 GMT
Mark,

On 9/30/13 9:34 AM, Mark Thomas wrote:
> On 30/09/2013 08:32, Mark Thomas wrote:
>> On 28/09/2013 11:57, Mladen Turk wrote:
>>> On 09/28/2013 12:25 PM, Rainer Jung wrote:
>>>>
>>>>> I can't seem to reproduce the problem using an APR connector on Linux:
>>>>> reports seems to indicate that trivially accessing Tomcat via an HTTP
>>>>> APR connector will cause a crash, and I was able to run the following
>>>>> command without bringing down my instance:
>>>>>
>>>>> $ ab -n 10000 -c 50 http://localhost:28215/my-webapp/index.html
>>>>
>>>>
>>>> Still the whole poll stuff is platform dependent, so it might be you
>>>> will not be able to reproduce the Windows crash even when the test
>>>> scenario is exactly the same as there.
>>>>
>>>
>>> Almost every APR crash error is related to reusing closed
>>> object/descriptor/pointer.
>>> Eg, crashing inside poll can be caused by closing the socket that is
>>> inside poller.
>>> The fact that it doesn't crash on linux might be just because of "close"
>>> order.
>>> Even on the same OS it doesn't have to crash all the time.
>>>
>>> Anyhow, I'll try to check the 8-RC3 with the latest native on windows.
>>> Seems there are some conceptual problem used inside TC8 probably with
>>> async sockets and double close.
>>
>> I suspect multiple factors:
>> - the refactoring to a single poller thread may have introduced some
>> additional issues we haven't tracked down yet
>> - allowing two threads (one read, one write) to work with the same
>> socket introduces lots of opportunities to close a socket in one thread
>> while it is still in use in another
>>
>> I've been mulling over refactoring the connector code so all actions on
>> the socket go via the socket wrapper so it is easier to track true
>> socket state in one place. Now might be the time to implement that change.
> 
> I believe I have found the root cause of the crash. It is the code I
> added to remove a socket from the poller when it was being closed.
> 
> Since removing a socket from the poller is not thread safe, I modified
> the code so that the poller thread did this. The problem this introduced
> is that the socket may well be closed before it is removed from the
> poller and there is a strong possibility that poll() will be called on
> the closed socket. Either of these happening will trigger a crash.

I may need a little more guidance, since adding NULL-checks for
everything is a little silly. Here is the function declaration in tcnative:

TCN_IMPLEMENT_CALL(jint, Poll, poll)(TCN_STDARGS, jlong pollset,
                                     jlong timeout, jlongArray set,
                                     jboolean remove)


Presumably, the pollset itself is non-null, but could all of the items
in the pollset be null?

The pollset itself is directly dereferenced, of course, but then it
looks like elements "*ep" are dereferenced in the setup loop of the
function. Then, there is a bunch of ring-shifting which does not appear
to be threadsafe in and of itself, but may be handled by another mechanism.

I'll work with Konstantin Prei├čer and and Ognjen to get tcnative patched
up a bit to at least prevent crashes.

Thanks,
-chris


Mime
View raw message