tomcat-dev mailing list archives

From Christopher Schultz <ch...@christopherschultz.net>
Subject Re: Recent tcnative null-dereference with 8.0.0-RC3 and 7.0.45 [tcnative-1.dll+0x7e23]
Date Thu, 03 Oct 2013 22:29:35 GMT
Mark,

On 10/3/13 3:52 PM, Mark Thomas wrote:
> On 03/10/2013 17:34, Christopher Schultz wrote:
>> On 10/3/13 11:42 AM, Mark Thomas wrote:
> 
>>> On the other hand, a JVM crash is a very strong motivation to fix
>>> an issue.
> 
>> For *you*, or for the user?
> 
> For me. I haven't looked at the historical bugs. The APR code has
> changed so much for WebSocket I'd be inclined to close all the
> historical issues as WORKSFORME (assuming they didn't come with a
> reproducible test case) and focus on the current code.

I kind of agree.... many of those reports are ooooold.

>> Certainly it is for the user, but given the number of unfixed crash
>> reports in Bugzilla, it doesn't seem like tcnative-crash=quick-fix
>> from the team. I've been trying to investigate them whenever
>> possible but it's hard for me to get more information and, frankly,
>> I don't know anything about APR socket management itself so my time
>> is only of limited use.
> 
> The AprEndpoint can be hard to get your head around starting from
> scratch. It has taken me a long time to feel comfortable making
> changes to that code. More comments in the AprEndpoint code should
> help. I've been trying to add them as I make changes.
> 
>>> - documenting some of the constraints around using SSL would
>>> have saved me some time when getting SSL and WebSocket working
> 
>> Can you be more specific without just writing the documentation 
>> yourself? I'm hoping to help, but I'm not sure what SSL constraints
>> you are alluding to.
> 
> If you get a partial read/write you have to repeat the call with
> exactly the same parameters (i.e. the same objects) as you made the
> first call.

Interesting. So partial-write means that you should just idempotently
re-try?
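
As a rough illustration of the constraint Mark describes, here is a minimal Java sketch. Everything in it is hypothetical: RetryableTransport and PendingWrite are not Tomcat or tcnative classes, and treating a return value of -1 as "incomplete, repeat the identical call" is an assumption made purely for the example. The point is only that the caller remembers the original arguments and refuses a retry that passes different ones.

```java
import java.nio.ByteBuffer;

/** Hypothetical transport standing in for an SSL-backed native socket. */
interface RetryableTransport {
    /**
     * Assumed contract for this sketch: returns the number of bytes
     * written, or -1 if the identical call must be repeated later.
     */
    int write(ByteBuffer buf, int offset, int length);
}

/** Remembers the arguments of an incomplete write so a retry reuses them. */
final class PendingWrite {
    private ByteBuffer buf;
    private int offset;
    private int length;
    private boolean pending;

    int write(RetryableTransport transport, ByteBuffer b, int off, int len) {
        if (pending) {
            // A retry must pass the same object and the same range.
            if (b != buf || off != offset || len != length) {
                throw new IllegalStateException(
                        "retry must reuse the original arguments");
            }
        } else {
            buf = b;
            offset = off;
            length = len;
        }
        int written = transport.write(buf, offset, length);
        pending = written < 0;
        if (!pending) {
            buf = null; // drop the reference once the write has completed
        }
        return written;
    }
}
```

A caller would keep one PendingWrite per connection and pass it the same buffer on each attempt until it returns a non-negative count.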

>>> -730053
> 
>> This one isn't valid, as far as I can tell (the error string is 
>> "Unrecognized resolver error"). 70053 is "Error string not
>> specified yet" so I'm not sure what that one is.
> 
> 720000 is the offset for OS native error codes.
> 10053 is client abort on Windows.

Aah, Windows. No wonder I couldn't find the error code on Linux ;)
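
The arithmetic above (730053 = 720000 + 10053) can be sketched in Java as follows. AprStatusDecoder is a hypothetical helper, not an existing Tomcat class. The offset and the single 10053 mapping come from this thread; taking the absolute value of the reported number is an assumption, made in case the code shows up with a leading minus sign.

```java
/** Hypothetical helper; not part of Tomcat or tcnative. */
final class AprStatusDecoder {
    /** Offset for OS native error codes, as stated in this thread. */
    static final int OS_NATIVE_OFFSET = 720000;

    private AprStatusDecoder() {
    }

    /** True if the status falls at or above the OS native offset. */
    static boolean isOsNative(int status) {
        // Assumption: a negated status carries the same code.
        return Math.abs(status) >= OS_NATIVE_OFFSET;
    }

    /** Strips the offset and returns the underlying OS error number. */
    static int osNativeCode(int status) {
        if (!isOsNative(status)) {
            throw new IllegalArgumentException(
                    "not an OS native status: " + status);
        }
        return Math.abs(status) - OS_NATIVE_OFFSET;
    }

    /** Builds a short text form; only the code named in the thread is mapped. */
    static String describe(int status) {
        if (!isOsNative(status)) {
            return "APR status " + Math.abs(status);
        }
        int os = osNativeCode(status);
        String text = (os == 10053) ? "client abort on Windows" : "unknown";
        return "OS native error " + os + " (" + text + ")";
    }
}
```

Calling AprStatusDecoder.osNativeCode(730053) yields 10053, which is the number to search for in the Windows documentation.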

>>> I can look them up to figure what they mean but it would save
>>> some time if the error report included a text version.
> 
>> tcnative doesn't have an error log, so where could those strings
>> go? Or were you thinking of having a bridge from Java code into
>> apr_strerror?
> 
> I was thinking maybe an APR function that gave a textual error message
> for a given error code. Currently, the APR Java code reports just the
> error number. It would be nice to include a meaningful text message.

Okay, that's something I can do: a JNI wrapper around apr_strerror
should be trivial.

>> What about a program like perror that understands APR error codes?
>> I've written a simple one that could be helpful, but you probably
>> did that yourself already.
> 
> Nope. I use Google :)

LOL. Really.

I've got a quick and dirty C program that compiles cleanly on Mac and
Linux. I suppose it would compile on Windows, but I don't have a win32
compiler.

>> Anywhere more information can be added, I'm happy to help.
> 
>>>>> - Refactoring the connectors so all socket access goes
>>>>> through the SocketWrapper so there is a much smaller set of
>>>>> code to validate.
>>>>
>>>> I'm guessing you are tackling this task slowly over time.
>>>
>>> I am moving slowly in this direction. My ultimate aim is to have
>>> the connector type specific code only in the Endpoint and the 
>>> SocketWrapper. No idea if that is possible. It is a longer term
>>> goal for Tomcat 9+ at this point.
>>>
>>> At the very least whenever I add functionality to the connectors
>>> (e.g. non-blocking IO) I do enough refactoring that I only have
>>> to add the new stuff once.
> 
>> Sounds good. Having unified code with only certain aspects
>> separated into BIO, NIO, and APR will certainly help folks like me
>> understand the "true" conceptual relationship between all of these
>> components and make it easier to actually help work on that part of
>> the code.
> 
> Exactly. Simpler maintenance is one of my goals when reducing the
> duplication in the connector code. It is also something that can often
> be done in small and/or simple steps. There is always scope for
> someone to start contributing in that area (with the caveat that they
> need to be careful not to bite off more than they can chew - it is
> also easy to get into a right mess).

:)

-chris

