Mailing-List: contact dev-help@tomcat.apache.org; run by ezmlm
Precedence: bulk
Reply-To: "Tomcat Developers List" <dev@tomcat.apache.org>
Message-ID: <524D6EB3.4020209@apache.org>
Date: Thu, 03 Oct 2013 14:18:43 +0100
From: Mark Thomas <markt@apache.org>
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64;
 rv:17.0) Gecko/20130801 Thunderbird/17.0.8
MIME-Version: 1.0
To: Tomcat Developers List <dev@tomcat.apache.org>
Subject: Re: Recent tcnative null-dereference with 8.0.0-RC3 and 7.0.45
 [tcnative-1.dll+0x7e23]
References: <5246199A.70000@christopherschultz.net>
 <5246AEB6.5050509@kippdata.de> <5246B613.6010504@apache.org>
 <52492924.1040205@apache.org> <52497DE2.4010007@apache.org>
 <524999BB.1010007@christopherschultz.net> <5249B7FE.2060200@apache.org>
 <5249C801.3030908@christopherschultz.net> <524A5550.8040701@apache.org>
 <524A72BA.60709@kippdata.de> <524AD8E6.6010607@christopherschultz.net>
 <524ADE95.4030304@apache.org> <524AE07E.5020604@christopherschultz.net>
 <CAOGo0VZRqLDW5j9-DQmLfCma7coGR_wALp5TezzLzJtXCauV3A@mail.gmail.com>
 <524BACA4.8080008@apache.org>
 <CAOGo0VY+j2+9uf+HqiEBvqbC9nZktgN1v5-UoWDrPvH3WjwZ5A@mail.gmail.com>
 <524D683B.8080609@christopherschultz.net>
In-Reply-To: <524D683B.8080609@christopherschultz.net>
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 7bit

On 03/10/2013 13:51, Christopher Schultz wrote:
> Sebb,
> 
> On 10/3/13 8:06 AM, sebb wrote:
>> The problem is that bugs that reveal themselves as JVM crashes
>> are much harder to debug.
> 
> +1
> 
> This is exactly the point I was arguing. When we get a JVM crash
> report, the stack dump could be completely different depending upon
> which architecture, kernel, compiler, optimization flags, etc. that
> were used when compiling and/or running the library. Converting
> SIGSEGV into an exception gives us a lot of freedom to publish tons
> of useful information when reporting errors to the user.
> 
> I'd rather get a report from a user that says something like "here
> is my stack trace and error message, complete with name of variable
> that was NULL and line of source code" rather than "here's my JVM
> crash report, sorry I didn't get a core file, I'm having trouble
> reproducing the error" (which is essentially what all
> currently-open JVM-crash error reports look like in BZ for
> tcnative).

Having been responsible for introducing and fixing a number of these
issues I disagree. It is usually fairly easy to tell what the problem
is from the crash report (double close on a socket, invalid socket in
the poller, etc). What is far more difficult is figuring out how
things got into the known invalid state in the first place. No amount
of debug information at the point of the crash is going to help with that.

A hard to reproduce bug in APR that triggers a crash is no more or
less difficult to work with than a hard to reproduce bug in NIO that
triggers an unexpected socket close.

You may have noticed that I have slowly been adding debug code to the
low level connector code, primarily in the Endpoints and the Protocol
implementations. All of this debug code has proven useful in tracking
down the type of bug that triggers a crash with APR.

Additional validity checks in the native code provide for a more
graceful failure mode but offer little other benefit as the useful
information is more focussed on how the current state was achieved
rather than what the current state is.

In all of the recent APR issues I have been working on, by far the
most useful tool has been the reproducible test case. Unfortunately, I
know of no way to make those easier to generate.

>>> We can add compile time '#if defined(MAINTAINER_MODE) ...
>>> #endif' checks for easier debugging at development,
>> 
>> Any crashes that occur in a released version of TC are likely to
>> be fairly rare, otherwise they would be detected in testing. So
>> the MAINTAINER_MODE is not likely to be much use after the
>> initial shakedown period. Unless the debugging checks are
>> expensive, why not leave them in?
> 
> Obviously given my previous comments, I agree with this. If the 
> MAINTAINER had an exhaustive set of unit tests, there would be no
> error reports, right? ;) I argue that MAINTAINER_MODE is exactly
> the opposite of what you want: you want real users to have this
> information precisely because they are *not* programmers (at least
> not C programmers).

I'm -0 on adding additional checks to the native code.

I can think of several things that would be more useful:
- Better Javadoc for the native methods. I can think of a number of
times where better docs would have saved me a fair amount of time
debugging unexpected behaviour.
- Something to turn an APR error response code into meaningful test.
- Refactoring the connectors so all socket access goes through the
SocketWrapper so there is a much smaller set of code to validate.

Mark

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@tomcat.apache.org
For additional commands, e-mail: dev-help@tomcat.apache.org