Return-Path: X-Original-To: apmail-tomcat-dev-archive@www.apache.org Delivered-To: apmail-tomcat-dev-archive@www.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 3C5FA1075F for ; Thu, 3 Oct 2013 13:18:53 +0000 (UTC) Received: (qmail 94753 invoked by uid 500); 3 Oct 2013 13:18:51 -0000 Delivered-To: apmail-tomcat-dev-archive@tomcat.apache.org Received: (qmail 94686 invoked by uid 500); 3 Oct 2013 13:18:51 -0000 Mailing-List: contact dev-help@tomcat.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: "Tomcat Developers List" Delivered-To: mailing list dev@tomcat.apache.org Received: (qmail 94672 invoked by uid 99); 3 Oct 2013 13:18:50 -0000 Received: from minotaur.apache.org (HELO minotaur.apache.org) (140.211.11.9) by apache.org (qpsmtpd/0.29) with ESMTP; Thu, 03 Oct 2013 13:18:50 +0000 Received: from localhost (HELO [192.168.23.9]) (127.0.0.1) (smtp-auth username markt, mechanism plain) by minotaur.apache.org (qpsmtpd/0.29) with ESMTP; Thu, 03 Oct 2013 13:18:50 +0000 Message-ID: <524D6EB3.4020209@apache.org> Date: Thu, 03 Oct 2013 14:18:43 +0100 From: Mark Thomas User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:17.0) Gecko/20130801 Thunderbird/17.0.8 MIME-Version: 1.0 To: Tomcat Developers List Subject: Re: Recent tcnative null-dereference with 8.0.0-RC3 and 7.0.45 [tcnative-1.dll+0x7e23] References: <5246199A.70000@christopherschultz.net> <5246AEB6.5050509@kippdata.de> <5246B613.6010504@apache.org> <52492924.1040205@apache.org> <52497DE2.4010007@apache.org> <524999BB.1010007@christopherschultz.net> <5249B7FE.2060200@apache.org> <5249C801.3030908@christopherschultz.net> <524A5550.8040701@apache.org> <524A72BA.60709@kippdata.de> <524AD8E6.6010607@christopherschultz.net> <524ADE95.4030304@apache.org> <524AE07E.5020604@christopherschultz.net> <524BACA4.8080008@apache.org> <524D683B.8080609@christopherschultz.net> In-Reply-To: <524D683B.8080609@christopherschultz.net> X-Enigmail-Version: 1.5.2 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit On 03/10/2013 13:51, Christopher Schultz wrote: > Sebb, > > On 10/3/13 8:06 AM, sebb wrote: >> The problem is that bugs that reveal themselves as JVM crashes >> are much harder to debug. > > +1 > > This is exactly the point I was arguing. When we get a JVM crash > report, the stack dump could be completely different depending upon > which architecture, kernel, compiler, optimization flags, etc. that > were used when compiling and/or running the library. Converting > SIGSEGV into an exception gives us a lot of freedom to publish tons > of useful information when reporting errors to the user. > > I'd rather get a report from a user that says something like "here > is my stack trace and error message, complete with name of variable > that was NULL and line of source code" rather than "here's my JVM > crash report, sorry I didn't get a core file, I'm having trouble > reproducing the error" (which is essentially what all > currently-open JVM-crash error reports look like in BZ for > tcnative). Having been responsible for introducing and fixing a number of these issues I disagree. It is usually fairly easy to tell what the problem is from the crash report (double close on a socket, invalid socket in the poller, etc). What is far more difficult is figuring out how things got into the known invalid state in the first place. No amount of debug information at the point of the crash is going to help with that. A hard to reproduce bug in APR that triggers a crash is no more or less difficult to work with than a hard to reproduce bug in NIO that triggers an unexpected socket close. You may have noticed that I have slowly been adding debug code to the low level connector code, primarily in the Endpoints and the Protocol implementations. All of this debug code has proven useful in tracking down the type of bug that triggers a crash with APR. Additional validity checks in the native code provide for a more graceful failure mode but offer little other benefit as the useful information is more focussed on how the current state was achieved rather than what the current state is. In all of the recent APR issues I have been working on, by far the most useful tool has been the reproducible test case. Unfortunately, I know of no way to make those easier to generate. >>> We can add compile time '#if defined(MAINTAINER_MODE) ... >>> #endif' checks for easier debugging at development, >> >> Any crashes that occur in a released version of TC are likely to >> be fairly rare, otherwise they would be detected in testing. So >> the MAINTAINER_MODE is not likely to be much use after the >> initial shakedown period. Unless the debugging checks are >> expensive, why not leave them in? > > Obviously given my previous comments, I agree with this. If the > MAINTAINER had an exhaustive set of unit tests, there would be no > error reports, right? ;) I argue that MAINTAINER_MODE is exactly > the opposite of what you want: you want real users to have this > information precisely because they are *not* programmers (at least > not C programmers). I'm -0 on adding additional checks to the native code. I can think of several things that would be more useful: - Better Javadoc for the native methods. I can think of a number of times where better docs would have saved me a fair amount of time debugging unexpected behaviour. - Something to turn an APR error response code into meaningful test. - Refactoring the connectors so all socket access goes through the SocketWrapper so there is a much smaller set of code to validate. Mark --------------------------------------------------------------------- To unsubscribe, e-mail: dev-unsubscribe@tomcat.apache.org For additional commands, e-mail: dev-help@tomcat.apache.org