Return-Path: X-Original-To: apmail-mesos-issues-archive@minotaur.apache.org Delivered-To: apmail-mesos-issues-archive@minotaur.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id AC0EC17C42 for ; Sat, 7 Mar 2015 06:52:38 +0000 (UTC) Received: (qmail 71913 invoked by uid 500); 7 Mar 2015 06:52:38 -0000 Delivered-To: apmail-mesos-issues-archive@mesos.apache.org Received: (qmail 71879 invoked by uid 500); 7 Mar 2015 06:52:38 -0000 Mailing-List: contact issues-help@mesos.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: dev@mesos.apache.org Delivered-To: mailing list issues@mesos.apache.org Received: (qmail 71869 invoked by uid 99); 7 Mar 2015 06:52:38 -0000 Received: from arcas.apache.org (HELO arcas.apache.org) (140.211.11.28) by apache.org (qpsmtpd/0.29) with ESMTP; Sat, 07 Mar 2015 06:52:38 +0000 Date: Sat, 7 Mar 2015 06:52:38 +0000 (UTC) From: "Michael Park (JIRA)" To: issues@mesos.apache.org Message-ID: In-Reply-To: References: Subject: [jira] [Comment Edited] (MESOS-2414) Java bindings segfault during framework shutdown MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394 [ https://issues.apache.org/jira/browse/MESOS-2414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14351444#comment-14351444 ] Michael Park edited comment on MESOS-2414 at 3/7/15 6:51 AM: ------------------------------------------------------------- I had a chance to look at this issue and Niklas's patch and I'd just like to document my thoughts here in case issues similar to this comes up again. First off, static local variables are thread-safe as of C++11. They also have a "construct-on-first-use" semantics, but it's critical to note that this means the *initializer* of a static local variable gets called once. Here's the original piece of code: {code} Result getFieldID(/* ... */) { static jclass NO_SUCH_FIELD_ERROR = NULL; if (NO_SUCH_FIELD_ERROR == NULL) { NO_SUCH_FIELD_ERROR = env->FindClass("java/lang/NoSuchFieldError"); if (env->ExceptionCheck() == JNI_TRUE) { return Error("Cannot find NoSuchFieldError class"); } } /* ... */ } {code} In this case, the initializer of {{NO_SUCH_FIELD_ERROR}} is simply {{NULL}} which gets called once safely. We then proceed to run into races in the {{if (NO_SUCH_FIELD_ERROR == NULL)}} clause. {{FindClass}} returns bq. a class object from a fully-qualified name, or {{NULL}} if the class cannot be found. If {{FindClass}} *cannot* find the class, then we're okay. It returns {{NULL}}, {{NO_SUCH_FIELD_ERROR}} gets set to {{NULL}}, an exception is raised internally so {{if (env->ExceptionCheck() == JNI_TRUE)}} is true and we exit returning an {{Error}}. On subsequent invocations we'll simply keep trying this and we don't run into breaking issues. If {{FindClass}} *can* find the class, we get into trouble. How? Here's a *possible race* sequence: # Thread A comes along first and initializes {{NO_SUCH_FIELD_ERROR}} to {{NULL}}. # Thread A proceeds and executes {{NO_SUCH_FIELD_ERROR = env->FindClass("java/lang/NoSuchFieldError");}} which sets {{NO_SUCH_FIELD_ERROR}} to some valid pointer. # Thread B causes an exception to be thrown. # Thread A checks for (b) {{if (env->ExceptionCheck() == JNI_TRUE)}} and proceeds to return an {{Error}}. *Note that {{NO_SUCH_FIELD_ERROR}} is still set to a valid pointer!* # The instance that {{NO_SUCH_FIELD_ERROR}} is pointing at is garbage-collected (or something similar) which invalidates {{NO_SUCH_FIELD_ERROR}}. # On a subsequent call, {{NO_SUCH_FIELD_ERROR}} is set so we proceed to use it even though it's been invalidated. The first place we actually use {{{NO_SUCH_FIELD_ERROR}} is {{IsInstanceOf}} which is precisely where we were seg-faulting. *Semi-related Q*: Is {{env->ExceptionCheck() == JNI_TRUE}} the correct predicate to check for the success of {{FindClass}}? Seems to me like we should go directly to checking that the result of {{FindClass}} is non-null. I say the above is semi-related because I believe *the crux of issue is that the initializing logic for {{NO_SUCH_FIELD_ERROR}} is not thread-safe*. The initializing logic (i.e. the body of the {{if (NO_SUCH_FIELD_ERROR == NULL)}} clause) should just live in the initializer and the compiler will handle the thread-safety for us. {code} jclass findNoSuchFieldErrorClass() { jclass result = env->FindClass("java/lang/NoSuchFieldError"); if (env->ExceptionCheck() == JNI_TRUE) { // or whatever the correct condition should be return NULL; } return result; } Result getFieldID(/* ... */) { static jclass NO_SUCH_FIELD_ERROR = findNoSuchFieldErrorClass(); if (NO_SUCH_FIELD_ERROR == NULL) { return Error("Cannot find NoSuchFieldError class"); } /* ... */ } {code} was (Author: mcypark): I had a chance to look at this issue and Niklas's patch and I'd just like to document my thoughts here in case issues similar to this comes up again. First off, static local variables are thread-safe as of C++11. They also have a "construct-on-first-use" semantics, but it's critical to note that this means the *initializer* of a static local variable gets called once. Here's the original piece of code: {code} Result getFieldID(/* ... */) { static jclass NO_SUCH_FIELD_ERROR = NULL; if (NO_SUCH_FIELD_ERROR == NULL) { NO_SUCH_FIELD_ERROR = env->FindClass("java/lang/NoSuchFieldError"); if (env->ExceptionCheck() == JNI_TRUE) { return Error("Cannot find NoSuchFieldError class"); } } /* ... */ } {code} In this case, the initializer of {{NO_SUCH_FIELD_ERROR}} is simply {{NULL}} which gets called once safely. We then proceed to run into races in the {{if (NO_SUCH_FIELD_ERROR == NULL)}} clause. {{FindClass}} returns bq. a class object from a fully-qualified name, or {{NULL}} if the class cannot be found. If {{FindClass}} *cannot* find the class, then we're okay. It returns {{NULL}}, {{NO_SUCH_FIELD_ERROR}} gets set to {{NULL}}, an exception is raised internally so {{if (env->ExceptionCheck() == JNI_TRUE)}} is true and we exit returning an {{Error}}. On subsequent invocations we'll simply keep trying this and we don't run into breaking issues. If {{FindClass}} *can* find the class, we get into trouble. How? Here's a possible race: # Thread A comes along first and initializes {{NO_SUCH_FIELD_ERROR}} to {{NULL}}. # Thread A proceeds and executes {{NO_SUCH_FIELD_ERROR = env->FindClass("java/lang/NoSuchFieldError");}} which sets {{NO_SUCH_FIELD_ERROR}} to some valid pointer. # Thread B causes an exception to be thrown. # Thread A checks for (b) {{if (env->ExceptionCheck() == JNI_TRUE)}} and proceeds to return an {{Error}}. *Note that {{NO_SUCH_FIELD_ERROR}} is still set to a valid pointer!* # The instance that {{NO_SUCH_FIELD_ERROR}} is pointing at is garbage-collected (or something similar) which invalidates {{NO_SUCH_FIELD_ERROR}}. # On a subsequent call, {{NO_SUCH_FIELD_ERROR}} is set so we proceed to use it even though it's been invalidated. The first place we actually use {{{NO_SUCH_FIELD_ERROR}} is {{IsInstanceOf}} which is precisely where we were seg-faulting. *Semi-related Q*: Is {{env->ExceptionCheck() == JNI_TRUE}} the correct predicate to check for the success of {{FindClass}}? Seems to me like we should go directly to checking that the result of {{FindClass}} is non-null. I say the above is semi-related because I believe *the crux of issue is that the initializing logic for {{NO_SUCH_FIELD_ERROR}} is not thread-safe*. The initializing logic (i.e. the body of the {{if (NO_SUCH_FIELD_ERROR == NULL)}} clause) should just live in the initializer and the compiler will handle the thread-safety for us. {code} jclass findNoSuchFieldErrorClass() { jclass result = env->FindClass("java/lang/NoSuchFieldError"); if (env->ExceptionCheck() == JNI_TRUE) { // or whatever the correct condition should be return NULL; } return result; } Result getFieldID(/* ... */) { static jclass NO_SUCH_FIELD_ERROR = findNoSuchFieldErrorClass(); if (NO_SUCH_FIELD_ERROR == NULL) { return Error("Cannot find NoSuchFieldError class"); } /* ... */ } {code} > Java bindings segfault during framework shutdown > ------------------------------------------------ > > Key: MESOS-2414 > URL: https://issues.apache.org/jira/browse/MESOS-2414 > Project: Mesos > Issue Type: Bug > Reporter: Niklas Quarfot Nielsen > Assignee: Niklas Quarfot Nielsen > > {code} > I0226 16:39:59.063369 626044928 sched.cpp:831] Stopping framework '20150220-141149-16777343-5050-45194-0000' > [2015-02-26 16:39:59,063] INFO Driver future completed. Executing optional abdication command. (mesosphere.marathon.MarathonSchedulerService:191) > [2015-02-26 16:39:59,065] INFO Setting framework ID to 20150220-141149-16777343-5050-45194-0000 (mesosphere.marathon.MarathonSchedulerService:75) > # > # A fatal error has been detected by the Java Runtime Environment: > # > # SIGSEGV (0xb) at pc=0x0000000106a266d0, pid=99408, tid=44291 > # > # JRE version: Java(TM) SE Runtime Environment (8.0_25-b17) (build 1.8.0_25-b17) > # Java VM: Java HotSpot(TM) 64-Bit Server VM (25.25-b02 mixed mode bsd-amd64 compressed oops) > # Problematic frame: > # V [libjvm.dylib+0x4266d0] Klass::is_subtype_of(Klass*) const+0x4 > # > # Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again > # > # An error report file with more information is saved as: > # /Users/corpsi/projects/marathon/hs_err_pid99408.log > # > # If you would like to submit a bug report, please visit: > # http://bugreport.sun.com/bugreport/crash.jsp > # > Abort trap: 6 > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)