mesos-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Michael Park (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (MESOS-2414) Java bindings segfault during framework shutdown
Date Sat, 07 Mar 2015 06:52:38 GMT

    [ https://issues.apache.org/jira/browse/MESOS-2414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14351444#comment-14351444
] 

Michael Park edited comment on MESOS-2414 at 3/7/15 6:51 AM:
-------------------------------------------------------------

I had a chance to look at this issue and Niklas's patch and I'd just like to document my thoughts
here in case issues similar to this comes up again.

First off, static local variables are thread-safe as of C++11. They also have a "construct-on-first-use"
semantics, but it's critical to note that this means the *initializer* of a static local variable
gets called once.

Here's the original piece of code:

{code}
Result<jfieldID> getFieldID(/* ... */) {
  static jclass NO_SUCH_FIELD_ERROR = NULL;

  if (NO_SUCH_FIELD_ERROR == NULL) {
    NO_SUCH_FIELD_ERROR = env->FindClass("java/lang/NoSuchFieldError");
    if (env->ExceptionCheck() == JNI_TRUE) {
      return Error("Cannot find NoSuchFieldError class");
    }
  }

  /* ... */
}
{code}

In this case, the initializer of {{NO_SUCH_FIELD_ERROR}} is simply {{NULL}} which gets called
once safely. We then proceed to run into races in the {{if (NO_SUCH_FIELD_ERROR == NULL)}}
clause.

{{FindClass}} returns
bq. a class object from a fully-qualified name, or {{NULL}} if the class cannot be found.

If {{FindClass}} *cannot* find the class, then we're okay. It returns {{NULL}}, {{NO_SUCH_FIELD_ERROR}}
gets set to {{NULL}}, an exception is raised internally so {{if (env->ExceptionCheck()
== JNI_TRUE)}} is true and we exit returning an {{Error}}. On subsequent invocations we'll
simply keep trying this and we don't run into breaking issues.

If {{FindClass}} *can* find the class, we get into trouble. How? Here's a *possible race*
sequence:

# Thread A comes along first and initializes {{NO_SUCH_FIELD_ERROR}} to {{NULL}}.
# Thread A proceeds and executes {{NO_SUCH_FIELD_ERROR = env->FindClass("java/lang/NoSuchFieldError");}}
which sets {{NO_SUCH_FIELD_ERROR}} to some valid pointer.
# Thread B causes an exception to be thrown.
# Thread A checks for (b) {{if (env->ExceptionCheck() == JNI_TRUE)}} and proceeds to return
an {{Error}}. *Note that {{NO_SUCH_FIELD_ERROR}} is still set to a valid pointer!*
# The instance that {{NO_SUCH_FIELD_ERROR}} is pointing at is garbage-collected (or something
similar) which invalidates {{NO_SUCH_FIELD_ERROR}}.
# On a subsequent call, {{NO_SUCH_FIELD_ERROR}} is set so we proceed to use it even though
it's been invalidated. The first place we actually use {{{NO_SUCH_FIELD_ERROR}} is {{IsInstanceOf}}
which is precisely where we were seg-faulting.

*Semi-related Q*: Is {{env->ExceptionCheck() == JNI_TRUE}} the correct predicate to check
for the success of {{FindClass}}? Seems to me like we should go directly to checking that
the result of {{FindClass}} is non-null.

I say the above is semi-related because I believe *the crux of issue is that the initializing
logic for {{NO_SUCH_FIELD_ERROR}} is not thread-safe*. The initializing logic (i.e. the body
of the {{if (NO_SUCH_FIELD_ERROR == NULL)}} clause) should just live in the initializer and
the compiler will handle the thread-safety for us.

{code}
jclass findNoSuchFieldErrorClass() {
  jclass result = env->FindClass("java/lang/NoSuchFieldError");
  if (env->ExceptionCheck() == JNI_TRUE) {  // or whatever the correct condition should
be
    return NULL;
  }
  return result;
}

Result<jfieldID> getFieldID(/* ... */) {
  static jclass NO_SUCH_FIELD_ERROR = findNoSuchFieldErrorClass();

  if (NO_SUCH_FIELD_ERROR == NULL) {
    return Error("Cannot find NoSuchFieldError class");
  }

  /* ... */
}
{code}


was (Author: mcypark):
I had a chance to look at this issue and Niklas's patch and I'd just like to document my thoughts
here in case issues similar to this comes up again.

First off, static local variables are thread-safe as of C++11. They also have a "construct-on-first-use"
semantics, but it's critical to note that this means the *initializer* of a static local variable
gets called once.

Here's the original piece of code:

{code}
Result<jfieldID> getFieldID(/* ... */) {
  static jclass NO_SUCH_FIELD_ERROR = NULL;

  if (NO_SUCH_FIELD_ERROR == NULL) {
    NO_SUCH_FIELD_ERROR = env->FindClass("java/lang/NoSuchFieldError");
    if (env->ExceptionCheck() == JNI_TRUE) {
      return Error("Cannot find NoSuchFieldError class");
    }
  }

  /* ... */
}
{code}

In this case, the initializer of {{NO_SUCH_FIELD_ERROR}} is simply {{NULL}} which gets called
once safely. We then proceed to run into races in the {{if (NO_SUCH_FIELD_ERROR == NULL)}}
clause.

{{FindClass}} returns
bq. a class object from a fully-qualified name, or {{NULL}} if the class cannot be found.

If {{FindClass}} *cannot* find the class, then we're okay. It returns {{NULL}}, {{NO_SUCH_FIELD_ERROR}}
gets set to {{NULL}}, an exception is raised internally so {{if (env->ExceptionCheck()
== JNI_TRUE)}} is true and we exit returning an {{Error}}. On subsequent invocations we'll
simply keep trying this and we don't run into breaking issues.

If {{FindClass}} *can* find the class, we get into trouble. How? Here's a possible race:

# Thread A comes along first and initializes {{NO_SUCH_FIELD_ERROR}} to {{NULL}}.
# Thread A proceeds and executes {{NO_SUCH_FIELD_ERROR = env->FindClass("java/lang/NoSuchFieldError");}}
which sets {{NO_SUCH_FIELD_ERROR}} to some valid pointer.
# Thread B causes an exception to be thrown.
# Thread A checks for (b) {{if (env->ExceptionCheck() == JNI_TRUE)}} and proceeds to return
an {{Error}}. *Note that {{NO_SUCH_FIELD_ERROR}} is still set to a valid pointer!*
# The instance that {{NO_SUCH_FIELD_ERROR}} is pointing at is garbage-collected (or something
similar) which invalidates {{NO_SUCH_FIELD_ERROR}}.
# On a subsequent call, {{NO_SUCH_FIELD_ERROR}} is set so we proceed to use it even though
it's been invalidated. The first place we actually use {{{NO_SUCH_FIELD_ERROR}} is {{IsInstanceOf}}
which is precisely where we were seg-faulting.

*Semi-related Q*: Is {{env->ExceptionCheck() == JNI_TRUE}} the correct predicate to check
for the success of {{FindClass}}? Seems to me like we should go directly to checking that
the result of {{FindClass}} is non-null.

I say the above is semi-related because I believe *the crux of issue is that the initializing
logic for {{NO_SUCH_FIELD_ERROR}} is not thread-safe*. The initializing logic (i.e. the body
of the {{if (NO_SUCH_FIELD_ERROR == NULL)}} clause) should just live in the initializer and
the compiler will handle the thread-safety for us.

{code}
jclass findNoSuchFieldErrorClass() {
  jclass result = env->FindClass("java/lang/NoSuchFieldError");
  if (env->ExceptionCheck() == JNI_TRUE) {  // or whatever the correct condition should
be
    return NULL;
  }
  return result;
}

Result<jfieldID> getFieldID(/* ... */) {
  static jclass NO_SUCH_FIELD_ERROR = findNoSuchFieldErrorClass();

  if (NO_SUCH_FIELD_ERROR == NULL) {
    return Error("Cannot find NoSuchFieldError class");
  }

  /* ... */
}
{code}

> Java bindings segfault during framework shutdown
> ------------------------------------------------
>
>                 Key: MESOS-2414
>                 URL: https://issues.apache.org/jira/browse/MESOS-2414
>             Project: Mesos
>          Issue Type: Bug
>            Reporter: Niklas Quarfot Nielsen
>            Assignee: Niklas Quarfot Nielsen
>
> {code}
> I0226 16:39:59.063369 626044928 sched.cpp:831] Stopping framework '20150220-141149-16777343-5050-45194-0000'
> [2015-02-26 16:39:59,063] INFO Driver future completed. Executing optional abdication
command. (mesosphere.marathon.MarathonSchedulerService:191)
> [2015-02-26 16:39:59,065] INFO Setting framework ID to 20150220-141149-16777343-5050-45194-0000
(mesosphere.marathon.MarathonSchedulerService:75)
> #
> # A fatal error has been detected by the Java Runtime Environment:
> #
> #  SIGSEGV (0xb) at pc=0x0000000106a266d0, pid=99408, tid=44291
> #
> # JRE version: Java(TM) SE Runtime Environment (8.0_25-b17) (build 1.8.0_25-b17)
> # Java VM: Java HotSpot(TM) 64-Bit Server VM (25.25-b02 mixed mode bsd-amd64 compressed
oops)
> # Problematic frame:
> # V  [libjvm.dylib+0x4266d0]  Klass::is_subtype_of(Klass*) const+0x4
> #
> # Failed to write core dump. Core dumps have been disabled. To enable core dumping, try
"ulimit -c unlimited" before starting Java again
> #
> # An error report file with more information is saved as:
> # /Users/corpsi/projects/marathon/hs_err_pid99408.log
> #
> # If you would like to submit a bug report, please visit:
> #   http://bugreport.sun.com/bugreport/crash.jsp
> #
> Abort trap: 6
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message