hadoop-common-user mailing list archives

From Todd Lipcon <t...@cloudera.com>
Subject Re: Thread safety issues with JNI/native code from map tasks?
Date Sat, 29 Jan 2011 00:24:54 GMT
JNI code may also work fine as long as no GC runs, but then fail when a GC
kicks in at a bad time. For example, if you grab a pointer to a String's
characters or an array's elements, you essentially need to lock (pin) them so
the GC doesn't relocate the objects underneath you, and the pointer is only
good until you release it. Maybe you're releasing one of these references and
then continuing to use it?
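
To make the "release and then keep using it" mistake concrete, here is a minimal plain-C sketch of the safe ordering. It does not use real JNI calls, since those need a JVM to run: `acquire_chars` and `release_chars` are hypothetical stand-ins for a JNI Get/Release pair such as `GetStringUTFChars`/`ReleaseStringUTFChars`, and `copy_then_release` is likewise just an illustrative name. The point is only the discipline: copy what you need into storage you own, release, and never touch the acquired pointer again.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for a JNI Get/Release pair; NOT real JNI calls.
 * acquire_chars hands out a temporary native buffer,
 * release_chars ends that buffer's lifetime. */
static char *acquire_chars(const char *managed) {
    size_t n = strlen(managed) + 1;
    char *p = malloc(n);
    if (p != NULL) {
        memcpy(p, managed, n);
    }
    return p;
}

static void release_chars(char *p) {
    free(p);
}

/* Safe pattern: copy into caller-owned storage first, then release.
 * Returns 0 on success, -1 if acquisition fails or dst is too small. */
int copy_then_release(const char *managed, char *dst, size_t dst_len) {
    char *tmp = acquire_chars(managed);
    if (tmp == NULL) {
        return -1;
    }
    size_t n = strlen(tmp) + 1;   /* bytes needed, including the NUL */
    int rc = -1;
    if (n <= dst_len) {
        memcpy(dst, tmp, n);
        rc = 0;
    }
    release_chars(tmp);           /* tmp must not be dereferenced after this */
    return rc;
}
```

The buggy variant is the same function returning `tmp` to its caller after `release_chars(tmp)`. That can appear to work for a long time and only crash once the memory is reused or moved, which would fit a segfault that comes and goes.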


On Fri, Jan 28, 2011 at 3:50 PM, Greg Roelofs <roelofs@yahoo-inc.com> wrote:

> Keith Wiley wrote:
> > (1) Speculative execution would occur on a completely different
> > node, so there definitely isn't any thread cross-talk (in memory).
> > So long as they don't rely on reading/writing temp files from
> > HDFS I don't see how they could have any effect on one another.
> Good point.
> > (2) I am also getting seg faults when I run in noncluster
> > standalone mode, which is a single nonspeculated thread... I
> > presume.
> That's the same as "pseudo-distributed mode"?
> > Can you explain your thoughts on speculative execution w.r.t. the
> > problems I'm having?
> Thoughts?  You expect me to have thoughts, too??
> :-)
> I had not fully thought through the spec ex idea; it was the only thing
> I could think of that might put two (otherwise independent) JNI-using tasks
> onto the same node.  But as you point out above, it wouldn't...
> Does your .so depend on any other potentially thread-unsafe .so that other
> (non-Hadoop) processes might be using?  System libraries like zlib are safe
> (else they wouldn't make very good system libraries), but maybe some other
> research library or something?  (That's a long shot, but I'm pretty much
> grasping at straws here.)
> > Yes, not thread safe, but what difference could that make if I
> > don't use the library in a multi-threaded fashion?  One map task,
> > one node, one Java thread calling JNI and using the native code?
> > How do thread safety issues factor into this?  I admit, it's
> > my theory that threads might be involved somehow, but I don't
> > understand how.  I'm just shooting in the dark since I can't
> > solve this problem any other way yet.
> Since you can reproduce it in standalone mode, can you enable core dumps
> so you can see the backtrace of the code that segfaults?  Knowing what
> specifically broke and how it got there is always a big help.
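
On enabling core dumps: the usual route is `ulimit -c unlimited` in the shell that launches the JVM. If that is awkward to arrange, one possible alternative is for the native library to raise its own limit early on. The sketch below assumes a POSIX system; `enable_core_dumps` is a hypothetical helper name, and whether a core file actually gets written still depends on the hard limit and on system configuration.

```c
#include <sys/resource.h>

/* Raise the soft core-file size limit to the hard limit for this process.
 * Returns 0 on success, -1 on failure (errno is set by the failing call). */
int enable_core_dumps(void) {
    struct rlimit rl;
    if (getrlimit(RLIMIT_CORE, &rl) != 0) {
        return -1;
    }
    rl.rlim_cur = rl.rlim_max;    /* soft limit may always be raised to hard */
    return setrlimit(RLIMIT_CORE, &rl);
}
```

Once a core file exists, opening it in gdb together with the `java` binary and running `bt` should show the native frames that led to the crash.
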
> Btw, keep in mind that there are memory-related bugs that don't show up
> until there's something big in memory that pushes the code in question
> up into a region with different data patterns in it (most frequently zero
> vs. non-zero, but others are possible).  IOW, maybe the code is dependent
> on uninitialized memory, but you were getting lucky when you ran it outside
> of Hadoop.  Have you run it through valgrind or Purify or similar?
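
As an illustration of the uninitialized-memory point, here is a small sketch of the defensive habit that avoids it. The `image_header` struct and `new_header` function are hypothetical and not taken from the library in question. Allocating with `calloc` means a field that nobody remembers to set starts at zero everywhere, instead of holding whatever happened to be in that memory on a given run.

```c
#include <stdlib.h>

/* Hypothetical header struct, for illustration only. */
typedef struct {
    int width;
    int height;
    int flags;      /* easy to forget when filling fields in by hand */
} image_header;

/* Allocate zeroed storage, then set the fields that are known.
 * Returns NULL if allocation fails; the caller frees the result. */
image_header *new_header(int width, int height) {
    image_header *h = calloc(1, sizeof *h);
    if (h == NULL) {
        return NULL;
    }
    h->width = width;
    h->height = height;
    return h;       /* flags is 0 here, not leftover heap contents */
}
```

With `malloc` in place of `calloc`, `flags` would be indeterminate: often zero in a small standalone test, not necessarily so inside a large process. Valgrind's memcheck tool reports uses of such values.
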
> Greg

Todd Lipcon
Software Engineer, Cloudera
