From Greg Roelofs <roel...@yahoo-inc.com>
Subject Re: Thread safety issues with JNI/native code from map tasks?
Date Sat, 29 Jan 2011 02:43:33 GMT
I wrote:

>> Btw, keep in mind that there are memory-related bugs that don't show up
>> until there's something big in memory that pushes the code in question
>> up into a region with different data patterns in it (most frequently zero
>> vs. non-zero, but others are possible).  IOW, maybe the code is dependent
>> on uninitialized memory, but you were getting lucky when you ran it outside
>> of Hadoop.  Have you run it through valgrind or Purify or similar?

Keith Wiley wrote:

> Valgrind has turned out to be almost useless.  It can't "reach"
> through the JVM and JNI to the .so code.  If I don't
> tell valgrind to follow children, it obviously produces
> no relevant output, but if I do tell it to follow children,
> it can't successfully launch a VM to run Java in:

> Error occurred during initialization of VM
> Unknown x64 processor: SSE2 not supported

> Sigh...any thoughts on running Valgrind on Hadoop->JVM->JNI->native code?

I actually meant something simpler:  if we posit that the bug is in the
library code but doesn't always trigger a segfault because of random
memory conditions (i.e., "getting lucky"), then running valgrind on the
library in a non-Java context (i.e., the setup you said "runs perfectly
fine outside Hadoop") should detect such bug(s).
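
For illustration only, here is a minimal sketch in C of a standalone check along those lines.  Everything in it is an assumption: sum_with_scratch() is a hypothetical stand-in for whatever entry point the real library exposes, and stable_across_fill() is a made-up helper.  The idea is to call the routine twice, once with its working memory pre-filled with zeros and once with a non-zero pattern, and compare the results.  A routine that reads memory it never initialized will tend to disagree with itself; the stand-in below initializes what it reads, so it passes.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a library routine that works in
 * caller-supplied scratch space.  It sums the n input bytes. */
long sum_with_scratch(const unsigned char *in, size_t n, long *scratch)
{
    scratch[0] = 0;                 /* initialize before reading */
    for (size_t i = 0; i < n; i++)
        scratch[0] += in[i];
    return scratch[0];
}

/* Hypothetical helper: returns 1 if the routine gives the same answer
 * whether its scratch space starts out as zeros or as 0xA5 bytes,
 * and 0 if the answer depends on the prior memory contents. */
int stable_across_fill(const unsigned char *in, size_t n)
{
    long scratch[4];

    memset(scratch, 0x00, sizeof scratch);   /* the "lucky" case */
    long with_zeros = sum_with_scratch(in, n, scratch);

    memset(scratch, 0xA5, sizeof scratch);   /* the "unlucky" case */
    long with_pattern = sum_with_scratch(in, n, scratch);

    return with_zeros == with_pattern;
}
```

Linked into a small executable whose main() calls stable_across_fill() on representative input, something like this can be run as "valgrind --track-origins=yes ./harness" (the binary name is arbitrary), with no JVM in the process at all.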

If that shows nothing, and you're not passing buffers across the JNI boundary
(=> possible GC issues, perhaps subtle ones?), then I'm out of ideas.  Again.
Sorry. :-/


