hadoop-common-user mailing list archives

From Keith Wiley <kwi...@keithwiley.com>
Subject Thread safety issues with JNI/native code from map tasks?
Date Fri, 28 Jan 2011 01:46:14 GMT
I am seeing very perplexing segfaults and standard allocation exceptions in my native code
(.so files passed to the distributed cache) which is called via JNI from the map task.  This
code runs perfectly fine (on the same data) outside Hadoop.  Even when run in a Hadoop standalone
mode (no cluster), it still segfaults.  The memory footprint is quite small and inspection
at run time reveals there is plenty of memory left, yet I get segfaults and exceptions.

I'm starting to wonder if this is a thread issue.

The native code is not *specifically* thread safe (not compiled with pthreads or anything
like that).

However, it is also not run in any concurrent fashion except w.r.t. the JVM itself.  For
example, my map task doesn't make parallel calls through JNI to the native code on concurrent
threads at the Java level, nor does the native code itself spawn any threads (like I said,
it isn't even compiled with pthreads).

However, there are clearly other "threads" of execution.  For example, the JVM itself is running,
including whatever supplemental threads the JVM involves (the garbage collector?).  In addition,
my Java mapper is running two Java threads at the time of the native code.  One calls the
native code and effectively blocks until the native code returns through JNI.  The other just
spins and sends reports and statuses to the job tracker at regular intervals to prevent the
task from being killed.  It doesn't do anything else particularly memory-related and certainly
makes no JNI/native calls; it's very basic, just sleep 'n report, sleep 'n report.
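
For concreteness, the two-thread arrangement described above can be sketched roughly as
follows.  This is only an illustration under stated assumptions, not the actual mapper:
`nativeWork` is a hypothetical pure-Java stand-in for the blocking JNI call, and `StatusSink`
is a hypothetical stand-in for whatever Hadoop object receives the progress reports.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

public class HeartbeatSketch {

    /** Hypothetical stand-in for the object that receives status reports. */
    public interface StatusSink {
        void report(String status);
    }

    /** Hypothetical stand-in for the blocking native call made through JNI. */
    public static long nativeWork(int iterations) throws InterruptedException {
        long sum = 0;
        for (int i = 0; i < iterations; i++) {
            sum += i;
            Thread.sleep(1); // simulate work that takes a while
        }
        return sum;
    }

    /**
     * Runs the blocking work on the calling thread while a second thread
     * does nothing but sleep and report until the work has returned.
     */
    public static long runWithHeartbeat(StatusSink sink, long intervalMillis, int iterations)
            throws InterruptedException {
        AtomicBoolean done = new AtomicBoolean(false);
        Thread heartbeat = new Thread(() -> {
            while (!done.get()) {
                sink.report("still working");
                try {
                    Thread.sleep(intervalMillis);
                } catch (InterruptedException e) {
                    return; // the work finished; stop reporting
                }
            }
        });
        heartbeat.setDaemon(true);
        heartbeat.start();
        try {
            return nativeWork(iterations); // the caller blocks here
        } finally {
            done.set(true);
            heartbeat.interrupt();
            heartbeat.join();
        }
    }

    public static void main(String[] args) throws Exception {
        AtomicInteger reports = new AtomicInteger();
        long result = runWithHeartbeat(s -> reports.incrementAndGet(), 10, 50);
        System.out.println("result=" + result + ", reports=" + reports.get());
    }
}
```

In this sketch the reporting thread never touches the data handled by the blocking call,
which mirrors the claim that the two Java threads do not share any native state.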

So, the question is, in the scenario I have described, is there any reason to suspect that
the cause of my problems is some sort of thread trampling between the native code and something
else in the surrounding environment (the JVM or something like that), especially in the context
of the surrounding Hadoop infrastructure?  It doesn't really make any sense to me, but I'm
running out of ideas.

I've experimented with "mapred.child.java.opts" and "mapred.child.ulimit" but nothing really
seems to have any effect on the frequency of these errors.

I'm quite out of ideas.  These segfaults and standard allocation exceptions (in the face of
plenty of free memory) have basically brought my work to a halt and I just don't know what
to do anymore.


Keith Wiley               kwiley@keithwiley.com               www.keithwiley.com

"And what if we picked the wrong religion?  Every week, we're just making God
madder and madder!"
  -- Homer Simpson
