harmony-dev mailing list archives

From Gregory Shimansky <gshiman...@gmail.com>
Subject Re: [drlvm][threading] Should hythread_monitor_init() aquire the monitor?
Date Mon, 13 Nov 2006 23:56:19 GMT
Evgueni Brevnov wrote:
> hmmm.... strange. The patch was tested on multi-processor system
> running SUSE9. I will check if the patch misses something. Anyway, we
> need to wait with the patch submission until we 100% sure how
> hythread_monitor_init should behave.
> 
> Thanks
> Evgueni
> 
> On 11/11/06, Gregory Shimansky <gshimansky@gmail.com> wrote:
>> On Friday 10 November 2006 17:45 Evgueni Brevnov wrote:
>> > Hi,
>> >
>> > While investigating deadlock scenario which is described in
>> > HARMONY-2006 I found out one interesting thing. It turned out that DRL
>> > implementation of hythread_monitor_init /
>> > hythread_monitor_init_with_name initializes and acquires a monitor.
>> > Original spec reads: "Acquire and initialize a new monitor from the
>> > threading library...." AFAIU that doesn't mean to lock the monitor but
>> > get it from the threading library. So the hythread_monitor_init should
>> > not lock the monitor.
>> >
>> > Could somebody comment on that?
>>
>> It might be that semantic is different on different platforms which is
>> probably even worse. Your patch in HARMONY-2149 breaks nearly all of
>> acceptance tests on Linux while everything on Windows works (ok I tested on
>> laptop with 1 processor while Linux was a HT server, sometimes it is
>> important for threading).

I've tried to investigate the problem but haven't gotten to the bottom of 
it yet. The bug seems to be Ubuntu-specific (<joke>shall we maybe call this 
distribution buggy and move on?</joke>). I couldn't reproduce it on 
Gentoo, where all tests work just fine.

The bug looks like this: on the tests gc.Force, gc.LOS, gc.List, gc.NPE, 
gc.PhantomReferenceTest, gc.WeakReferenceTest and stress.WeakHashMapTest 
the VM segfaults. The stack looks like an infinite recursion of 4 stack frames:

#0  0xb6dcb814 in null_java_reference_handler (signum=11, info=0xb71a503c, context=0xb71a50bc)
     at /nfs/ims/proj/drl/mrt1/users/gregory/Harmony/enhanced/drlvm/trunk/vm/vmcore/src/util/linux/signals_ia32.cpp:443
#1  <signal handler called>
#2  0xb6dcc20a in get_stack_addr ()
     at /nfs/ims/proj/drl/mrt1/users/gregory/Harmony/enhanced/drlvm/trunk/vm/vmcore/src/util/linux/signals_ia32.cpp:293
#3  0xb6dcb6cd in check_stack_overflow (info=0xb71a546c, uc=0xb71a54ec)
     at /nfs/ims/proj/drl/mrt1/users/gregory/Harmony/enhanced/drlvm/trunk/vm/vmcore/src/util/linux/signals_ia32.cpp:399
#4  0xb6dcb900 in null_java_reference_handler (signum=11, info=0xb71a546c, context=0xb71a54ec)
     at /nfs/ims/proj/drl/mrt1/users/gregory/Harmony/enhanced/drlvm/trunk/vm/vmcore/src/util/linux/signals_ia32.cpp:451

and so on. The stack is very long. When I run the VM with -Xtrace:signals I 
get a very long log of messages saying "NPE or SOE detected at ...". The 
first address varies from run to run, but it always appears to be inside 
memcpy. The subsequent addresses are always the same and point to the 
get_stack_addr function.
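
To illustrate the recursion pattern above (a fault handler that faults again while it is still running), here is a minimal sketch of a re-entrancy guard. The names in_fault_handler, fault_handler_enter and fault_handler_leave are hypothetical and are not taken from the DRLVM sources. A single process-wide flag is used only for brevity, and a multi-threaded VM would presumably need a per-thread flag instead.

```c
#include <signal.h>

/* Hypothetical guard flag, not part of the DRLVM sources.
 * sig_atomic_t is the type the C standard allows a handler to write. */
static volatile sig_atomic_t in_fault_handler = 0;

/* Returns 1 if the caller may run the handler body,
 * 0 if the handler is already active (a nested fault). */
int fault_handler_enter(void)
{
    if (in_fault_handler) {
        return 0;
    }
    in_fault_handler = 1;
    return 1;
}

/* Marks the handler body as finished. */
void fault_handler_leave(void)
{
    in_fault_handler = 0;
}
```

A handler written this way could check the result of fault_handler_enter() first and, on a nested fault, fall back to the default action instead of re-running the same stack checks.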

So I tried to find out why memcpy crashes in the first place. It appears 
to be a struct copy made in jsig_handler in hysig. The stack looks like 
this (if I can trust gdb on Ubuntu):

#0  0xb7a9b9dc in memcpy () from /lib/tls/i686/cmov/libc.so.6
#1  0xb7ba0fa0 in jsig_handler (sig=-1215196204, siginfo=0x0, uc=0x0) at hysigunix.c:169
#2  0xb7f9ec8b in asynchSignalReporter (userData=0x0) at hysignal.c:971
#3  0xb7baa8ef in thread_start_proc (thd=0x807a8e8, p_args=0x807a8d8)
     at /nfs/ims/proj/drl/mrt1/users/gregory/Harmony/enhanced/drlvm/trunk/vm/thread/src/thread_native_basic.c:712
#4  0xb7bb0ed4 in dummy_worker (opaque=0x0) at threadproc/unix/thread.c:138
#5  0xb7b65341 in start_thread () from lib/tls/i686/cmov/libpthread.so.0
#6  0xb7af94ee in clone () from /lib/tls/i686/cmov/libc.so.6

In jsig_handler a struct of type sigaction is copied:

act = saved_sigaction[sig];

and it seems gcc replaces this statement with a call to memcpy. But the 
parameter sig looks quite weird: it is sig=-1215196204, which is not a 
valid signal number, so the copy reads far outside the array. Now if only 
I could find where this sig value comes from... I cannot find it in the 
depths of the classlib native code this late at night.
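
For illustration, a minimal sketch of how a bounds check would turn this kind of crash into an error return. MAX_SIGNALS and lookup_saved_action are hypothetical names, and the array below is only a stand-in for the saved_sigaction table in hysigunix.c, whose real size and declaration are not shown in this message.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stddef.h>

/* Hypothetical table size; the real hysig code may size its table differently. */
#define MAX_SIGNALS 64

/* Stand-in for the saved_sigaction array mentioned above. */
static struct sigaction saved_sigaction[MAX_SIGNALS];

/* Copies the saved action for sig into *out.
 * Returns 0 on success, -1 if sig is not a valid index or out is NULL. */
int lookup_saved_action(int sig, struct sigaction *out)
{
    if (sig < 0 || sig >= MAX_SIGNALS || out == NULL) {
        return -1;
    }
    *out = saved_sigaction[sig];  /* struct copy; the compiler may emit memcpy */
    return 0;
}
```

With such a check, a call like lookup_saved_action(-1215196204, &act) returns -1 instead of reading memory far outside the table. It would not explain where the bad value comes from, only make the failure easier to spot.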

-- 
Gregory

