harmony-dev mailing list archives

From "Geir Magnusson Jr." <g...@pobox.com>
Subject Re: [drlvm][threading] Should hythread_monitor_init() aquire the monitor?
Date Thu, 16 Nov 2006 05:08:22 GMT
ah. whew.

can you point me to that change you made?

geir

Evgueni Brevnov wrote:
> I'm not aware if classlib uses SIGUSR2. In this particular case
> classlib (to be more precise it is the portlib module) does sem_wait
> which is interrupted by TM's SIGUSR2 signal. I replaced "hysem_wait"
> with "while (hysem_wait() != 0) {}". It helped to pass all tests.
> 
> Evgueni
> 
> On 11/16/06, Geir Magnusson Jr. <geir@pobox.com> wrote:
>> um... classlib uses SIGUSR2 as well?  Doesn't our thread manager use it?
>>
>> Evgueni Brevnov wrote:
>> > Hey,
>> >
>> > Seems like the pretty old problem shows itself again. I'm talking
>> > about SIGUSR2 signal :-(...Classlib's asynchronous signal reporter
>> > uses system semaphores for synchronization purposes...and hysem_wait
>> > is interrupted by the signal:
>> >
>> > (gdb) p perror("sym_wait error:")
>> > sym_wait error:: Interrupted system call
>> >
>> > Do we have a good (universal) solution for such cases?
>> >
>> > Thanks
>> > Evgueni
>> >
>> > On 11/15/06, Geir Magnusson Jr. <geir@pobox.com> wrote:
>> >>
>> >>
>> >> Gregory Shimansky wrote:
>> >> > Evgueni Brevnov wrote:
>> >> >> hmmm.... strange. The patch was tested on a multi-processor system
>> >> >> running SUSE9. I will check if the patch misses something. Anyway, we
>> >> >> need to hold off on submitting the patch until we are 100% sure how
>> >> >> hythread_monitor_init should behave.
>> >> >>
>> >> >> Thanks
>> >> >> Evgueni
>> >> >>
>> >> >> On 11/11/06, Gregory Shimansky <gshimansky@gmail.com> wrote:
>> >> >>> On Friday 10 November 2006 17:45 Evgueni Brevnov wrote:
>> >> >>> > Hi,
>> >> >>> >
>> >> >>> > While investigating the deadlock scenario described in
>> >> >>> > HARMONY-2006 I found out one interesting thing. It turned out
>> >> >>> > that the DRL implementation of hythread_monitor_init /
>> >> >>> > hythread_monitor_init_with_name initializes and acquires a
>> >> >>> > monitor. The original spec reads: "Acquire and initialize a new
>> >> >>> > monitor from the threading library...." AFAIU that doesn't mean
>> >> >>> > to lock the monitor but to get it from the threading library. So
>> >> >>> > hythread_monitor_init should not lock the monitor.
>> >> >>> >
>> >> >>> > Could somebody comment on that?
>> >> >>>
>> >> >>> It might be that the semantics differ between platforms, which is
>> >> >>> probably even worse. Your patch in HARMONY-2149 breaks nearly all of
>> >> >>> the acceptance tests on Linux while everything on Windows works (ok, I
>> >> >>> tested on a laptop with 1 processor while the Linux machine was an HT
>> >> >>> server; sometimes that is important for threading).
>> >> >
>> >> > I've tried to investigate the problem but haven't gotten to the bottom
>> >> > of it yet. The bug seems to be Ubuntu-specific (<joke>shall we maybe
>> >> > call this distribution buggy and move on?</joke>).
>> >>
>> >> There is something odd about it, I'll admit...  Remember the ENOMEM
>> >> bugs I found in forking?
>> >>
>> >>
>> >> > I didn't reproduce it on Gentoo; all tests work just fine.
>> >> >
>> >> > The bug looks like this: on the tests gc.Force, gc.LOS, gc.List, gc.NPE,
>> >> > gc.PhantomReferenceTest, gc.WeakReferenceTest and stress.WeakHashMapTest
>> >> > the VM segfaults. The stack looks like an infinite recursion of 4 stack
>> >> > frames:
>> >> >
>> >> > #0  0xb6dcb814 in null_java_reference_handler (signum=11,
>> >> >     info=0xb71a503c, context=0xb71a50bc)
>> >> >     at /nfs/ims/proj/drl/mrt1/users/gregory/Harmony/enhanced/drlvm/trunk/vm/vmcore/src/util/linux/signals_ia32.cpp:443
>> >> > #1  <signal handler called>
>> >> > #2  0xb6dcc20a in get_stack_addr ()
>> >> >     at /nfs/ims/proj/drl/mrt1/users/gregory/Harmony/enhanced/drlvm/trunk/vm/vmcore/src/util/linux/signals_ia32.cpp:293
>> >> > #3  0xb6dcb6cd in check_stack_overflow (info=0xb71a546c, uc=0xb71a54ec)
>> >> >     at /nfs/ims/proj/drl/mrt1/users/gregory/Harmony/enhanced/drlvm/trunk/vm/vmcore/src/util/linux/signals_ia32.cpp:399
>> >> > #4  0xb6dcb900 in null_java_reference_handler (signum=11,
>> >> >     info=0xb71a546c, context=0xb71a54ec)
>> >> >     at /nfs/ims/proj/drl/mrt1/users/gregory/Harmony/enhanced/drlvm/trunk/vm/vmcore/src/util/linux/signals_ia32.cpp:451
>> >> >
>> >> > and so on. The stack is very long. When I run the VM with
>> >> > -Xtrace:signals I get a very long log of messages saying "NPE or SOE
>> >> > detected at ...". The first address always varies, but it appears to
>> >> > be in memcpy. The following addresses are always the same; they point
>> >> > to the get_stack_addr function.
>> >> >
>> >> > So I tried to find out why memcpy crashes in the first place. It appears
>> >> > to be a struct copy called from jsig_handler (hysig). The stack looks
>> >> > like this (if I can trust gdb on Ubuntu):
>> >> >
>> >> > #0  0xb7a9b9dc in memcpy () from /lib/tls/i686/cmov/libc.so.6
>> >> > #1  0xb7ba0fa0 in jsig_handler (sig=-1215196204, siginfo=0x0, uc=0x0)
>> >> >     at hysigunix.c:169
>> >> > #2  0xb7f9ec8b in asynchSignalReporter (userData=0x0) at hysignal.c:971
>> >> > #3  0xb7baa8ef in thread_start_proc (thd=0x807a8e8, p_args=0x807a8d8)
>> >> >     at /nfs/ims/proj/drl/mrt1/users/gregory/Harmony/enhanced/drlvm/trunk/vm/thread/src/thread_native_basic.c:712
>> >> > #4  0xb7bb0ed4 in dummy_worker (opaque=0x0) at threadproc/unix/thread.c:138
>> >> > #5  0xb7b65341 in start_thread () from lib/tls/i686/cmov/libpthread.so.0
>> >> > #6  0xb7af94ee in clone () from /lib/tls/i686/cmov/libc.so.6
>> >> >
>> >> > In jsig_handler a struct of type sigaction is copied
>> >> >
>> >> > act = saved_sigaction[sig];
>> >> >
>> >> > and gcc, it seems, replaces this statement with a call to memcpy. But
>> >> > the parameter sig looks quite weird: it is sig=-1215196204...
>> >> > Now if I could only find where and how this sig value got there... I
>> >> > cannot find it in the depths of the classlib native code this late at
>> >> > night.
>> >> >
>> >>
>> >>
>> >
>>
> 
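For reference, the workaround Evgueni describes above (retrying the wait
when SIGUSR2 interrupts it) can be sketched with plain POSIX semaphores.
This is only an illustration, not the actual portlib change: the helper
name sem_wait_retry is hypothetical, and it calls sem_wait directly
rather than hysem_wait. Unlike a bare "while (wait() != 0) {}" loop, it
retries only on EINTR, so other errors are reported instead of spinning.

```c
#include <errno.h>
#include <semaphore.h>

/* Hypothetical helper (not part of portlib): wait on a POSIX semaphore,
 * retrying only when a signal handler interrupted the call (EINTR).
 * Returns 0 on success, or -1 with errno set for any other failure. */
int sem_wait_retry(sem_t *sem)
{
    int rc;
    do {
        rc = sem_wait(sem);
    } while (rc == -1 && errno == EINTR);
    return rc;
}
```

A caller would use sem_wait_retry(&sem) wherever an interrupted wait
should simply be resumed.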
