harmony-dev mailing list archives

From "Evgueni Brevnov" <evgueni.brev...@gmail.com>
Subject Re: [drlvm][threading] Should hythread_monitor_init() aquire the monitor?
Date Wed, 15 Nov 2006 09:35:13 GMT
Hey,

Seems like a pretty old problem is showing itself again. I'm talking
about the SIGUSR2 signal :-(... Classlib's asynchronous signal reporter
uses system semaphores for synchronization, and hysem_wait
is interrupted by the signal:

(gdb) p perror("sym_wait error:")
sym_wait error:: Interrupted system call

Do we have a good (universal) solution for such cases?
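
One common approach, sketched here only as an illustration, is to restart the wait whenever it fails with EINTR. The helper name sem_wait_retry is hypothetical and is not part of hythread or classlib; it assumes the wait ends up in a plain POSIX sem_wait:

```c
#include <assert.h>
#include <errno.h>
#include <semaphore.h>
#include <stddef.h>

/* Hypothetical helper (not an existing hythread/classlib function):
 * waits on a POSIX semaphore and restarts the wait whenever a signal
 * handler interrupts it. */
static int sem_wait_retry(sem_t *sem)
{
    int rc;

    assert(sem != NULL);
    do {
        rc = sem_wait(sem);              /* may fail with errno == EINTR */
    } while (rc == -1 && errno == EINTR);

    return rc; /* 0 on success, -1 with errno set for any other failure */
}
```

Installing the handler with SA_RESTART is another option, but whether sem_wait is restarted automatically depends on the kernel version, so the explicit loop is the more portable choice.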

Thanks
Evgueni

On 11/15/06, Geir Magnusson Jr. <geir@pobox.com> wrote:
>
>
> Gregory Shimansky wrote:
> > Evgueni Brevnov wrote:
> >> hmmm.... strange. The patch was tested on a multi-processor system
> >> running SUSE9. I will check if the patch misses something. Anyway, we
> >> need to wait with the patch submission until we are 100% sure how
> >> hythread_monitor_init should behave.
> >>
> >> Thanks
> >> Evgueni
> >>
> >> On 11/11/06, Gregory Shimansky <gshimansky@gmail.com> wrote:
> >>> On Friday 10 November 2006 17:45 Evgueni Brevnov wrote:
> >>> > Hi,
> >>> >
> >>> > While investigating the deadlock scenario described in
> >>> > HARMONY-2006 I found out one interesting thing. It turned out that
> >>> > the DRL implementation of hythread_monitor_init /
> >>> > hythread_monitor_init_with_name initializes and acquires a monitor.
> >>> > The original spec reads: "Acquire and initialize a new monitor from the
> >>> > threading library...." AFAIU that doesn't mean to lock the monitor
> >>> > but to get it from the threading library. So hythread_monitor_init
> >>> > should not lock the monitor.
> >>> >
> >>> > Could somebody comment on that?
> >>>
> >>> It might be that the semantics differ between platforms, which is
> >>> probably even worse. Your patch in HARMONY-2149 breaks nearly all of
> >>> the acceptance tests on Linux while everything on Windows works (OK, I
> >>> tested on a laptop with 1 processor while Linux was an HT server;
> >>> sometimes that is important for threading).
> >
> > I've tried to investigate the problem but haven't gotten to the bottom
> > of it yet. The bug seems to be Ubuntu-specific (<joke>shall we maybe
> > call this distribution buggy and move on?</joke>).
>
> There is something odd about it, I'll admit...  Remember the ENOMEM bugs
> I found in forking?
>
>
> > I didn't reproduce it on
> > Gentoo; all tests work just fine.
> >
> > The bug looks like this: on tests gc.Force, gc.LOS, gc.List, gc.NPE,
> > gc.PhantomReferenceTest, gc.WeakReferenceTest and stress.WeakHashMapTest
> > the VM segfaults. The stack looks like an infinite recursion of 4 stack
> > frames:
> >
> > #0  0xb6dcb814 in null_java_reference_handler (signum=11,
> > info=0xb71a503c, context=0xb71a50bc) at
> > /nfs/ims/proj/drl/mrt1/users/gregory/Harmony/enhanced/drlvm/trunk/vm/vmcore/src/util/linux/signals_ia32.cpp:443
> > #1  <signal handler called>
> > #2  0xb6dcc20a in get_stack_addr () at
> > /nfs/ims/proj/drl/mrt1/users/gregory/Harmony/enhanced/drlvm/trunk/vm/vmcore/src/util/linux/signals_ia32.cpp:293
> > #3  0xb6dcb6cd in check_stack_overflow (info=0xb71a546c, uc=0xb71a54ec)
> >     at
> > /nfs/ims/proj/drl/mrt1/users/gregory/Harmony/enhanced/drlvm/trunk/vm/vmcore/src/util/linux/signals_ia32.cpp:399
> > #4  0xb6dcb900 in null_java_reference_handler (signum=11,
> > info=0xb71a546c, context=0xb71a54ec) at
> > /nfs/ims/proj/drl/mrt1/users/gregory/Harmony/enhanced/drlvm/trunk/vm/vmcore/src/util/linux/signals_ia32.cpp:451
> >
> > and so on. The stack is very long. When I run the VM with -Xtrace:signals
> > I get a very long log of messages saying "NPE or SOE detected at ...".
> > The first address always varies, but it appears to be in memcpy. The next
> > addresses are always the same; they point to the get_stack_addr function.
> >
> > So I tried to find out why memcpy crashes in the first place. It appears
> > to be a struct copy called from jsig_handler in hysig. The stack looks
> > like this (if I can trust gdb on Ubuntu):
> >
> > #0  0xb7a9b9dc in memcpy () from /lib/tls/i686/cmov/libc.so.6
> > #1  0xb7ba0fa0 in jsig_handler (sig=-1215196204, siginfo=0x0, uc=0x0)
> >  at hysigunix.c:169
> > #2  0xb7f9ec8b in asynchSignalReporter (userData=0x0) at hysignal.c:971
> > #3  0xb7baa8ef in thread_start_proc (thd=0x807a8e8, p_args=0x807a8d8)
> >     at
> > /nfs/ims/proj/drl/mrt1/users/gregory/Harmony/enhanced/drlvm/trunk/vm/thread/src/thread_native_basic.c:712
> > #4  0xb7bb0ed4 in dummy_worker (opaque=0x0) at threadproc/unix/thread.c:138
> > #5  0xb7b65341 in start_thread () from lib/tls/i686/cmov/libpthread.so.0
> > #6  0xb7af94ee in clone () from /lib/tls/i686/cmov/libc.so.6
> >
> > In jsig_handler a struct of type sigaction is copied:
> >
> > act = saved_sigaction[sig];
> >
> > and gcc seems to replace this statement with a call to memcpy. But the
> > parameter sig looks quite weird: sig=-1215196204...
> > Now if I could only find where this sig value came from... I cannot
> > find it in the depths of the classlib native code this late at night.
> >
>
>
