harmony-dev mailing list archives

From "Evgueni Brevnov" <evgueni.brev...@gmail.com>
Subject Re: [drlvm][threading] Should hythread_monitor_init() aquire the monitor?
Date Thu, 16 Nov 2006 05:42:08 GMT
I haven't published it yet...will file a JIRA soon...

On 11/16/06, Geir Magnusson Jr. <geir@pobox.com> wrote:
> ah. whew.
>
> can you point me to that change you made?
>
> geir
>
> Evgueni Brevnov wrote:
> > I'm not aware of classlib using SIGUSR2. In this particular case
> > classlib (more precisely, the portlib module) calls sem_wait,
> > which is interrupted by TM's SIGUSR2 signal. I replaced "hysem_wait"
> > with "while (hysem_wait() != 0) {}". That was enough to pass all tests.
> >
> > Evgueni
> >
> > On 11/16/06, Geir Magnusson Jr. <geir@pobox.com> wrote:
> >> um... classlib uses SIGUSR2 as well?  Doesn't our thread manager use it?
> >>
> >> Evgueni Brevnov wrote:
> >> > Hey,
> >> >
> >> > Seems like a pretty old problem is showing itself again. I'm talking
> >> > about the SIGUSR2 signal :-(...Classlib's asynchronous signal reporter
> >> > uses system semaphores for synchronization purposes...and hysem_wait
> >> > is interrupted by the signal:
> >> >
> >> > (gdb) p perror("sym_wait error:")
> >> > sym_wait error:: Interrupted system call
> >> >
> >> > Do we have a good (universal) solution for such cases?
> >> >
> >> > Thanks
> >> > Evgueni
> >> >
> >> > On 11/15/06, Geir Magnusson Jr. <geir@pobox.com> wrote:
> >> >>
> >> >>
> >> >> Gregory Shimansky wrote:
> >> >> > Evgueni Brevnov wrote:
> >> >> >> hmmm.... strange. The patch was tested on a multi-processor
> >> >> >> system running SUSE9. I will check if the patch misses
> >> >> >> something. Anyway, we need to wait with the patch submission
> >> >> >> until we are 100% sure how hythread_monitor_init should behave.
> >> >> >>
> >> >> >> Thanks
> >> >> >> Evgueni
> >> >> >>
> >> >> >> On 11/11/06, Gregory Shimansky <gshimansky@gmail.com> wrote:
> >> >> >>> On Friday 10 November 2006 17:45 Evgueni Brevnov wrote:
> >> >> >>> > Hi,
> >> >> >>> >
> >> >> >>> > While investigating the deadlock scenario described in
> >> >> >>> > HARMONY-2006 I found out one interesting thing. It turned out
> >> >> >>> > that the DRL implementation of hythread_monitor_init /
> >> >> >>> > hythread_monitor_init_with_name initializes and acquires a
> >> >> >>> > monitor. The original spec reads: "Acquire and initialize a new
> >> >> >>> > monitor from the threading library...." AFAIU that doesn't mean
> >> >> >>> > to lock the monitor but to get it from the threading library. So
> >> >> >>> > hythread_monitor_init should not lock the monitor.
> >> >> >>> >
> >> >> >>> > Could somebody comment on that?
> >> >> >>>
> >> >> >>> It might be that the semantics differ between platforms, which
> >> >> >>> is probably even worse. Your patch in HARMONY-2149 breaks nearly
> >> >> >>> all of the acceptance tests on Linux while everything on Windows
> >> >> >>> works (ok, I tested on a laptop with 1 processor while Linux was
> >> >> >>> a HT server; sometimes that is important for threading).
> >> >> >
> >> >> > I've tried to investigate the problem but didn't find the end of it
> >> >> > yet. The bug seems to be ubuntu specific (<joke>shall we maybe call
> >> >> > this distribution buggy and move on?</joke>).
> >> >>
> >> >> There is something odd about it, I'll admit...  Remember the EOMEM
> >> >> bugs I found in forking?
> >> >>
> >> >>
> >> >> > I didn't reproduce it on gentoo, all tests work just fine.
> >> >> >
> >> >> > The bug looks like this: on tests gc.Force, gc.LOS, gc.List, gc.NPE,
> >> >> > gc.PhantomReferenceTest, gc.WeakReferenceTest and
> >> >> > stress.WeakHashMapTest the VM segfaults. The stack looks like an
> >> >> > infinite recursion of 4 stack frames:
> >> >> >
> >> >> > #0  0xb6dcb814 in null_java_reference_handler (signum=11, info=0xb71a503c, context=0xb71a50bc)
> >> >> >     at /nfs/ims/proj/drl/mrt1/users/gregory/Harmony/enhanced/drlvm/trunk/vm/vmcore/src/util/linux/signals_ia32.cpp:443
> >> >> > #1  <signal handler called>
> >> >> > #2  0xb6dcc20a in get_stack_addr ()
> >> >> >     at /nfs/ims/proj/drl/mrt1/users/gregory/Harmony/enhanced/drlvm/trunk/vm/vmcore/src/util/linux/signals_ia32.cpp:293
> >> >> > #3  0xb6dcb6cd in check_stack_overflow (info=0xb71a546c, uc=0xb71a54ec)
> >> >> >     at /nfs/ims/proj/drl/mrt1/users/gregory/Harmony/enhanced/drlvm/trunk/vm/vmcore/src/util/linux/signals_ia32.cpp:399
> >> >> > #4  0xb6dcb900 in null_java_reference_handler (signum=11, info=0xb71a546c, context=0xb71a54ec)
> >> >> >     at /nfs/ims/proj/drl/mrt1/users/gregory/Harmony/enhanced/drlvm/trunk/vm/vmcore/src/util/linux/signals_ia32.cpp:451
> >> >> >
> >> >> > and so on. The stack is very long. When I run the VM with
> >> >> > -Xtrace:signals I get a very long log of messages that "NPE or SOE
> >> >> > detected at ...". The first address always varies, but it appears to
> >> >> > be in memcpy. The next addresses are always the same; they point to
> >> >> > the get_stack_addr function.
> >> >> >
> >> >> > So I tried to find out why memcpy crashes in the first place. It
> >> >> > appears to be a struct copy called from jsig_handler hysig. The stack
> >> >> > looks like this (if I can trust gdb on ubuntu):
> >> >> >
> >> >> > #0  0xb7a9b9dc in memcpy () from /lib/tls/i686/cmov/libc.so.6
> >> >> > #1  0xb7ba0fa0 in jsig_handler (sig=-1215196204, siginfo=0x0, uc=0x0)
> >> >> >     at hysigunix.c:169
> >> >> > #2  0xb7f9ec8b in asynchSignalReporter (userData=0x0) at hysignal.c:971
> >> >> > #3  0xb7baa8ef in thread_start_proc (thd=0x807a8e8, p_args=0x807a8d8)
> >> >> >     at /nfs/ims/proj/drl/mrt1/users/gregory/Harmony/enhanced/drlvm/trunk/vm/thread/src/thread_native_basic.c:712
> >> >> > #4  0xb7bb0ed4 in dummy_worker (opaque=0x0) at threadproc/unix/thread.c:138
> >> >> > #5  0xb7b65341 in start_thread () from lib/tls/i686/cmov/libpthread.so.0
> >> >> > #6  0xb7af94ee in clone () from /lib/tls/i686/cmov/libc.so.6
> >> >> >
> >> >> > In jsig_handler a struct of type sigaction is copied
> >> >> >
> >> >> > act = saved_sigaction[sig];
> >> >> >
> >> >> > and gcc replaces this statement with a call to memcpy it seems. But
> >> >> > the parameter sig is quite weird if you look at it. It is
> >> >> > sig=-1215196204... Now if I could only find where this sig came
> >> >> > from... I cannot find it in the depth of classlib native code this
> >> >> > late at night.
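
Gregory's observation that "act = saved_sigaction[sig];" crashes in memcpy when sig is -1215196204 is consistent with an out-of-range array index. A minimal sketch of a guarded lookup follows. It is not the classlib code: the table name saved_actions, its size SIG_TABLE_SIZE and the helper lookup_saved_action are hypothetical, and whether the real jsig_handler performs such a check is not stated in this thread. The gdb value itself may also be unreliable, as Gregory notes.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>

/* Assumed table size: Linux with glibc numbers signals 1..64. */
#define SIG_TABLE_SIZE 65

/* Hypothetical stand-in for the saved_sigaction table mentioned above. */
static struct sigaction saved_actions[SIG_TABLE_SIZE];

/*
 * Copy the saved action for sig into *out.
 * Returns 0 on success, -1 if sig is outside the table.
 */
static int lookup_saved_action(int sig, struct sigaction *out)
{
    if (sig <= 0 || sig >= SIG_TABLE_SIZE) {
        return -1;  /* a value such as -1215196204 is rejected here */
    }
    *out = saved_actions[sig];  /* struct copy; the compiler may emit memcpy */
    return 0;
}
```

A check like this would turn the crash into a reportable error, but it would not explain where the bad value comes from, which is the open question in the message above.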
> >> >> >
> >> >>
> >> >>
> >> >
> >>
> >
>
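
The workaround Evgueni describes, retrying the semaphore wait when SIGUSR2 interrupts it, can be sketched with the plain POSIX call. This is only an illustration under assumptions: the helper name sem_wait_retry is hypothetical, and it calls sem_wait directly rather than portlib's hysem_wait, whose exact return convention is not shown in this thread. Unlike "while (hysem_wait() != 0) {}", the sketch retries only on EINTR, so a permanent error does not turn into an endless loop.

```c
#include <errno.h>
#include <semaphore.h>

/*
 * Hypothetical helper (not part of portlib): wait on a POSIX semaphore
 * and restart the wait whenever a signal handler interrupts it.
 * Returns 0 on success, or -1 with errno set for any other failure.
 */
static int sem_wait_retry(sem_t *sem)
{
    int rc;

    do {
        rc = sem_wait(sem);
    } while (rc == -1 && errno == EINTR);

    return rc;
}
```

Installing the signal handler with SA_RESTART is the other common approach to interrupted system calls, but on Linux sem_wait is documented as failing with EINTR regardless of that flag, so an explicit retry loop of this kind is still needed.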
