harmony-dev mailing list archives

From "Evgueni Brevnov" <evgueni.brev...@gmail.com>
Subject Re: [drlvm][threading] Should hythread_monitor_init() acquire the monitor?
Date Thu, 16 Nov 2006 11:03:13 GMT
You can look at the change here
http://issues.apache.org/jira/browse/HARMONY-2203

On 11/16/06, Evgueni Brevnov <evgueni.brevnov@gmail.com> wrote:
> I haven't published it yet... I will file a JIRA soon.
>
> On 11/16/06, Geir Magnusson Jr. <geir@pobox.com> wrote:
> > ah. whew.
> >
> > can you point me to that change you made?
> >
> > geir
> >
> > Evgueni Brevnov wrote:
> > > I'm not aware of classlib using SIGUSR2. In this particular case
> > > classlib (more precisely, the portlib module) calls sem_wait, which is
> > > interrupted by the TM's SIGUSR2 signal. I replaced "hysem_wait" with
> > > "while (hysem_wait() != 0) {}". With that change all tests pass.
> > >
> > > Evgueni
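The retry loop described above can be sketched with plain POSIX semaphores. This is a minimal sketch, not the HARMONY-2203 patch: the helper name sem_wait_nointr is hypothetical, and unlike `while (hysem_wait() != 0) {}` it retries only when errno is EINTR, so a genuine failure (for example an invalid semaphore) is reported instead of spinning forever.

```c
#include <errno.h>
#include <semaphore.h>

/* Hypothetical helper: wait on a POSIX semaphore, restarting the wait
 * whenever a signal handler (e.g. one installed for the thread manager's
 * SIGUSR2) interrupts it. Returns 0 once the semaphore is acquired, or -1
 * with errno set for any error other than EINTR. */
static int sem_wait_nointr(sem_t *sem)
{
    int rc;
    do {
        rc = sem_wait(sem);
    } while (rc == -1 && errno == EINTR);
    return rc;
}
```

Checking errno rather than retrying on any non-zero result keeps the loop from hiding unrelated errors while still absorbing interruptions from signals that are expected in the process.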
> > >
> > > On 11/16/06, Geir Magnusson Jr. <geir@pobox.com> wrote:
> > >> um... classlib uses SIGUSR2 as well?  Doesn't our thread manager use it?
> > >>
> > >> Evgueni Brevnov wrote:
> > >> > Hey,
> > >> >
> > >> > It seems a pretty old problem is showing itself again. I'm talking
> > >> > about the SIGUSR2 signal :-( Classlib's asynchronous signal reporter
> > >> > uses system semaphores for synchronization, and hysem_wait is
> > >> > interrupted by the signal:
> > >> >
> > >> > (gdb) p perror("sym_wait error:")
> > >> > sym_wait error:: Interrupted system call
> > >> >
> > >> > Do we have a good (universal) solution for such cases?
> > >> >
> > >> > Thanks
> > >> > Evgueni
> > >> >
> > >> > On 11/15/06, Geir Magnusson Jr. <geir@pobox.com> wrote:
> > >> >>
> > >> >>
> > >> >> Gregory Shimansky wrote:
> > >> >> > Evgueni Brevnov wrote:
> > >> >> >> Hmm... strange. The patch was tested on a multi-processor system
> > >> >> >> running SUSE9. I will check if the patch misses something. Anyway, we
> > >> >> >> need to hold off on submitting the patch until we are 100% sure how
> > >> >> >> hythread_monitor_init should behave.
> > >> >> >>
> > >> >> >> Thanks
> > >> >> >> Evgueni
> > >> >> >>
> > >> >> >> On 11/11/06, Gregory Shimansky <gshimansky@gmail.com> wrote:
> > >> >> >>> On Friday 10 November 2006 17:45 Evgueni Brevnov wrote:
> > >> >> >>> > Hi,
> > >> >> >>> >
> > >> >> >>> > While investigating the deadlock scenario described in
> > >> >> >>> > HARMONY-2006, I found out one interesting thing. It turned out that
> > >> >> >>> > the DRL implementation of hythread_monitor_init /
> > >> >> >>> > hythread_monitor_init_with_name both initializes and acquires a
> > >> >> >>> > monitor. The original spec reads: "Acquire and initialize a new
> > >> >> >>> > monitor from the threading library...." AFAIU that doesn't mean
> > >> >> >>> > locking the monitor, but getting it from the threading library. So
> > >> >> >>> > hythread_monitor_init should not lock the monitor.
> > >> >> >>> >
> > >> >> >>> > Could somebody comment on that?
> > >> >> >>>
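To make the two readings of the spec concrete, here is a minimal sketch of the "initialize without acquiring" semantics proposed above, built on pthreads. The monitor_t type and the monitor_* functions are hypothetical and are not the hythread API; a real hythread monitor is also reentrant and tracks its owner, which this sketch omits.

```c
#include <pthread.h>

/* Hypothetical, simplified monitor: a mutex plus a condition variable.
 * Unlike a Java-style monitor, the mutex here is not recursive. */
typedef struct {
    pthread_mutex_t mutex;
    pthread_cond_t  cond;
} monitor_t;

/* Initialize the monitor but leave it unlocked; the caller must call
 * monitor_enter() explicitly before using it. Returns 0 or an errno code. */
static int monitor_init(monitor_t *mon)
{
    int rc = pthread_mutex_init(&mon->mutex, NULL);
    if (rc != 0)
        return rc;
    rc = pthread_cond_init(&mon->cond, NULL);
    if (rc != 0)
        pthread_mutex_destroy(&mon->mutex); /* undo the partial init */
    return rc;
}

static int monitor_enter(monitor_t *mon) { return pthread_mutex_lock(&mon->mutex); }
static int monitor_exit(monitor_t *mon)  { return pthread_mutex_unlock(&mon->mutex); }

static void monitor_destroy(monitor_t *mon)
{
    pthread_cond_destroy(&mon->cond);
    pthread_mutex_destroy(&mon->mutex);
}
```

Under the other reading, monitor_init would end by calling monitor_enter(), and every caller would have to pair it with a monitor_exit(). Code written for one convention and run against the other either leaves the monitor held, so other threads block, or releases a monitor that was never acquired.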
> > >> >> >>> It might be that the semantics differ between platforms, which is
> > >> >> >>> probably even worse. Your patch in HARMONY-2149 breaks nearly all of
> > >> >> >>> the acceptance tests on Linux, while everything works on Windows
> > >> >> >>> (OK, I tested Windows on a laptop with 1 processor while the Linux
> > >> >> >>> machine was an HT server; sometimes that is important for threading).
> > >> >> >
> > >> >> > I've tried to investigate the problem but haven't gotten to the bottom
> > >> >> > of it yet. The bug seems to be Ubuntu specific (<joke>shall we maybe
> > >> >> > call this distribution buggy and move on?</joke>).
> > >> >>
> > >> >> There is something odd about it, I'll admit...  Remember the ENOMEM
> > >> >> bugs I found in forking?
> > >> >>
> > >> >> > I couldn't reproduce it on Gentoo; all tests work just fine.
> > >> >> >
> > >> >> > The bug looks like this: on the tests gc.Force, gc.LOS, gc.List, gc.NPE,
> > >> >> > gc.PhantomReferenceTest, gc.WeakReferenceTest and stress.WeakHashMapTest,
> > >> >> > the VM segfaults. The stack looks like an infinite recursion of 4 stack
> > >> >> > frames:
> > >> >> >
> > >> >> > #0  0xb6dcb814 in null_java_reference_handler (signum=11,
> > >> >> >     info=0xb71a503c, context=0xb71a50bc) at
> > >> >> >     /nfs/ims/proj/drl/mrt1/users/gregory/Harmony/enhanced/drlvm/trunk/vm/vmcore/src/util/linux/signals_ia32.cpp:443
> > >> >> > #1  <signal handler called>
> > >> >> > #2  0xb6dcc20a in get_stack_addr () at
> > >> >> >     /nfs/ims/proj/drl/mrt1/users/gregory/Harmony/enhanced/drlvm/trunk/vm/vmcore/src/util/linux/signals_ia32.cpp:293
> > >> >> > #3  0xb6dcb6cd in check_stack_overflow (info=0xb71a546c, uc=0xb71a54ec)
> > >> >> >     at /nfs/ims/proj/drl/mrt1/users/gregory/Harmony/enhanced/drlvm/trunk/vm/vmcore/src/util/linux/signals_ia32.cpp:399
> > >> >> > #4  0xb6dcb900 in null_java_reference_handler (signum=11,
> > >> >> >     info=0xb71a546c, context=0xb71a54ec) at
> > >> >> >     /nfs/ims/proj/drl/mrt1/users/gregory/Harmony/enhanced/drlvm/trunk/vm/vmcore/src/util/linux/signals_ia32.cpp:451
> > >> >> >
> > >> >> > and so on. The stack is very long. When I run the VM with -Xtrace:signals,
> > >> >> > I get a very long log of "NPE or SOE detected at ..." messages. The
> > >> >> > first address always varies, but it appears to be in memcpy. The next
> > >> >> > addresses are always the same; they point to the get_stack_addr function.
> > >> >> >
> > >> >> > So I tried to find out why memcpy crashes in the first place. It
> > >> >> > appears to be a struct copy called from jsig_handler in hysig. The
> > >> >> > stack looks like this (if I can trust gdb on Ubuntu):
> > >> >> >
> > >> >> > #0  0xb7a9b9dc in memcpy () from /lib/tls/i686/cmov/libc.so.6
> > >> >> > #1  0xb7ba0fa0 in jsig_handler (sig=-1215196204, siginfo=0x0, uc=0x0)
> > >> >> >     at hysigunix.c:169
> > >> >> > #2  0xb7f9ec8b in asynchSignalReporter (userData=0x0) at hysignal.c:971
> > >> >> > #3  0xb7baa8ef in thread_start_proc (thd=0x807a8e8, p_args=0x807a8d8)
> > >> >> >     at /nfs/ims/proj/drl/mrt1/users/gregory/Harmony/enhanced/drlvm/trunk/vm/thread/src/thread_native_basic.c:712
> > >> >> > #4  0xb7bb0ed4 in dummy_worker (opaque=0x0) at threadproc/unix/thread.c:138
> > >> >> > #5  0xb7b65341 in start_thread () from /lib/tls/i686/cmov/libpthread.so.0
> > >> >> > #6  0xb7af94ee in clone () from /lib/tls/i686/cmov/libc.so.6
> > >> >> >
> > >> >> > In jsig_handler, a struct of type sigaction is copied:
> > >> >> >
> > >> >> > act = saved_sigaction[sig];
> > >> >> >
> > >> >> > and it seems gcc compiles this statement into a call to memcpy. But the
> > >> >> > parameter sig is quite weird if you look at it: sig=-1215196204...
> > >> >> > Now if only I could find where this sig value came from... I cannot
> > >> >> > find it in the depths of classlib native code this late at night.
> > >> >> >
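The struct copy `act = saved_sigaction[sig];` reads far outside the array when sig is -1215196204, which is consistent with the memcpy crash in the trace. A minimal defensive sketch, assuming saved_sigaction is an array indexed by signal number: reject out-of-range values before indexing. The helper get_saved_sigaction is hypothetical, and NSIG is a common libc extension rather than strict POSIX, so a fallback value (an assumption) is defined. This only turns the crash into an error; it does not explain where the bogus sig came from.

```c
#define _POSIX_C_SOURCE 200809L /* expose struct sigaction in strict modes */
#include <signal.h>
#include <stddef.h>

#ifndef NSIG
#define NSIG 65 /* assumption: glibc's value; NSIG is not strict POSIX */
#endif

/* Hypothetical table of previously installed handlers, indexed by signal
 * number, standing in for hysig's saved_sigaction array. */
static struct sigaction saved_sigaction[NSIG];

/* Copy the saved action for `sig` into *out. Returns 0 on success, or -1
 * (without touching *out) if `sig` is not a valid signal number. */
static int get_saved_sigaction(int sig, struct sigaction *out)
{
    if (out == NULL || sig <= 0 || sig >= NSIG)
        return -1;
    *out = saved_sigaction[sig];
    return 0;
}
```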
> > >> >>
> > >> >>
> > >> >
> > >>
> > >
> >
>
