Message-ID:
Date: Thu, 16 Nov 2006 17:03:13 +0600
From: "Evgueni Brevnov"
To: harmony-dev@incubator.apache.org, geir@pobox.com
Subject: Re: [drlvm][threading] Should hythread_monitor_init() aquire the monitor?
In-Reply-To:
MIME-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
References: <200611110322.22083.gshimansky@gmail.com> <455A669C.1020407@pobox.com> <455B93B2.9070207@pobox.com> <455BF246.4000906@pobox.com>
X-Virus-Checked: Checked by ClamAV on apache.org

You can look at the change here
http://issues.apache.org/jira/browse/HARMONY-2203

On 11/16/06, Evgueni Brevnov wrote:
> I haven't published it yet...will file a JIRA soon...
>
> On 11/16/06, Geir Magnusson Jr. wrote:
> > ah. whew.
> >
> > can you point me to that change you made?
> >
> > geir
> >
> > Evgueni Brevnov wrote:
> > > I'm not aware if classlib uses SIGUSR2. In this particular case
> > > classlib (to be more precise it is the portlib module) does sem_wait
> > > which is interrupted by TM's SIGUSR2 signal. I replaced "hysem_wait"
> > > with "while (hysem_wait() != 0) {}". It helped to pass all tests.
> > >
> > > Evgueni
> > >
> > > On 11/16/06, Geir Magnusson Jr. wrote:
> > >> um... classlib uses SIGUSR2 as well? Doesn't our thread manager use it?
> > >>
> > >> Evgueni Brevnov wrote:
> > >> > Hey,
> > >> >
> > >> > Seems like the pretty old problem shows itself again. I'm talking
> > >> > about SIGUSR2 signal :-(...Classlib's asynchronous signal reporter
> > >> > uses system semaphores for synchronization purposes...and hysem_wait
> > >> > is interrupted by the signal:
> > >> >
> > >> > (gdb) p perror("sym_wait error:")
> > >> > sym_wait error:: Interrupted system call
> > >> >
> > >> > Do we have good (universal) solution for such cases?
> > >> >
> > >> > Thanks
> > >> > Evgueni
> > >> >
> > >> > On 11/15/06, Geir Magnusson Jr. wrote:
> > >> >>
> > >> >>
> > >> >> Gregory Shimansky wrote:
> > >> >> > Evgueni Brevnov wrote:
> > >> >> >> hmmm.... strange. The patch was tested on multi-processor system
> > >> >> >> running SUSE9. I will check if the patch misses something. Anyway, we
> > >> >> >> need to wait with the patch submission until we 100% sure how
> > >> >> >> hythread_monitor_init should behave.
> > >> >> >>
> > >> >> >> Thanks
> > >> >> >> Evgueni
> > >> >> >>
> > >> >> >> On 11/11/06, Gregory Shimansky wrote:
> > >> >> >>> On Friday 10 November 2006 17:45 Evgueni Brevnov wrote:
> > >> >> >>> > Hi,
> > >> >> >>> >
> > >> >> >>> > While investigating deadlock scenario which is described in
> > >> >> >>> > HARMONY-2006 I found out one interesting thing. It turned out that DRL
> > >> >> >>> > implementation of hythread_monitor_init /
> > >> >> >>> > hythread_monitor_init_with_name initializes and acquires a monitor.
> > >> >> >>> > Original spec reads: "Acquire and initialize a new monitor from the
> > >> >> >>> > threading library...." AFAIU that doesn't mean to lock the monitor but
> > >> >> >>> > get it from the threading library. So the hythread_monitor_init should
> > >> >> >>> > not lock the monitor.
> > >> >> >>> >
> > >> >> >>> > Could somebody comment on that?
> > >> >> >>>
> > >> >> >>> It might be that semantic is different on different platforms which is
> > >> >> >>> probably even worse. Your patch in HARMONY-2149 breaks nearly all of
> > >> >> >>> acceptance tests on Linux while everything on Windows works (ok I
> > >> >> >>> tested on laptop with 1 processor while Linux was a HT server, sometimes
> > >> >> >>> it is important for threading).
> > >> >> >
> > >> >> > I've tried to investigate the problem but didn't find the end of it yet.
> > >> >> > The bug seems to be ubuntu specific (shall we maybe call this
> > >> >> > distribution buggy and move on?).
> > >> >>
> > >> >> There is something odd about it, I'll admit... Remember the EOMEM bugs
> > >> >> I found in forking?
> > >> >>
> > >> >>
> > >> >> I didn't reproduce it on
> > >> >> > gentoo, all tests work just fine.
> > >> >> >
> > >> >> > The bug look likes this, on tests gc.Force, gc.LOS, gc.List, gc.NPE,
> > >> >> > gc.PhantomReferenceTest, gc.WeakReferenceTest, stress.WeakHashMapTest VM
> > >> >> > segfaults. The stack looks like an infinite recursion of 4 stack frames:
> > >> >> >
> > >> >> > #0 0xb6dcb814 in null_java_reference_handler (signum=11,
> > >> >> > info=0xb71a503c, context=0xb71a50bc) at
> > >> >> > /nfs/ims/proj/drl/mrt1/users/gregory/Harmony/enhanced/drlvm/trunk/vm/vmcore/src/util/linux/signals_ia32.cpp:443
> > >> >> > #1
> > >> >> > #2 0xb6dcc20a in get_stack_addr () at
> > >> >> > /nfs/ims/proj/drl/mrt1/users/gregory/Harmony/enhanced/drlvm/trunk/vm/vmcore/src/util/linux/signals_ia32.cpp:293
> > >> >> > #3 0xb6dcb6cd in check_stack_overflow (info=0xb71a546c, uc=0xb71a54ec)
> > >> >> > at
> > >> >> > /nfs/ims/proj/drl/mrt1/users/gregory/Harmony/enhanced/drlvm/trunk/vm/vmcore/src/util/linux/signals_ia32.cpp:399
> > >> >> > #4 0xb6dcb900 in null_java_reference_handler (signum=11,
> > >> >> > info=0xb71a546c, context=0xb71a54ec) at
> > >> >> > /nfs/ims/proj/drl/mrt1/users/gregory/Harmony/enhanced/drlvm/trunk/vm/vmcore/src/util/linux/signals_ia32.cpp:451
> > >> >> >
> > >> >> > and so on. The stack is very long. When I run VM with -Xtrace:signals I
> > >> >> > get a very long log of messages that "NPE or SOE detected at ...". The
> > >> >> > first time address always varies, but it appears to be memcpy. The next
> > >> >> > addresses are always the same, they point to get_stack_addr function.
> > >> >> >
> > >> >> > So I tried to find out why memcpy crashes in the first place. It appears
> > >> >> > to be a struct copy called from jsig_handler hysig. The stack looks like
> > >> >> > this (if I can trust gdb on ubuntu):
> > >> >> >
> > >> >> > #0 0xb7a9b9dc in memcpy () from /lib/tls/i686/cmov/libc.so.6
> > >> >> > #1 0xb7ba0fa0 in jsig_handler (sig=-1215196204, siginfo=0x0, uc=0x0)
> > >> >> > at hysigunix.c:169
> > >> >> > #2 0xb7f9ec8b in asynchSignalReporter (userData=0x0) at hysignal.c:971
> > >> >> > #3 0xb7baa8ef in thread_start_proc (thd=0x807a8e8, p_args=0x807a8d8)
> > >> >> > at
> > >> >> > /nfs/ims/proj/drl/mrt1/users/gregory/Harmony/enhanced/drlvm/trunk/vm/thread/src/thread_native_basic.c:712
> > >> >> >
> > >> >> > #4 0xb7bb0ed4 in dummy_worker (opaque=0x0) at threadproc/unix/thread.c:138
> > >> >> > #5 0xb7b65341 in start_thread () from lib/tls/i686/cmov/libpthread.so.0
> > >> >> > #6 0xb7af94ee in clone () from /lib/tls/i686/cmov/libc.so.6
> > >> >> >
> > >> >> > In jsig_handler a struct of type sigaction is copied
> > >> >> >
> > >> >> > act = saved_sigaction[sig];
> > >> >> >
> > >> >> > and gcc replaces this statement with a call to memcpy it seems. But the
> > >> >> > parameter sig is quite weird if you look at it. It is sig=-1215196204...
> > >> >> > Now if I could only find where and this sig happened there... I cannot
> > >> >> > find it in the depth of classlib native code this late at night.
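
For reference, the "Interrupted system call" failure discussed above is the EINTR error that POSIX allows sem_wait to report when a signal handler runs during the wait. The workaround quoted in the thread retries on any non-zero result. A narrower variant, sketched below, retries only on EINTR so that other errors still reach the caller. This is an illustration against plain POSIX sem_wait, not the HARMONY-2203 patch; the helper name sem_wait_retry_eintr is hypothetical and is not a portlib function.

```c
#include <errno.h>
#include <semaphore.h>

/* Hypothetical helper (not part of classlib/portlib): wait on a POSIX
 * semaphore, retrying only when the wait was interrupted by a signal.
 * Returns 0 on success, or -1 with errno set for any other failure. */
static int sem_wait_retry_eintr(sem_t *sem)
{
    int rc;

    do {
        rc = sem_wait(sem);
    } while (rc != 0 && errno == EINTR);

    return rc;
}
```

A caller would use it wherever a bare sem_wait is exposed to signals such as the thread manager's SIGUSR2. Whether the same shape fits hysem_wait depends on how that wrapper reports errors, which this thread does not show.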