To: dev@harmony.apache.org
From: Egor Pasko
Subject: Re: [drlvm][threading] H3010 (Stack Overflow Exception) -- when does this bug really have to be fixed?
Date: 13 Mar 2007 16:16:25 +0300
Message-ID: <0vqabyhp9li.fsf@gmail.com>
References: <4dd1f3f00703120905q3651b5e8g33dda1a687b9b5cc@mail.gmail.com> <0vqodmypfzl.fsf@gmail.com>

On the 0x298 day of Apache Harmony Gregory Shimansky wrote:
> Egor Pasko wrote:
> > On the 0x297 day of Apache Harmony Weldon Washburn wrote:
> >> All,
> >> I assigned H3010 to myself. This test definitely demonstrates a bug that
> >> needs fixing. But it's not clear when this bug must be fixed. This really
> >> brings forward a higher-level question: what should this bug be coded as
> >> right now, and when should it be moved to "blocker" status? I provide some
> >> observations to start the discussion:
> >>
> >> 1)
> >> The bug is that a Stack Overflow Exception (SOE) happens from inside fast
> >> native helper functions. Fast native helpers do not set up the M2N stack
> >> frame, which is required to throw exceptions such as SOE. Adding M2N setup
> >> to fast native helpers will unacceptably slow down the system.
> > to be honest...
> > SOE can happen on a 'push' onto the stack (such pushes are not
> > safepoints in the JIT currently). Thus, you cannot unwind properly (there
> > is no M2N frame, which is necessary for releasing the lock).
> > Do you think this has a low probability?
>
> If SOE happens in managed code, it is handled similarly to hardware NPEs;
> that is, the stack is unwound to the exception handler if one exists, or
> to the nearest native frame. AFAIK hardware NPE places are not
> safepoints either, so there is a small possibility of enumeration bugs
> in such places.

yes, you are right.
Still, hardware NPEs are disabled for
  a) try..catch regions
  b) synchronized methods (the VM needs jit_get_address_of_this)

Okay, I have to agree that neither (a) nor (b) is likely to hit SOE on
"popular workloads". And, AFAIR, we have both bugs in JIRA, so all is
going fine. Let's treat them as having low probability.

> >> 2)
> >> When running a useful workload, a Stack Overflow that hits precisely in a
> >> fast native has a very low probability. Note that the test in H3010
> >> specifically forces this event to happen with a very high probability. In
> >> other words, while the test is a good one, it reflects a very rare event
> >> in nature.
> >>
> >> Given the above, how about we address the problem in two stages:
> >>
> >> 1)
> >> First stage: add an "assert(zero);" to the exception handler when it is
> >> determined that an SOE has happened inside a fast native. This way, we
> >> will quickly find out when an important workload hits this bug. Once the
> >> assert(zero) is added, we code H3010 as "later".
> >>
> >> 2)
> >> Second stage: when an application we care about hits the assert(zero), we
> >> recode H3010 as "major/blocker".
> >>
> >> 3)
> >> While waiting for #2 above to happen, we discuss on harmony-dev ways of
> >> designing the right fix. For starters, I think we should investigate a
> >> design where the exception handler rewrites the entire register context
> >> so that returning from the exception handler revectors the instruction
> >> pointer to recovery code that will somehow push the M2N frame on the
> >> stack and call the proper SOE-throwing code. I have not looked closely at
> >> how to do this. I am not convinced this approach will work. However, I do
> >> think it's worth a try. Thoughts?
>
> --
> Gregory

--
Egor Pasko
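Point 3 of Weldon's proposal (rewrite the register context so that returning from the handler lands in recovery code) can be modelled in a few lines of C. This is a simulation only, assuming a plain struct in place of the platform's real signal or exception context; sim_context, fast_natives, RECOVERY_STUB_ADDR and handle_soe are hypothetical names, not DRLVM APIs, and the addresses are invented for illustration.

```c
/* Hypothetical model of the "revector the instruction pointer" idea.
   No name here comes from DRLVM; the struct stands in for a real
   signal/exception context and the addresses are made up. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uintptr_t ip;       /* instruction pointer at the fault */
    uintptr_t sp;       /* stack pointer at the fault */
    uintptr_t saved_ip; /* faulting IP, kept for the recovery stub */
} sim_context;

typedef struct {
    uintptr_t start;
    uintptr_t end;      /* exclusive */
} code_range;

/* Hypothetical code ranges occupied by fast native helpers. */
static const code_range fast_natives[] = {
    {0x1000, 0x1100},
    {0x2000, 0x2080},
};

/* Hypothetical recovery stub: in the proposal it would push an M2N
   frame and then call the regular SOE-throwing code. */
#define RECOVERY_STUB_ADDR ((uintptr_t)0x9000)

static bool in_fast_native(uintptr_t ip) {
    for (size_t i = 0; i < sizeof fast_natives / sizeof fast_natives[0]; i++) {
        if (ip >= fast_natives[i].start && ip < fast_natives[i].end)
            return true;
    }
    return false;
}

typedef enum { SOE_UNWIND_NORMALLY, SOE_REVECTORED } soe_action;

/* SOEs outside fast natives keep the existing unwinding path; SOEs
   inside a fast native get their context rewritten so that "returning"
   from the handler resumes in the recovery stub, not in the helper. */
static soe_action handle_soe(sim_context *ctx) {
    assert(ctx != NULL);
    if (!in_fast_native(ctx->ip))
        return SOE_UNWIND_NORMALLY;
    ctx->saved_ip = ctx->ip;
    ctx->ip = RECOVERY_STUB_ADDR;
    return SOE_REVECTORED;
}
```

The model deliberately leaves out the hard part raised in the thread: the recovery stub still needs enough stack to push the M2N frame after the overflow, and the first-stage alternative would simply replace the revectoring branch with the assert(zero).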