From: Leo Simons
Date: Mon, 11 Jul 2005 17:01:24 +0200
To: general@gump.apache.org
Subject: Runtime.exec & Ant & Gump3 (was: Re: svn commit: r210128 - in /gump/branches/Gump3)
Message-ID: <42D289C4.8060204@leosimons.com>
In-Reply-To: <20050711134142.79633.qmail@minotaur.apache.org>

So, ehm, I've had to learn a lot about process management in the last
two days, most importantly that Java is *really* bad at doing it
properly.

If you open pygump/python/gump/plugins/java/builder.py and edit the
AntBuilderPlugin to have no_cleanup=False instead of no_cleanup=True,
then do a gump run using the vmgump.xml profile, the run will usually
stall trying to invoke java_cup. The reason seems to be that Java
sometimes deadlocks when forked from Java.

I built a "trivial" testcase (basically rewrote the Execute.java from
Ant to manually run my demo program, and wrote a simple Python wrapper
to fire that up) and the problem does not occur there. So I suspect
(after a whole lot of stepping through both the Python and Java
debuggers) that something like multi-threading or garbage collection is
in some way significant.

To be clear, this isn't a bug in Gump or a bug in Ant, but a bug in the
JDK in interaction with a very specific environment. It'll be
interesting to see whether, for instance, the same mess is avoided when
using Kaffe. I suspect that using any JVM for which Ant's Execute takes
a different approach (i.e. not using Runtime.exec) makes the problem
"go away".

We'll have to see if this becomes a problem or not (e.g. zombie
processes).
If we run into issues, I'll try hard to produce big and scary stack
traces. My hunch is that we'd have to implement a workaround in Ant...
I doubt Sun is going to fix their JDK...

cheers,

Leo

leosimons@apache.org wrote:
> * disable process group management for running ant. See inside
>   gump.plugins.java.builder.AntPlugin for some details. This was a *huge*
>   pain to figure out. What triggered this is the invocation of java_cup
>   from the xalan build.xml file, which has a
> -
> +
>
>
>
> @@ -252,6 +263,8 @@
> ...
> +    def _do_run_command(self, command, args, workdir, shell=False, no_cleanup=False):
> +        # see gump.plugins.java.builder.AntPlugin for information on the
> +        # no_cleanup flag
> +
...
> -        cmd = Popen(myargs,shell=False,cwd=workdir,stdout=outputfile,stderr=STDOUT,env=command.env)
> +        cmd = Popen(myargs,shell=False,cwd=workdir,stdout=outputfile,stderr=STDOUT,env=command.env, no_cleanup=no_cleanup)
...
> -        command.build_log = outputfile.read()
> +        # we need to avoid Unicode errors when people put in 'fancy characters'
> +        # into build outputs
> +        command.build_log = unicode(outputfile.read(), 'iso-8859-1')
> +import tempfile
...
> -def _get_new_process_group():
> -    """Get us an unused (or so we hope) process group."""
> -    pid = os.fork()
> -    gid = pid # that *should* be correct. However, let's actually
> -              # create something in that group.
> -    if pid == 0:
> -        # Child
> -
> -        # ensure a process group is created
> -        os.setpgrp()
> -
> -        # sleep for ten days to keep the process group around
> -        # for "a while"
> -        import time
> -        time.sleep(10*24*60*60)
> -        os._exit(0)
> -    else:
> -        # Parent
> -
> -        # wait for child a little so it can set its group
> -        import time
> -        time.sleep(1)
> -
> -        # get the gid for the child
> -        gid = os.getpgid(pid)
> -
> -    return gid
> -
> -# This is the group we chuck our children in. We don't just want to
> -# use our own group since we don't want to kill ourselves prematurely!
> -_our_process_group = _get_new_process_group()
> +temp_dir = tempfile.mkdtemp("gump_util_executor")
> +process_list_filename = os.path.join(temp_dir, "processlist.pids")
>
> +def savepgid(filename):
> +    """Function called from Popen child process to create new process groups."""
> +    os.setpgrp()
> +    f = None
> +    try:
> +        grp = os.getpgrp()
> +        f = open(filename,'a+')
> +        f.write("%d" % grp)
> +        f.write('\n')
> +    finally:
> +        if f:
> +            try: f.close()
> +            except: pass
> +
>  class Popen(subprocess.Popen):
>      """This is a thin wrapper around subprocess.Popen which handles
>      process group management. The gump.util.executor.clean_up_processes()
> @@ -106,35 +109,67 @@
>                   stdin=None, stdout=None, stderr=None,
>                   preexec_fn=None, close_fds=False, shell=False,
>                   cwd=None, env=None, universal_newlines=False,
> -                 startupinfo=None, creationflags=0):
> -        """Create a new Popen instance that delegates to the
> -        subprocess Popen."""
> -        if not preexec_fn:
> -            # setpgid to the gump process group inside the child
> -            pre_exec_function = lambda: os.setpgid(0, _our_process_group)
> -        else:
> -            # The below has a "stupid lambda trick" that makes the lambda
> -            # evaluate a tuple of functions. This sticks our own function
> -            # call in there while still supporting the originally provided
> -            # function
> -            pre_exec_function = lambda: (preexec_fn(),os.setpgid(0, _our_process_group))
> -
> +                 startupinfo=None, creationflags=0, no_cleanup=False):
> +        # see gump.plugins.java.builder.AntPlugin for information on the
> +        # no_cleanup flag
> +
>          # a logger can be set for this module to make us log commands
>          if _log:
>              _log.info(" Executing command:\n %s'%s'%s\n in directory '%s'" % (ansicolor.Blue, " ".join(args), ansicolor.Black, os.path.abspath(cwd or os.curdir)))
> -
> -        subprocess.Popen.__init__(self, args, bufsize=bufsize, executable=executable,
> -                                  stdin=stdin, stdout=stdout, stderr=stderr,
> -                                  # note our custom function in there...
> -                                  preexec_fn=pre_exec_function, close_fds=close_fds, shell=shell,
> -                                  cwd=cwd, env=env, universal_newlines=universal_newlines,
> -                                  startupinfo=startupinfo, creationflags=creationflags)
> +
> +        if not no_cleanup:
> +            global process_list_filename
> +            """Create a new Popen instance that delegates to the
> +            subprocess Popen."""
> +            if not preexec_fn:
> +                # setpgid to the gump process group inside the child
> +                pre_exec_function = lambda: savepgid(process_list_filename)
> +            else:
> +                # The below has a "stupid lambda trick" that makes the lambda
> +                # evaluate a tuple of functions. This sticks our own function
> +                # call in there while still supporting the originally provided
> +                # function
> +                pre_exec_function = lambda: (preexec_fn(),savepgid(process_list_filename))
> +
> +
> +            subprocess.Popen.__init__(self, args, bufsize=bufsize, executable=executable,
> +                                      stdin=stdin, stdout=stdout, stderr=stderr,
> +                                      # note our custom function in there...
> +                                      preexec_fn=pre_exec_function, close_fds=close_fds, shell=shell,
> +                                      cwd=cwd, env=env, universal_newlines=universal_newlines,
> +                                      startupinfo=startupinfo, creationflags=creationflags)
> +        else:
> +            subprocess.Popen.__init__(self, args, bufsize=bufsize, executable=executable,
> +                                      stdin=stdin, stdout=stdout, stderr=stderr,
> +                                      # note our custom function is *not* in there...
> +                                      preexec_fn=preexec_fn, close_fds=close_fds, shell=shell,
> +                                      cwd=cwd, env=env, universal_newlines=universal_newlines,
> +                                      startupinfo=startupinfo, creationflags=creationflags)
> +
>
>  def clean_up_processes(timeout=300):
>      """This function can be called prior to program exit to attempt to
>      kill all our running children that were created using this module."""
>
> -    pgrp_list = [_our_process_group]
> +    global process_list_filename
> +    global temp_dir
> +
> +    pgrp_list = []
> +
> +    f = None
> +    try:
> +        f = open(process_list_filename, 'r')
> +        pgrp_list = [int(line) for line in f.read().splitlines()]
> +    except:
> +        if f:
> +            try: f.close()
> +            except: pass
> +    try:
> +        import shutil
> +        shutil.rmtree(temp_dir)
> +    except:
> +        pass
> +
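
For anyone who wants to poke at the bookkeeping from the diff in
isolation, here is a minimal Python 3 sketch of the same idea. It is not
the Gump code. It assumes a POSIX system, and the names run_tracked and
read_pgids are made up for illustration. The quoted diff also does not
show how the recorded groups are actually killed, so the SIGTERM-only
cleanup below is an assumption too.

```python
import os
import shutil
import signal
import subprocess
import tempfile

# Names mirror the diff, but this is a standalone sketch, not gump.util.executor.
temp_dir = tempfile.mkdtemp("pgid_sketch")
process_list_filename = os.path.join(temp_dir, "processlist.pids")


def save_pgid(filename):
    """Runs in the child between fork() and exec(): put the child in a
    new process group of its own and append that group id to a file."""
    os.setpgrp()
    with open(filename, "a") as f:
        f.write("%d\n" % os.getpgrp())


def run_tracked(args, no_cleanup=False):
    """Hypothetical helper: start a child process. Unless no_cleanup is
    set, the child gets its own recorded process group."""
    preexec = None if no_cleanup else (lambda: save_pgid(process_list_filename))
    return subprocess.Popen(args, preexec_fn=preexec)


def read_pgids(filename):
    """Return the recorded process group ids, or [] if nothing was recorded."""
    try:
        with open(filename) as f:
            return [int(line) for line in f.read().splitlines() if line.strip()]
    except OSError:
        return []


def clean_up_processes():
    """Assumed cleanup step: send SIGTERM to every recorded group, then
    remove the bookkeeping directory. Groups that are already gone are
    skipped."""
    for pgid in read_pgids(process_list_filename):
        try:
            os.killpg(pgid, signal.SIGTERM)
        except OSError:
            pass
    shutil.rmtree(temp_dir, ignore_errors=True)
```

Calling run_tracked(["sleep", "30"]) followed by clean_up_processes()
terminates the sleeping child together with anything else in its group.
With no_cleanup=True the child stays in the parent's process group and
is not recorded, which corresponds to the setting the commit uses for
Ant.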