From olio-user-return-304-apmail-incubator-olio-user-archive=incubator.apache.org@incubator.apache.org Tue Feb 09 15:53:29 2010
Mailing-List: contact olio-user-help@incubator.apache.org; run by ezmlm
Reply-To: olio-user@incubator.apache.org
MIME-Version: 1.0
In-Reply-To: <8207aa2d1002020953ra0b2a7cx9db4d44571a380b2@mail.gmail.com>
References: <8207aa2d1002010932k4cf9f47bub8380456aea7971a@mail.gmail.com>
 <4B671AD2.4060001@sun.com>
 <8207aa2d1002011050x6c66a91bhfb581c88e8bd558a@mail.gmail.com>
 <4B6727AD.8090006@sun.com>
 <8207aa2d1002011146i23e7640ck25919d21136b0650@mail.gmail.com>
 <8207aa2d1002011545s3b9a7716jf9279257efcce225@mail.gmail.com>
 <8207aa2d1002011559v7937f9fer615e82ce7c35d897@mail.gmail.com>
 <4B68531F.4000706@sun.com>
 <8207aa2d1002020953ra0b2a7cx9db4d44571a380b2@mail.gmail.com>
From: Joshua Schnee
Date: Tue, 9 Feb 2010 09:52:40 -0600
Message-ID: <8207aa2d1002090752s7fc85fd9xa6938f8f4a7111a1@mail.gmail.com>
Subject: Re: OLIO-Java harness runs forever - round 2
To: Akara Sucharitakul
Cc: olio-user@incubator.apache.org
Content-Type: multipart/alternative; boundary=001485f4238a933b91047f2ce951

--001485f4238a933b91047f2ce951
Content-Type: text/plain; charset=ISO-8859-1

Akara,

We are still experiencing this issue, now across three different physical
clients. I've asked one of our lead Java developers to review the stack
traces. He agrees with your assessment that the harness, master, and driver
are all hung waiting for the driver to terminate, but he was somewhat
confused by the claim that Faban will time out an attempted read after 30
seconds. He believes this is a blocking read call, one that will wait
forever for data to be returned unless it is interrupted somehow.

He suggested putting a 30-second read timeout on the threads at the end of
the rampdown to stop the harness from running forever, but he was also
concerned that the run might never reach the end of the rampdown period.
Does Faban still do the rampdown phase if a run is being aborted?

Also, could you tell me which block of code in Faban enforces the timeout
constraint? He thought he might be able to work backwards from there...

Thanks,
Joshua

On Tue, Feb 2, 2010 at 11:53 AM, Joshua Schnee wrote:

> Akara,
>
> We don't know of anything that is specific to our configuration, but one
> thought is that it has something to do with our running it under a Windows
> client. We'll dig a bit deeper, but if you have any ideas, they would also
> be appreciated.
>
> Thanks,
> Joshua
>
>
> On Tue, Feb 2, 2010 at 10:30 AM, Akara Sucharitakul
> <Akara.Sucharitakul@sun.com> wrote:
>
>> Josh,
>>
>> I've checked the stacks and it turns out all driver threads are waiting on
>> socket read (reading from the server). Faban has a default read timeout of
>> 30 seconds and interrupts the thread and therefore the I/O at 2 minutes
>> after the rampdown has ended. So it should not run forever.
>>
>> Is there anything in your configuration that causes the socket not to
>> time out according to contract or be non-interruptible? Otherwise I
>> couldn't see how it could run forever.
>>
>> Here are some details:
>>
>> 1. pid 608 seems to be an old stale instance of the harness that does not
>>    want to terminate. Please kill it off.
>> 2. pid 2504 is a driver agent. All threads are waiting on socket read.
>> 3. pid 4508 is the master. It is waiting for the driver agent to finish up
>>    the threads (in a join).
>> 4. pid 4080 is the harness. It is waiting for the master to terminate.
>>
>>
>> Thanks,
>> -Akara
>>
>> Joshua Schnee wrote:
>>
>>> Turns out I didn't get as much info as what is probably needed. Here's
>>> an updated zip file with more dumps...
>>>
>>> Thanks,
>>>
>>>
>>> On Mon, Feb 1, 2010 at 5:45 PM, Joshua Schnee <jpschnee@gmail.com> wrote:
>>>
>>>    Just realized the list didn't get copied.
>>>
>>>
>>>    ---------- Forwarded message ----------
>>>    From: *Joshua Schnee* <jpschnee@gmail.com>
>>>    Date: Mon, Feb 1, 2010 at 1:46 PM
>>>    Subject: Re: OLIO-Java harness runs forever - round 2
>>>    To: Akara Sucharitakul <Akara.Sucharitakul@sun.com>
>>>
>>>
>>>    Thanks,
>>>
>>>    Here are two PIDs I dumped; let me know if these aren't the ones you
>>>    want, as there are several.
>>>
>>>    Thanks,
>>>    Joshua
>>>
>>>
>>>    On Mon, Feb 1, 2010 at 1:12 PM, Akara Sucharitakul
>>>    <Akara.Sucharitakul@sun.com> wrote:
>>>
>>>        kill -QUIT <pid>
>>>
>>>        or
>>>
>>>        $JAVA_HOME/bin/jstack <pid>
>>>
>>>        -Akara
>>>
>>>        Joshua Schnee wrote:
>>>
>>>            Can you tell me how to do this? I'm not sure how to do this
>>>            when the harness doesn't hit an exception...
>>>
>>>            Thanks,
>>>            Joshua
>>>
>>>            On Mon, Feb 1, 2010 at 12:17 PM, Akara Sucharitakul
>>>            <Akara.Sucharitakul@sun.com> wrote:
>>>
>>>               Can you please get me a stack dump of the Faban harness?
>>>               If it hangs somewhere, we'll be able to diagnose better.
>>>               Thanks.
>>>
>>>               -Akara
>>>
>>>
>>>               Joshua Schnee wrote:
>>>
>>>                   OK, so I'm seeing the harness just run indefinitely
>>>                   again, and this time it isn't related to the maximum
>>>                   open files.
>>>
>>>                   Details and background information:
>>>                   Faban 1.0 build 111109
>>>                   Olio Java 0.2
>>>                   The workload is being driven from a physical Windows
>>>                   client against 2 VMs on a system that is doing many
>>>                   other tasks.
>>>
>>>                   Usage info:
>>>                   Physical client usage: avg ~22.93%, max ~42.15%
>>>                   Total system under test utilization: ~95%
>>>                   Web avg util = 5.3%, max = 40.83% (during manual reload)
>>>                   DB  avg util = 4.2%, max = 55.6% (during auto reload)
>>>
>>>                   Granted, the system under test is near saturation,
>>>                   but the client isn't. I'm not sure why the harness is
>>>                   never exiting. Even if the VMs, or even the whole
>>>                   system under test, get so saturated that they can't
>>>                   respond to requests, shouldn't the test, which is
>>>                   running on the underutilized client, finish
>>>                   regardless, reporting whatever results it can?
>>>                   Shanti, you had previously asked me to file a JIRA on
>>>                   this, which I forgot to do. I can do so now, if you'd
>>>                   like. Finally, GlassFish appears to be stuck: it's
>>>                   running, but not responding to requests, probably due
>>>                   to the SEVERE entry in the server.log file (see
>>>                   below).
>>>
>>>                   Faban\Logs:
>>>                    agent.log = empty
>>>                    cmdagent.log = No issues
>>>                    faban.log.xml = No issues
>>>
>>>                   Master\Logs:
>>>                    catalina.out = No issues
>>>                    localhost*log*txt = No issues
>>>                    OlioDriver.3C\
>>>                    log.xml : Two types of issues, doTagSearch and the
>>>                    catastrophic "Forcefully terminating benchmark run"
>>>                    : attached
>>>
>>>                   GlassFish logs:
>>>                    jvm.log : Numerous dependency_failed entries :
>>>                    attached
>>>                    server.log : Several SEVERE entries, most notably
>>>                    one where "a signal was attempted before wait()"
>>>                    : attached
>>>
>>>                   Any help resolving this would be much appreciated,
>>>                   -Josh
>>>
>>>
>>>            --
>>>            -Josh
>>>
>>>
>>>    --
>>>    -Josh
>>>
>>>
>>>    --
>>>    -Josh
>>>
>>>
>>> --
>>> -Josh
>>>
>>
>
>
> --
> -Josh
>

--
-Josh

--001485f4238a933b91047f2ce951--
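
The open question in this thread, whether a socket read gives up after a timeout or blocks until something interrupts it, depends in plain java.net code on whether SO_TIMEOUT is set on the socket. The sketch below is not Faban code and makes no claim about how Faban applies its 30-second default. The class name ReadTimeoutSketch and the method readTimesOut are hypothetical names chosen for illustration. It uses only standard JDK calls: after Socket.setSoTimeout(n) with n > 0, a read that receives no data throws SocketTimeoutException after roughly n milliseconds, while the default value of 0 means the read blocks indefinitely.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

// Illustrative only: a local server that accepts a connection and never
// writes, standing in for a saturated system under test.
class ReadTimeoutSketch {

    /**
     * Attempts a single read from a peer that never sends data.
     * Returns true if the read ended with SocketTimeoutException.
     */
    static boolean readTimesOut(int timeoutMillis) {
        try (ServerSocket silentServer =
                 new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket client = new Socket(silentServer.getInetAddress(),
                                        silentServer.getLocalPort());
             // Accept the connection but never write to it.
             Socket accepted = silentServer.accept()) {

            // With the default SO_TIMEOUT of 0 the read below would block
            // forever; a positive value bounds the wait.
            client.setSoTimeout(timeoutMillis);

            InputStream in = client.getInputStream();
            try {
                in.read(); // no data will ever arrive
                return false;
            } catch (SocketTimeoutException expected) {
                return true;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("read timed out: " + readTimesOut(500));
    }
}
```

If a driver depends on Thread.interrupt() rather than SO_TIMEOUT to end a stuck read, it is worth checking on the Windows client whether the interrupt actually unblocks the read. To my understanding that behavior has differed between platforms and JDK versions for plain java.net sockets, so treat it as an assumption to verify, not as a diagnosis.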