incubator-olio-user mailing list archives

From Joshua Schnee <jpsch...@gmail.com>
Subject Re: OLIO-Java harness runs forever - round 2
Date Tue, 09 Feb 2010 15:52:40 GMT
Akara,

We are still experiencing this issue, across several different physical
clients (3 now).  I've asked one of our lead Java developers to review the
stack traces, and although he agrees with your assessment that the harness,
master, and driver are all hung waiting for the driver to terminate, he said
he was somewhat confused by the claim that Faban will time out an attempted
read after 30 seconds.  He believes this to be a blocking read call, one
that will wait forever for data to be returned unless it is interrupted
somehow.
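
To make sure we're talking about the same thing, here is a rough Java
sketch of the behavior he means (illustration only; the class and method
names are ours, not anything from Faban or Olio).  A plain InputStream
read on a Socket blocks until data arrives, while setSoTimeout(30000)
makes the same read give up with a SocketTimeoutException after 30
seconds:

    import java.io.InputStream;
    import java.net.Socket;
    import java.net.SocketTimeoutException;

    public class ReadTimeoutSketch {
        // Illustrative helper; readOneByte is our name, not Faban code.
        static int readOneByte(Socket socket) throws Exception {
            // Without this call, in.read() below can block indefinitely
            // waiting for data from the server.
            socket.setSoTimeout(30000);
            InputStream in = socket.getInputStream();
            try {
                return in.read();            // -1 on end of stream
            } catch (SocketTimeoutException e) {
                return -1;                   // gave up after 30 seconds
            }
        }
    }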

He suggested putting in a 30-second read timeout on the threads at the end
of the rampdown to stop the harness from running forever, but was also
concerned about the run never reaching the end of the rampdown period.  Does
Faban still do the rampdown phase if a run is being aborted?

Also, could you possibly tell me which block of code in Faban enforces the
timeout?  He thought he might be able to work backwards from there...
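
For reference, the distinction he keeps coming back to is roughly the
following (again, just a generic Java sketch on our side, not a claim about
what Faban actually does; the class name, host, and port are made up).  A
read on a plain java.net.Socket input stream ignores Thread.interrupt(),
whereas a java.nio SocketChannel read is interruptible and ends with
ClosedByInterruptException, so an interrupt can unblock it:

    import java.net.InetSocketAddress;
    import java.nio.ByteBuffer;
    import java.nio.channels.ClosedByInterruptException;
    import java.nio.channels.SocketChannel;

    public class InterruptibleReadSketch {
        public static void main(String[] args) throws Exception {
            // Hypothetical endpoint, for illustration only.
            final SocketChannel channel =
                SocketChannel.open(new InetSocketAddress("localhost", 8080));

            Thread reader = new Thread(new Runnable() {
                public void run() {
                    try {
                        // Blocks until data arrives or the thread is interrupted.
                        channel.read(ByteBuffer.allocate(1024));
                    } catch (ClosedByInterruptException e) {
                        System.out.println("read unblocked by interrupt");
                    } catch (Exception e) {
                        e.printStackTrace();
                    }
                }
            });
            reader.start();

            Thread.sleep(30000);   // give the read up to 30 seconds
            reader.interrupt();    // closes the channel and ends the blocked read
            reader.join();
        }
    }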

Thanks,
Joshua

On Tue, Feb 2, 2010 at 11:53 AM, Joshua Schnee <jpschnee@gmail.com> wrote:

> Akara,
>
> We don't know of anything that is specific to our configuration, but one
> thought is that it has something to do with our running it under a Windows
> client.  We'll dig a bit deeper, but if you have any ideas, they would also
> be appreciated.
>
> Thanks,
> Joshua
>
>
> On Tue, Feb 2, 2010 at 10:30 AM, Akara Sucharitakul
> <Akara.Sucharitakul@sun.com> wrote:
>
>> Josh,
>>
>> I've checked the stacks, and it turns out all driver threads are waiting on
>> a socket read (reading from the server). Faban has a default read timeout of
>> 30 seconds, and it interrupts the thread (and therefore the I/O) two minutes
>> after the rampdown has ended. So it should not run forever.
>>
>> Is there anything in your configuration that causes the socket not to time
>> out as contracted, or to be non-interruptible? Otherwise I can't see how it
>> could run forever.
>>
>> Here are some details:
>>
>> 1. pid 608 seems to be an old stale instance of the harness that does not
>> want to terminate. Please kill it off.
>> 2. pid 2504 is a driver agent. All threads are waiting on socket read.
>> 3. pid 4508 is the master. It is waiting for the driver agent to finish up
>> the threads (in a join).
>> 4. pid 4080 is the harness. It is waiting for the master to terminate.
>>
>>
>> Thanks,
>> -Akara
>>
>> Joshua Schnee wrote:
>>
>>> Turns out I didn't get as much info as is probably needed.  Here's an
>>> updated zip file with more dumps...
>>>
>>> Thanks,
>>>
>>>
>>> On Mon, Feb 1, 2010 at 5:45 PM, Joshua Schnee <jpschnee@gmail.com> wrote:
>>>
>>>    Just realized the list didn't get copied.
>>>
>>>
>>>    ---------- Forwarded message ----------
>>>    From: *Joshua Schnee* <jpschnee@gmail.com>
>>>    Date: Mon, Feb 1, 2010 at 1:46 PM
>>>    Subject: Re: OLIO-Java harness runs forever - round 2
>>>    To: Akara Sucharitakul <Akara.Sucharitakul@sun.com>
>>>
>>>
>>>    Thanks,
>>>
>>>    Here are two PIDs I dumped; let me know if these aren't the ones you
>>>    want, as there are several.
>>>
>>>    Thanks,
>>>    Joshua
>>>
>>>
>>>    On Mon, Feb 1, 2010 at 1:12 PM, Akara Sucharitakul
>>>    <Akara.Sucharitakul@sun.com> wrote:
>>>
>>>        kill -QUIT <pid>
>>>
>>>        or
>>>
>>>        $JAVA_HOME/bin/jstack <pid>
>>>
>>>        -Akara
>>>
>>>        Joshua Schnee wrote:
>>>
>>>            Can you tell me how to do this?  I'm not sure how to do this
>>>            when the harness doesn't hit an exception...
>>>
>>>            Thanks,
>>>            Joshua
>>>
>>>            On Mon, Feb 1, 2010 at 12:17 PM, Akara Sucharitakul
>>>            <Akara.Sucharitakul@sun.com> wrote:
>>>
>>>               Can you please get me a stack dump of the Faban harness?
>>>               If it hangs somewhere, we'll be able to diagnose better.
>>>               Thanks.
>>>
>>>               -Akara
>>>
>>>
>>>               Joshua Schnee wrote:
>>>
>>>                   OK, so I'm seeing the harness just run indefinitely
>>>                   again, and this time it isn't related to the maximum
>>>                   open files limit.
>>>
>>>                   Details and background information:
>>>                   Faban 1.0 build 111109
>>>                   Olio Java 0.2
>>>                   The workload is being driven from a physical Windows
>>>                   client against 2 VMs on a system that is doing many
>>>                   other tasks.
>>>
>>>                   Usage Info:
>>>                   Physical Client usage : Avg ~ 22.93%, Max ~ 42.15%
>>>                   Total system-under-test utilization ~ 95%
>>>                   Web avg util = 5.3%, Max = 40.83% (during manual reload)
>>>                   DB avg util = 4.2%, Max = 55.6% (during auto reload)
>>>
>>>                   Granted, the system under test is near saturation, but
>>>                   the client isn't.  I'm not sure why the harness never
>>>                   exits.  Even if the VMs or the whole system under test
>>>                   get so saturated that they can't respond to requests,
>>>                   shouldn't the test, which is running on the
>>>                   underutilized client, finish regardless and report
>>>                   whatever results it can?  Shanti, you had previously
>>>                   asked me to file a JIRA on this, which I forgot to do.
>>>                   I can do so now, if you'd like.  Finally, GlassFish
>>>                   appears to be stuck: it's running, but not responding
>>>                   to requests, probably due to the SEVERE entry in the
>>>                   server.log file (see below).
>>>
>>>                   Faban\Logs:
>>>                    agent.log = empty
>>>                    cmdagent.log = No issues
>>>                    faban.log.xml = No issues
>>>
>>>                   Master\Logs
>>>                    catalina.out = No issues
>>>                    localhost*log*txt = No issues
>>>                    OlioDriver.3C\
>>>                     log.xml : Two types of issues, doTagSearch and the
>>>                     catastrophic "Forcefully terminating benchmark run" :
>>>                     attached
>>>                    GlassFish logs
>>>                     jvm.log : Numerous dependency_failed entries :
>>>                     attached
>>>                     server.log : Several SEVERE entries : Most notably
>>>                     one where "a signal was attempted before wait()" :
>>>                     attached
>>>
>>>                   Any help resolving this would be much appreciated,
>>>                   -Josh
>>>
>>>
>>>
>>>
>>>
>>>            --
>>>            -Josh
>>>
>>>
>>>
>>>
>>>
>>>    --
>>>    -Josh
>>>
>>>
>>>
>>>
>>>    --
>>>    -Josh
>>>
>>>
>>>
>>>
>>> --
>>> -Josh
>>>
>>>
>>
>
>
> --
> -Josh
>
>


-- 
-Josh
