jmeter-user mailing list archives

From Kirk Pepperdine <kirk.pepperd...@gmail.com>
Subject Re: Coordinated Omission (CO) - possible strategies
Date Sat, 19 Oct 2013 14:26:04 GMT

On 2013-10-19, at 9:56 AM, Gil Tene <gil@azulsystems.com> wrote:

> To focus on the "how to deal with Coordinated Omission" part:
> 
> There are two main ways to deal with CO in your actual executed behavior:
> 
> 1. Change the behavior to avoid CO to begin with. 
> 
> 2. Detect it and correct it.

I'll add detect and report. I believe there is value beyond "you can't believe the data": it's
telling you that there is a condition that you need to eliminate from your test.
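A minimal sketch of that detect-and-report idea (the class and method names here are hypothetical, not JMeter API): give each sampler an intended fire time, and when the actual fire time slips past a tolerance, record the slip as evidence for the report instead of silently folding it into the response times.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: detect-and-report for coordinated omission.
// Each firing is compared against its intended schedule; slips beyond
// the tolerance are kept so the run can be flagged and investigated.
public class CoDetector {
    private final long toleranceNanos;
    private final List<Long> slips = new ArrayList<>();

    public CoDetector(long toleranceNanos) {
        this.toleranceNanos = toleranceNanos;
    }

    /** Call when the sampler fires; returns true if this firing was late. */
    public boolean recordFiring(long intendedNanos, long actualNanos) {
        long slip = actualNanos - intendedNanos;
        if (slip > toleranceNanos) {
            slips.add(slip);   // keep the evidence for the report
            return true;
        }
        return false;
    }

    /** Number of firings that missed their intended schedule. */
    public int missedCount() {
        return slips.size();
    }
}
```

A run with a non-zero missed count is exactly the condition above: the test itself was delayed, and the data needs correcting or the cause needs eliminating.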
> 
> There is a "detect it and report it" one too, but I don't think it is of any real use,
as detection without correction will just tell you your data can't be believed at all, but
won't tell you anything about what can be. Since CO can move percentile magnitudes and position
by literal multiple orders of magnitude (I have multiple measured real-world production behaviors
that show this), "hoping it is not too bad" when you know it is there amounts to burying
your head in the sand.
> 
> So Kirk, is the random behavior you need one of random timing, or random operation sequencing
(or both)?

I need operations to occur at a random interval. That said, the interval is "random" to the
server and *not* to JMeter. JMeter can pre-calculate when certain events should occur and
then detect when it misses that target. The easiest way to do this is to build an event (sampler??)
queue that understands when things such as the next HTTP sampler should be fired.
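A sketch of that pre-calculated schedule (hypothetical names, not JMeter API; the exponential inter-arrival assumption is mine, chosen to give a Poisson-like stream that looks random to the server): the fire times are computed up front, so at dispatch time the tool can tell whether it has missed a target.

```java
import java.util.Random;

// Hypothetical sketch: pre-compute randomized absolute fire times for
// samplers, then detect at dispatch time whether a target was missed.
public class SamplerSchedule {
    private final long[] fireTimes;   // absolute nanosecond deadlines

    /** Intervals are random to the server but known in advance to the tool. */
    public SamplerSchedule(long startNanos, int count, long meanIntervalNanos, long seed) {
        Random rnd = new Random(seed);
        fireTimes = new long[count];
        long t = startNanos;
        for (int i = 0; i < count; i++) {
            // exponential inter-arrival times give a Poisson-like request stream
            t += (long) (-meanIntervalNanos * Math.log(1.0 - rnd.nextDouble()));
            fireTimes[i] = t;
        }
    }

    public long fireTime(int i) { return fireTimes[i]; }

    /** True if, at 'nowNanos', the i-th sampler should already have fired. */
    public boolean missed(int i, long nowNanos) {
        return nowNanos > fireTimes[i];
    }
}
```

A dispatcher thread would poll this queue and fire the next HTTP sampler at its deadline, treating any `missed` deadline as a detected CO event.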

Regards,
Kirk
 
> 
> Sent from my iPad
> 
> On Oct 18, 2013, at 10:48 PM, "Kirk Pepperdine" <kirk.pepperdine@gmail.com> wrote:
> 
>> 
>> On 2013-10-19, at 1:33 AM, Gil Tene <gil@azulsystems.com> wrote:
>> 
>>> I guess we look at human response back pressure in different ways. It's a question
of whether or not you consider the humans to be part of the system you are testing, and what
you think your stats are supposed to represent.
>> 
>> You've seen my presentations and so you know that I do believe that human and non-human
actors are definitively part of the system. They provide the dynamics for the system being
tested. A change in how that layer in my model works can and does make a huge difference
in how the other layers work to support the overall system.
>>> 
>>> Some people will take the "forgiving" approach, which considers the client behavior
as part of the overall system behavior. In such an approach, if a human responded to
slow behavior by not asking any more questions for a while, that's simply what the overall
system did, and the stats reported should reflect only the actual attempts that actual humans
would have, including their slowing down their requests in response to slow reaction times.

>> 
>> Sort of. I want to know that a user was inhibited from making forward progress because
the previous step in their workflow blew stated tolerances. In some cases I'd like to have
that user abandon. I'm not sure I'd call this forgiving, though; I am looking to see what the
overall system can do to answer the question: is it good enough, and if not, why not.
>> 
>> I'm not going to suggest your view is incorrect. I think it's quite valid. I don't
believe the two views are orthogonal; there are elements of both in each. The question
here, in more practical terms, is: what needs to be done to reduce the level of CO that currently
occurs in JMeter, and how should we react to it? Throwing out entire datasets from runs seems
like an academic answer to a more practical question: will our application stand up under
load? From my point of view, the goal is for JMeter to better answer that question.
>> 
>>> 
>>> A web site being completely down for 5 minutes an hour would generate a lot of
human back pressure response. It may even slow down request rates so much during the outage
that 99%+ of the overall actual requests by end users during an hour that included such a
5 minute outage would still be very good. Reporting on those (actual requests by humans) would
be very different from reporting on what would have happened without human back pressure.
But it's easy to examine which of the two reporting methods would be accepted by a reader
of such reports.
>> 
>> But then that 5 minute outage is going to show up somewhere, and if you bury it in
how you report, that would seem to be a problem. This whole argument suggests that what
you want is a better regime for the treatment of the data. If that is what you're saying,
we're in complete agreement. The 5 minute pause should not be filtered out of the data!
>> 
>> IMHO, the first thing to do is eliminate or reduce the known sources of CO from JMeter.
I'm not sure that tackling the CTT is the best way to go. In fact I'd prefer a combination
of approaches that includes things like how jHiccup works with a GC STW detector. As you've
mentioned before, even with a fix to the threading model in JMeter, CO will still occur.
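The jHiccup approach mentioned above can be sketched roughly like this (hypothetical class, not the real jHiccup code): a thread repeatedly sleeps for a fixed quantum and measures how much longer than the quantum it actually slept. The excess is "hiccup" time, such as GC stop-the-world pauses or scheduler stalls, that would also have delayed the load generator's own samplers.

```java
// Hypothetical sketch of the jHiccup idea: measure platform-induced pauses
// by timing how far a fixed-length sleep overshoots its target.
public class HiccupMeter implements Runnable {
    private final long quantumNanos;
    private volatile long maxHiccupNanos;   // worst observed stall

    public HiccupMeter(long quantumNanos) {
        this.quantumNanos = quantumNanos;
    }

    public long maxHiccupNanos() {
        return maxHiccupNanos;
    }

    @Override
    public void run() {
        while (!Thread.currentThread().isInterrupted()) {
            long before = System.nanoTime();
            try {
                Thread.sleep(quantumNanos / 1_000_000L,
                             (int) (quantumNanos % 1_000_000L));
            } catch (InterruptedException e) {
                return;   // stop cleanly when the test run ends
            }
            // any excess beyond the quantum is a process-wide stall
            long hiccup = System.nanoTime() - before - quantumNanos;
            if (hiccup > maxHiccupNanos) {
                maxHiccupNanos = hiccup;
            }
        }
    }
}
```

Run alongside the load generator, a large `maxHiccupNanos` tells you the measuring process itself stalled, which is exactly the CO that remains even after fixing the threading model.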
>> 
>> Regards,
>> Kirk
>> 

