jmeter-user mailing list archives

From Shmuel Krakower <shmul...@gmail.com>
Subject Re: Coordinated Omission
Date Thu, 08 Aug 2013 06:20:16 GMT
Hi,
I am aware of this problem; this is something the user should take care of,
IMO.

As a user, I am doing two things to resolve this:
1. In general, I create more threads than needed and use the Constant
Throughput Timer to throttle them down to the required throughput.
2. I look at the load test results and make sure that:
2.a. The total number of requests sent during the test meets the expected
numbers.
2.b. The requests-per-second graph (from the JMeter Plugins) over the test
shows a very steady line. If the line is not steady, it means we have some
contention... (a rough sketch of this check follows the list)
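For illustration only, here is a minimal sketch of check 2.b done outside the
GUI: it buckets a JMeter CSV result file (JTL) by second, so an uneven line
shows up as uneven counts. The column layout (comma-separated, epoch-millis
timeStamp first, one header row) is an assumption about how
jmeter.save.saveservice is configured, not a given.

    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.util.TreeMap;

    // Sketch: bucket JMeter CSV results by second and print the counts.
    // Assumes the first column is the epoch-millis timeStamp and that the
    // file has a header row; both depend on your save-service settings.
    public class SteadyRateCheck {
        public static void main(String[] args) throws Exception {
            TreeMap<Long, Integer> perSecond = new TreeMap<>();
            try (BufferedReader in = new BufferedReader(new FileReader(args[0]))) {
                String line = in.readLine();               // skip header row
                while ((line = in.readLine()) != null) {
                    long ts = Long.parseLong(line.split(",", 2)[0]);
                    perSecond.merge(ts / 1000, 1, Integer::sum);
                }
            }
            perSecond.forEach((sec, n) -> System.out.println(sec + "," + n));
        }
    }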


I think JMeter should not take care of such cases by default, or at all.
Sebb's idea of having the Constant Throughput Timer emit a warning when it
cannot keep up with the pace is a good one.

Shmuel Krakower.
www.Beatsoo.org - re-use your jmeter scripts for application performance
monitoring from worldwide locations for free.


On Tue, Aug 6, 2013 at 3:12 PM, sebb <sebbaz@gmail.com> wrote:

> On 6 August 2013 06:42, Kirk Pepperdine <kirk.pepperdine@gmail.com> wrote:
> > Hi Sebb,
> >
> > On 2013-08-06, at 2:54 AM, sebb <sebbaz@gmail.com> wrote:
> >
> >> On 3 August 2013 21:45, Kirk Pepperdine <kirk.pepperdine@gmail.com>
> wrote:
> >>> Hi all,
> >>>
> >>> To all who may be interested: Gil Tene (from Azul Systems) has started
> a thread on Coordinated Omission. It's his name for a problem that I noted
> in JMeter a while back, and for which the current fix is the "start threads
> on demand" check box. The problems Gil describes go much deeper than the
> problems that are due to JMeter's current threading model and lack of an
> event-based scheduler (which would provide at least a 10x boost in
> scalability and make it less likely that JMeter will work to hide
> bottlenecks in your application). The effect of CO is that the test harness
> becomes like a ref making a bad call at a critical point in the game.
> >>
> >> This is dependent on the test load; not all tests are affected by the
> problem.
> >
> > I would agree that not all load tests are affected by the problem,
> though IME most are.
> >>
> >>> Anyways, I shan't repeat the contents of the thread here. You can find
> it at mechanical-sympathy@googlegroups.com, a mailing list started by
> Martin Thompson, well known for his work on the Disruptor framework.
> >>
> >> Interesting thread.
> >> As it points out this affects many testing systems; JMeter is not
> >> alone in being affected.
> >
> > Absolutely true; I never intended to suggest that it was. Of the load
> testing tools out there, JMeter continues to rank very highly because of
> its reachability. I can teach a group of people how to execute rich load
> tests using JMeter in about an hour. This is not true of any of the other
> tooling that is out there.
> >>
> >> As I understand it, the problem occurs when one or more samples are
> >> delayed because the previous sample did not complete in time.
> >> Assuming that the slowdown is caused by the system under test (SUT),
> >> the samples that cannot be sent are also likely to have taken longer
> >> than usual.
> >> So one overlong sample response can hide several others that would
> >> have occurred, thus affecting the statistics, particularly the high
> >> percentile stats.
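To make that effect concrete, here is a minimal sketch using Gil Tene's
HdrHistogram library (org.HdrHistogram on Maven Central; the numbers are
invented): recordValueWithExpectedInterval back-fills the samples that a
stall prevented, so comparing the raw and corrected histograms shows how
much one overlong response hides.

    import org.HdrHistogram.Histogram;

    // Sketch: the intended pace is one sample every 100 ms, but one response
    // stalls for 5000 ms, silently swallowing ~49 would-be samples.
    public class CoDemo {
        public static void main(String[] args) {
            Histogram raw = new Histogram(60_000L, 3);
            Histogram corrected = new Histogram(60_000L, 3);
            long expectedIntervalMs = 100;
            long[] observedMs = {5, 6, 5, 5000, 6, 5, 5};   // one big stall
            for (long v : observedMs) {
                raw.recordValue(v);
                // also record the samples the stall prevented (Gil Tene's scheme)
                corrected.recordValueWithExpectedInterval(v, expectedIntervalMs);
            }
            // raw median stays ~5 ms; corrected median jumps into the seconds
            System.out.println("raw median:       " + raw.getValueAtPercentile(50.0));
            System.out.println("corrected median: " + corrected.getValueAtPercentile(50.0));
        }
    }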
> >
> > I think the premise of the discussion is that something happens in the
> harness that prevents it from firing when it should.
>
> Yes.
>
> > With JMeter, a long response time from the app would also contribute to
> this problem. In JMeter, a change in the threading model would help
> prevent this problem and would make JMeter far more scalable. Currently
> what happens is that a thread picks up a script (ThreadGroup) and executes
> it. The problem is that the script is far too granular a unit of work to
> give to a thread. That problem becomes magnified when you loop over the
> script.
>
> Only if the thread is held up by a long response.
>
> > This creates the 1-thread-1-user dependency that I've mentioned in the
> past. It is this dependency that, IMHO, limits the scalability of JMeter.
> To break this dependency one would need to offer the thread a much less
> granular task than an entire script. So, instead of getting a script, what
> if a thread only got a sampler? What if the samplers were in an event heap
> sorted by the time at which each sampler was to be triggered? Threads
> would then pick up an event, fire it, and then calculate and inject into
> the heap the next event, as directed by the script. This way, instead of
> having a thread tied down, it could be made available to trigger the next
> sample. This would allow the test to maintain a much more consistent load
> on the server by making it less likely that the test is throttled by side
> effects in the load injector.
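As a rough illustration of the event-heap idea sketched above (the names
here are hypothetical, not JMeter's actual classes): samplers sit in a heap
ordered by their scheduled fire time, and a small worker pool pops whichever
event is due next, fires it, and pushes the follow-up event, so no thread is
pinned to a single "user".

    import java.util.concurrent.DelayQueue;
    import java.util.concurrent.Delayed;
    import java.util.concurrent.TimeUnit;

    // Sketch of an event-heap scheduler: DelayQueue is a thread-safe heap
    // ordered by fire time. All names are illustrative, not JMeter APIs.
    class SampleEvent implements Delayed {
        final long fireAtMs;        // absolute time this sampler should fire
        final Runnable sampler;     // the single sampler to execute

        SampleEvent(long fireAtMs, Runnable sampler) {
            this.fireAtMs = fireAtMs;
            this.sampler = sampler;
        }

        public long getDelay(TimeUnit unit) {
            return unit.convert(fireAtMs - System.currentTimeMillis(),
                                TimeUnit.MILLISECONDS);
        }

        public int compareTo(Delayed other) {
            return Long.compare(getDelay(TimeUnit.MILLISECONDS),
                                other.getDelay(TimeUnit.MILLISECONDS));
        }
    }

    class EventHeapScheduler {
        private final DelayQueue<SampleEvent> heap = new DelayQueue<>();

        void schedule(SampleEvent e) { heap.put(e); }

        // Each worker pops whichever event is due next; it is never tied
        // to one user for the life of a script.
        Runnable worker() {
            return () -> {
                try {
                    while (true) {
                        SampleEvent e = heap.take();  // blocks until one is due
                        e.sampler.run();
                        // script logic would compute and schedule the next event here
                    }
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                }
            };
        }
    }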
>
> Perhaps, but as already discussed previously, this would mean
> rewriting large parts of JMeter.
> Third party plugins would probably also be affected.
>
> >>
> >> To avoid this, we need to ensure that long sample responses do not
> >> prevent the generation of the next sample at the correct time.
> >>
> >> There are a couple of ways to mitigate this with the current JMeter
> >> design (which waits for a sample to complete before continuing with
> >> the next).
> >>
> >> 1) Ensure that each thread only needs to send requests at a relatively
> >> low rate, so that slow responses do not use up all the wait time.
> >> This may not be possible with a single JMeter instance, in which case
> >> use one instance to create the bulk of the load (ignoring the issue of
> >> delayed requests), and a second to measure the sample times. This
> >> second instance should have a low transaction rate per thread so slow
> >> responses don't affect the generated load, and should be used to
> >> derive the statistics.
> >
> > CO can occur under low loads as well as high ones.
>
> AIUI, it can only occur if the response time is sufficiently long to
> delay the next sample.
> The second thread would have to have plenty of slack wait-time to
> allow for this.
> This implies a low transaction rate per thread.
>
> E.g. if the maximum response time is 60 seconds, each thread would
> have to issue samples at less than 1 per minute.
> However there can be multiple such threads to increase the measurement
> rate in the second JMeter instance.
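Stated as arithmetic (an illustration of the point above, not a JMeter
feature): for a thread never to be delayed by its own slow sample, its issue
interval must exceed the worst-case response time, so the minimum number of
measurement threads is the target rate times the maximum response time.

    // Sketch: minimum threads so no thread ever waits on its own slow sample.
    public class MeasurementThreads {
        static long minThreads(double targetRatePerSec, double maxResponseSec) {
            // each thread can safely issue at most 1/maxResponseSec samples/sec
            return (long) Math.ceil(targetRatePerSec * maxResponseSec);
        }
        public static void main(String[] args) {
            // the example above: a 60 s worst case limits each thread to 1/min;
            // measuring at 10 samples/s then needs 600 such threads
            System.out.println(minThreads(10, 60));   // prints 600
        }
    }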
>
> >>
> >> 2) Use timeouts to abandon slow responses so that they cannot cause a
> >> slow down (this might not always be suitable).
> >
> > CO is a reporting error. By omitting the data point you will have only
> further contributed to CO.
>
> Yes.
>
> But by choosing a timeout that corresponds to the maximum allowed
> response time, this might not matter.
> For HTML tests, it's quite likely that users will abandon the request
> if it exceeds a certain threshold.
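One way to reconcile the two points above, sketched outside JMeter
(illustrative names; nothing here is built in): cap each request with a
timeout equal to the maximum allowed response time, but record the abandoned
sample at that ceiling instead of omitting it, so the stall still reaches
the statistics.

    import java.net.HttpURLConnection;
    import java.net.SocketTimeoutException;
    import java.net.URL;

    // Sketch: abandon slow responses with a timeout, but record the abandoned
    // sample at the timeout ceiling rather than dropping the data point.
    public class TimeoutAsCeiling {
        static final int TIMEOUT_MS = 5000;   // the maximum allowed response time

        static long timedRequest(String url) throws Exception {
            long start = System.currentTimeMillis();
            HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
            conn.setConnectTimeout(TIMEOUT_MS);
            conn.setReadTimeout(TIMEOUT_MS);
            try {
                conn.getResponseCode();
                return System.currentTimeMillis() - start;
            } catch (SocketTimeoutException e) {
                return TIMEOUT_MS;            // count it as "at least this slow"
            } finally {
                conn.disconnect();
            }
        }
    }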
>
> >>
> >> It should be fairly obvious from the test timings (particularly the
> >> transaction rate) whether this has happened.
> >
> > IME it's not been obvious that this has been happening unless you are
> inclined to dig deeper into what is happening with your load test. Many of
> the effects are subtle and often can only be seen when you use a histogram
> or some other visualization. Using an average will only bury this effect.
>
> If the required transaction rate is not generated, then obviously
> something has gone wrong which requires further investigation.
>
> >>
> >> It should still be possible to draw some useful conclusions about the
> >> behaviour of the SUT, e.g. what load starts to cause response
> >> slowdown; does the SUT start misbehaving in other ways (reporting
> >> errors etc).
> >
> > I'd agree that you can still draw useful conclusions from some very
> flawed benchmarks, but the real problem is: how do you know when you
> should and when you shouldn't use the results from a bench?
>
> Maybe there are some simple measures that JMeter could implement that
> would help show when the problem has occurred.
> For example, the Constant Throughput Timer could perhaps report if it
> was unable to maintain the required rate?
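A minimal sketch of what such a report might look like (illustrative only;
this is not the Constant Throughput Timer's actual code): a pacer that
computes each sample's scheduled time and warns whenever it is already
behind schedule instead of silently stretching the interval.

    // Sketch: constant-rate pacer that reports when it cannot keep up.
    public class ReportingPacer {
        private final long intervalMs;        // desired gap between samples
        private long nextDueMs = System.currentTimeMillis();

        ReportingPacer(double samplesPerSec) {
            this.intervalMs = (long) (1000.0 / samplesPerSec);
        }

        // Call before each sample; sleeps until the scheduled time, or warns.
        void pace() throws InterruptedException {
            nextDueMs += intervalMs;
            long lag = System.currentTimeMillis() - nextDueMs;
            if (lag > 0) {
                System.err.println("WARN: pacer behind schedule by " + lag + " ms");
            } else {
                Thread.sleep(-lag);
            }
        }
    }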
>
> > The question has many subtle implications, and I've seen teams get it
> wrong; in a few cases it almost resulted in the project being terminated.
> >
> > Anyways, ever since I've changed all of my testing to never loop I've
> been finding it much easier to reliably load an application using JMeter.
> >
> > Regards,
> > Kirk
> >
