cassandra-commits mailing list archives

From "Jeremy (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (CASSANDRA-14616) cassandra-stress write hangs with default options
Date Mon, 13 Aug 2018 10:08:00 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-14616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16578028#comment-16578028 ]

Jeremy edited comment on CASSANDRA-14616 at 8/13/18 10:07 AM:
--------------------------------------------------------------

Hello, I would like to try solving this issue.

I have done some preliminary testing, and the hang appears to be caused by cassandra-stress
waiting for the uncertainty to stabilize. The trace from jstack is included below.

{quote}
java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for  <0x000000076d873e20> (a java.util.concurrent.CountDownLatch$Sync)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
        at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:231)
        at org.apache.cassandra.stress.util.Uncertainty$WaitForTargetUncertainty.await(Uncertainty.java:56)
        at org.apache.cassandra.stress.util.Uncertainty.await(Uncertainty.java:85)
        at org.apache.cassandra.stress.report.StressMetrics.waitUntilConverges(StressMetrics.java:135)
        at org.apache.cassandra.stress.StressAction.run(StressAction.java:269)
        at org.apache.cassandra.stress.StressAction.warmup(StressAction.java:121)
        at org.apache.cassandra.stress.StressAction.run(StressAction.java:70)
        at org.apache.cassandra.stress.Stress.run(Stress.java:143)
        at org.apache.cassandra.stress.Stress.main(Stress.java:62)
{quote}

I also added some debugging printlns in 3.11:
{quote}
uncertainty: NaN targetUncertainty: 0.020000 
measurements: 1 minMeasurements: 30 
measurements: 1 maxMeasurements: 200 
uncertainty: NaN targetUncertainty: 0.020000 
measurements: 2 minMeasurements: 30 
measurements: 2 maxMeasurements: 200 
...
uncertainty: NaN targetUncertainty: 0.020000 
measurements: 200 minMeasurements: 30 
measurements: 200 maxMeasurements: 200 
{quote}

In the warmup phase, the program waits until either the uncertainty falls below 0.02 with at
least 30 measurements, or 200 measurements have been taken. Since the uncertainty is always
NaN, and any comparison against NaN is false, it ends up waiting for all 200 measurements.
The same problem doesn't occur in 3.0 because the Runnable
(https://github.com/apache/cassandra/blob/cassandra-3.0/tools/stress/src/org/apache/cassandra/stress/StressMetrics.java#L86)
calls wakeAll after 2 iterations. However, uncertainty is still always NaN in 3.0.
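
To illustrate why a NaN uncertainty forces the full 200 measurements, here is a minimal sketch of the stopping rule as described above. The class and method names are hypothetical and this is not the actual code from Uncertainty.java; only the thresholds (0.02, 30, 200) come from the debug output.

```java
final class ConvergenceCheck {
    // Hypothetical helper mirroring the stopping rule described in this comment:
    // stop once the uncertainty is below the target with enough measurements,
    // or once the measurement budget is exhausted.
    static boolean converged(double uncertainty, double targetUncertainty,
                             int measurements, int minMeasurements, int maxMeasurements) {
        // Any comparison against NaN is false, so this branch can never succeed for NaN.
        boolean preciseEnough = measurements >= minMeasurements
                && uncertainty < targetUncertainty;
        boolean outOfBudget = measurements >= maxMeasurements;
        return preciseEnough || outOfBudget;
    }

    public static void main(String[] args) {
        System.out.println(converged(Double.NaN, 0.02, 30, 30, 200));  // false
        System.out.println(converged(Double.NaN, 0.02, 200, 30, 200)); // true
        System.out.println(converged(0.01, 0.02, 30, 30, 200));        // true
    }
}
```

With NaN, only the second condition can ever end the wait, which matches the printlns running all the way to measurement 200.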

The problem arises in 3.11 and trunk because that Runnable loop was refactored into reportingLoop,
which waits for all 200 measurements first:
https://github.com/apache/cassandra/blob/cassandra-3.11/tools/stress/src/org/apache/cassandra/stress/report/StressMetrics.java#L154

Here's what it looks like for 3.0.
{quote}
Warming up WRITE with 0 iterations...
_______________________
Updated value: NaN 
uncertainty: NaN targetUncertainty: 0.020000 
measurements: 1 minMeasurements: 30 
measurements: 1 maxMeasurements: 200 
_______________________
Updated value: NaN 
uncertainty: NaN targetUncertainty: 0.020000 
measurements: 2 minMeasurements: 30 
measurements: 2 maxMeasurements: 200 
latch counted down via wakeall
wakeAll via line 123 in stressmetrics
WARNING: uncertainty mode (err<) results in uneven workload between thread runs, so should
be used for high level analysis only
Running with 4 threadCount
{quote}

I think this is being caused by having 0 iterations for warmup. The number of iterations is
decided at the start by
{{Math.min(50000, (int) (settings.command.count * 0.25)) * settings.node.nodes.size();}}
(https://github.com/apache/cassandra/blob/cassandra-3.11/tools/stress/src/org/apache/cassandra/stress/StressAction.java#L108).

When {{./tools/bin/cassandra-stress write}} is called without any arguments, settings.command.count
evaluates to -1. Then -1 * 0.25 is -0.25, which the (int) cast truncates to 0, so
{{Math.min(50000, (int) (settings.command.count * 0.25)) * settings.node.nodes.size();}}
evaluates to 0 and we always end up with 0 iterations.
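
The arithmetic can be checked in isolation. The sketch below copies the expression quoted above into a standalone method; the class name, the method name, and the parameter types (a long count and an int node count) are assumptions made for illustration.

```java
final class WarmupIterations {
    // Same expression as quoted from StressAction, with the settings
    // replaced by plain parameters (hypothetical signature).
    static int warmupIterations(long count, int nodeCount) {
        return Math.min(50000, (int) (count * 0.25)) * nodeCount;
    }

    public static void main(String[] args) {
        // Default count of -1: (int) -0.25 truncates toward zero.
        System.out.println(warmupIterations(-1, 1));        // 0
        // An explicit count gives a nonzero warmup.
        System.out.println(warmupIterations(1000, 1));      // 250
        // Large counts are capped at 50000 per node.
        System.out.println(warmupIterations(1_000_000, 3)); // 150000
    }
}
```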

One proposed fix is to choose a minimum nonzero value for iterations in the warmup phase, something
like https://github.com/yarnspinnered/cassandra/commit/33cf059f63b56ac17a3f66869615d3d7cc52f8a9.
I tried this and it no longer hangs, but I'm not sure about the exact value or whether there is a
better way to fix this.
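
For illustration only, one way such a floor could look is sketched below. This is not the code from the linked commit; the class name, the method signature, and the constant MIN_WARMUP_ITERATIONS with its value of 100 are hypothetical placeholders, since the right value is exactly what is still undecided.

```java
final class WarmupIterationsWithFloor {
    // Hypothetical floor; the actual value is an open question.
    static final int MIN_WARMUP_ITERATIONS = 100;

    // Clamp the per-node warmup count to a nonzero minimum before
    // multiplying by the number of nodes (hypothetical signature).
    static int warmupIterations(long count, int nodeCount) {
        int perNode = Math.min(50000, (int) (count * 0.25));
        return Math.max(MIN_WARMUP_ITERATIONS, perNode) * nodeCount;
    }

    public static void main(String[] args) {
        System.out.println(warmupIterations(-1, 1));   // 100 instead of 0
        System.out.println(warmupIterations(1000, 1)); // 250, unchanged
    }
}
```

The floor only takes effect when the computed value would be smaller, so runs with a large explicit count behave as before.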


> cassandra-stress write hangs with default options
> -------------------------------------------------
>
>                 Key: CASSANDRA-14616
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-14616
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Chris Lohfink
>            Priority: Major
>
> Cassandra stress sits there for an incredibly long time after connecting to JMX. To reproduce:
> {code}./tools/bin/cassandra-stress write{code}
> If you give it a -n it's not as bad, which is why dtests etc. don't seem to be impacted.
> Does not occur in the 3.0 branch but does in 3.11 and trunk.


