cassandra-commits mailing list archives

From "Michael Kjellman (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (CASSANDRA-14054) testRegularColumnTimestampUpdates - org.apache.cassandra.cql3.ViewTest is flaky: expected <2> but got <1>
Date Thu, 07 Dec 2017 04:45:00 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-14054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16281326#comment-16281326
] 

Michael Kjellman edited comment on CASSANDRA-14054 at 12/7/17 4:44 AM:
-----------------------------------------------------------------------

[~alourie] hey, so sorry for the delayed reply... I've been up to my eyeballs in the dtest
pytest work along with all the other stuff and totally let this slip. I don't have a super
great answer for you yet because I'm still in the process of getting that story together... but
maybe we can make this work :)

If you take a look at my C* fork, there is a CircleCI config:
https://github.com/mkjellman/cassandra/blob/trunk_circle/.circleci/config.yml

Create a free CircleCI account (if you don't have one yet) and register your C* fork on GitHub
with CircleCI. Then, grab the above config and put it in a branch of trunk in your personal
fork (you'll need to create a .circleci folder and put it in there).

Starting at L47 of the config you'll need to switch things to use the free user config (I'm
assuming you don't have a paid CircleCI account here).

{code}
# Set env_settings, env_vars, and workflows/build_and_run_tests based on environment
env_settings: &env_settings
    # <<: *default_env_settings
    <<: *high_capacity_env_settings
env_vars: &env_vars
    # <<: *default_env_vars
    <<: *high_capacity_env_vars
workflows:
    version: 2
    # build_and_run_tests: *default_jobs
    build_and_run_tests: *with_dtest_jobs
{code}

Comment out the high_capacity_* lines and uncomment the default_* ones. You might also want
to switch the workflow to run only "default_jobs", which for right now will just build C*
and run the unit tests.
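
After that swap, the block should look roughly like this. This is a sketch only: it assumes
the anchors default_env_settings, default_env_vars and default_jobs are defined earlier in
the config under exactly those names, as the commented-out lines in the snippet above suggest.

{code}
# Sketch only: relies on the default_* anchors being defined earlier in the config
env_settings: &env_settings
    <<: *default_env_settings
    # <<: *high_capacity_env_settings
env_vars: &env_vars
    <<: *default_env_vars
    # <<: *high_capacity_env_vars
workflows:
    version: 2
    build_and_run_tests: *default_jobs
    # build_and_run_tests: *with_dtest_jobs
{code}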

This test fails about 50% of the time on CircleCI. Potentially it's exacerbated by running
on Ubuntu? Another thing maybe worth trying is running the test via ant on Ubuntu... The Docker
image I put together for CircleCI is available on DockerHub as kjellman/cassandra-test:0.1.3
(config checked in to https://github.com/mkjellman/cassandra-test-docker).

Another thing we do is split the unit tests across the total number of Circle containers
available. Based on historical runs, it tries to distribute the tests that run in each
container by time, so you don't have a few containers with all the slow tests dragging
the entire thing down. This means we invoke the tests in each container via "ant testclasslist
-Dtest.classlistfile=/path/to/unit/tests/to/run". Maybe another test somewhere else doesn't
clean up after itself and that causes testRegularColumnTimestampUpdates to fail?

To be clear: the splits across containers are at the test method level, not the test class
level, so various methods of ViewTest may run in different containers at the same time. Circle
merges the results at the end to give one consolidated report for all the unit tests. None of
the other unit tests on trunk have been flaky or failing when run via Circle, so I'm not sure
I totally believe it's related to the order it's run in or to another test not cleaning up
after itself. Also, a lot of other asserts pass before the 2nd to last assert is hit (which
is the one that's always failing, and always with the same value of 1 instead of 2)...

Hope all this helps get the ball rolling again... any hunches from just looking at the code?
I don't really know the MV code very well... any chance there is a race between when the MV
has finished building and is available and when the assert is hit? Maybe we need some kind of
forced blocking flush before we assert on those conditions? That's how we handle this in a
lot of the other compaction-related tests that check sstables on disk and row count...



> testRegularColumnTimestampUpdates - org.apache.cassandra.cql3.ViewTest is flaky: expected <2> but got <1>
> ---------------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-14054
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-14054
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Testing
>            Reporter: Michael Kjellman
>            Assignee: Alex Lourie
>
> testRegularColumnTimestampUpdates - org.apache.cassandra.cql3.ViewTest is flaky: expected <2> but got <1>
> Fails about 25% of the time. It is currently our only flaky unit test on trunk so it would be great to get this one fixed up so we can be confident in unit test failures going forward.
> junit.framework.AssertionFailedError: Invalid value for row 0 column 0 (c of type int), expected <2> but got <1>
> 	at org.apache.cassandra.cql3.CQLTester.assertRows(CQLTester.java:973)
> 	at org.apache.cassandra.cql3.ViewTest.testRegularColumnTimestampUpdates(ViewTest.java:380)




