cassandra-commits mailing list archives

From "Alan Boudreault (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (CASSANDRA-10156) Creating Materialized views concurrently leads to missing data
Date Sat, 22 Aug 2015 14:49:45 GMT

     [ https://issues.apache.org/jira/browse/CASSANDRA-10156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alan Boudreault updated CASSANDRA-10156:
----------------------------------------
    Description: 
[~nutbunnies] was writing dtests that create multiple tables concurrently. He also wrote a
test that creates multiple MVs but has not been able to get it to work properly. After some debugging
outside of dtest, it seems that there is an issue if we create more than one MV at the same
time. There are no errors in the log, but the MVs are never entirely populated and are missing
data.

I've attached 2 scripts:

[^mv_test_bad.sh]: the script that reproduces the issue. It creates 4 MVs at the same time.
At the end, some data is missing from the MVs and there is nothing in system.hints or system.batchlog.

[^mv_test_good.sh]: the same script, except that it waits 10 seconds between each MV creation,
which results in 4 MVs with all the data.
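
The difference between the two scripts can be sketched roughly as follows. This is not the content of the attached scripts; it is a minimal Python illustration of the two creation patterns. The function names, the `<table>_by_<column>` view naming, and the `execute` callable are all assumptions made for the example.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def build_mv_statements(keyspace, table, pk_column, view_columns):
    """Return one CREATE MATERIALIZED VIEW statement per column (hypothetical schema)."""
    statements = []
    for column in view_columns:
        statements.append(
            f"CREATE MATERIALIZED VIEW {keyspace}.{table}_by_{column} AS "
            f"SELECT * FROM {keyspace}.{table} "
            f"WHERE {column} IS NOT NULL AND {pk_column} IS NOT NULL "
            f"PRIMARY KEY ({column}, {pk_column})"
        )
    return statements


def create_views(execute, statements, delay_seconds=None):
    """Run each statement through the caller-supplied `execute` callable.

    delay_seconds=None submits all statements at once (the pattern that
    loses data in this report); a number runs them one at a time and
    pauses that long between consecutive statements.
    """
    if delay_seconds is None:
        with ThreadPoolExecutor(max_workers=max(1, len(statements))) as pool:
            # list() waits for every call to finish and re-raises any error.
            list(pool.map(execute, statements))
        return
    for index, statement in enumerate(statements):
        if index > 0:
            time.sleep(delay_seconds)
        execute(statement)
```

Here `execute` stands for whatever runs a CQL string against the cluster, for example a driver session's execute method. Calling `create_views(execute, stmts)` corresponds to the bad case, and `create_views(execute, stmts, delay_seconds=10)` to the good one.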

Some more notes from Andrew:
{code}
- lowering the number of rows inserted below ~1000 does not trigger the inconsistent behavior
- adding more columns/MVs makes it worse -- more of the MV counts are consistently wrong
- results differ between runs -- usually one of the MVs is correct, though
- the describe cluster and system.mv* queries always "look" good
{/code}
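
One way to spot the wrong counts mentioned in the notes is to compare each MV's row count with the base table's. The sketch below assumes that every base row has non-null values for the view key columns, so each MV should hold exactly as many rows as the base table. `find_incomplete_views` and the `count_rows` callable are hypothetical names, not part of the attached scripts.

```python
def find_incomplete_views(count_rows, base_table, views):
    """Return {view: (view_count, base_count)} for views whose count differs.

    `count_rows` is a caller-supplied callable that takes a table or view
    name and returns its row count, e.g. by running SELECT COUNT(*) on it.
    """
    expected = count_rows(base_table)
    mismatches = {}
    for view in views:
        actual = count_rows(view)
        if actual != expected:
            mismatches[view] = (actual, expected)
    return mismatches
```

An empty result means every view agrees with the base table. Since the notes say the outcome varies from run to run, the check is only meaningful when repeated across several runs.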

Thanks Andrew for finding this bug! 

//cc [~carlyeks] [~tjake] [~enigmacurry]

  was:
[~nutbunnies] was writing dtests that create multiple tables concurrently. He also wrote a
test that creates multiple MVs but has not been able to get it to work properly. After some debugging
outside of dtest, it seems that there is an issue if we create more than one MV at the same
time. There are no errors in the log, but the MVs are never entirely populated and are missing
data.

I've attached 2 scripts:

[^mv_test_bad.sh]: the script that reproduces the issue. It creates 4 MVs at the same time.
At the end, some data is missing from the MVs and there is nothing in system.hints or system.batchlog.

[^mv_test_good.sh]: the same script, except that it waits 10 seconds between each MV creation,
which results in 4 MVs with all the data.

Some more notes from Andrew:
{code}
# - lowering the number of rows inserted below ~1000 does not trigger the inconsistent behavior
# - adding more columns/MVs makes it worse -- more of the MV counts are consistently wrong
# - results differ between runs -- usually one of the MVs is correct, though
# - the describe cluster and system.mv* queries always "look" good
{/code}

Thanks Andrew for finding this bug! 

//cc [~carlyeks] [~tjake] [~enigmacurry]


> Creating Materialized views concurrently leads to missing data
> --------------------------------------------------------------
>
>                 Key: CASSANDRA-10156
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10156
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Alan Boudreault
>             Fix For: 3.x
>
>         Attachments: mv_test_bad.sh, mv_test_good.sh



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
