cassandra-commits mailing list archives

From "Benjamin Lerer (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (CASSANDRA-9161) Add random interleaving for flush/compaction when running CQL unit tests
Date Wed, 02 Mar 2016 15:52:18 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-9161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15175823#comment-15175823
] 

Benjamin Lerer edited comment on CASSANDRA-9161 at 3/2/16 3:51 PM:
-------------------------------------------------------------------

I am not really in favor of random operations. DTests are more random in nature, and finding
the problem behind a flapping test requires a lot more time than for a normal test failure.
I honestly prefer writing more tests but keeping them deterministic.

My experience with CQL is that we usually forget to test a certain number of use cases.

The most common types of errors when writing tests are:
# forgetting to test the different types of tables:
#* CQL table without clustering columns
#* CQL table with clustering columns
#* Compact table without clustering columns 
#* Compact table with clustering columns
# forgetting to test with static columns. A use case which is often forgotten is the case
of partitions containing static data but no rows. For such a case it is important to make
sure that the partition with only static data is in the set being queried.
# if collections are involved, forgetting to test all the possible types (Lists, Sets, Maps,
Tuples and UDTs) when they are frozen and non-frozen
# if the processing of data can differ depending on whether the data is read from Memtables
or SSTables, forgetting to test with and without flush
# for paging, not testing all the possible use cases:
#* Range queries (e.g. {{SELECT * FROM myTable}})
#* Range queries with LIMIT (e.g. {{SELECT * FROM myTable LIMIT 3}})
#* Range queries with DISTINCT (e.g. {{SELECT DISTINCT pk, s FROM myTable}})
#* Range queries with DISTINCT and LIMIT (e.g. {{SELECT DISTINCT pk, s FROM myTable LIMIT
3}})
#* Range query with ORDER BY (should always be invalid)
#* Single partition queries (e.g. {{SELECT * FROM myTable WHERE pk = 1}})
#* Single partition queries with LIMIT (e.g. {{SELECT * FROM myTable WHERE pk = 1 LIMIT 3}})
#* Single partition queries with DISTINCT (e.g. {{SELECT DISTINCT pk, s FROM myTable WHERE
pk = 1}})
#* Single partition queries with DISTINCT and LIMIT (e.g. {{SELECT DISTINCT pk, s FROM myTable
WHERE pk = 1 LIMIT 3}})
#* Single partition queries with ORDER BY (e.g. {{SELECT * FROM myTable WHERE pk = 1 ORDER
BY clustering1 DESC}})
#* Single partition queries with ORDER BY and LIMIT (e.g. {{SELECT * FROM myTable WHERE pk
= 1 ORDER BY clustering1 DESC LIMIT 3}})
#* Multi-partition queries (e.g. {{SELECT * FROM myTable WHERE pk IN (1, 2)}})
#* Multi-partition queries with LIMIT (e.g. {{SELECT * FROM myTable WHERE pk IN (1, 2) LIMIT
3}})
#* Multi-partition queries with DISTINCT (e.g. {{SELECT DISTINCT pk, s FROM myTable WHERE
pk IN (1, 2)}})
#* Multi-partition queries with DISTINCT and LIMIT (e.g. {{SELECT DISTINCT pk, s FROM myTable
WHERE pk IN (1, 2) LIMIT 3}})
For paging, the tests should be written as DTests with multiple nodes in order to also check
the serialization protocol used between the nodes and the non-local path. The tests should
use page sizes smaller and greater than the dataset. Testing with a page size of 1 is highly
recommended as it ensures that the last page returned is a full page.
# forgetting to test some invalid conditions. I found that this one is the hardest to test
properly. There are so many types of wrong input that it is difficult not to miss some.
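
To make the checklist above concrete, here is a minimal sketch in Python of how the
combinations could be enumerated deterministically. It is only an illustration and not code
from the Cassandra test suite: {{TABLE_KINDS}}, {{table_matrix}}, {{paging_queries}},
{{invalid_queries}} and {{page_sizes}} are hypothetical names, and the range query with
ORDER BY is an assumed example built from the {{clustering1}} column used in the other queries.

```python
from itertools import product

# Hypothetical names throughout; this is an illustration, not Cassandra test code.
TABLE_KINDS = ["cql", "compact"]
CLUSTERING = [False, True]   # table without / with clustering columns
FLUSH = [False, True]        # read from Memtables only / flush to SSTables first

SELECTORS = ["*", "DISTINCT pk, s"]
RESTRICTIONS = ["", " WHERE pk = 1", " WHERE pk IN (1, 2)"]  # range, single, multi


def table_matrix():
    """Every (table kind, has clustering columns, flush before read) combination."""
    return list(product(TABLE_KINDS, CLUSTERING, FLUSH))


def paging_queries(table="myTable"):
    """Build the valid SELECT variants from the checklist, with and without LIMIT."""
    queries = []
    for restriction, selector in product(RESTRICTIONS, SELECTORS):
        base = f"SELECT {selector} FROM {table}{restriction}"
        queries += [base, base + " LIMIT 3"]
    # ORDER BY is only exercised as a valid query on a single partition.
    ordered = f"SELECT * FROM {table} WHERE pk = 1 ORDER BY clustering1 DESC"
    queries += [ordered, ordered + " LIMIT 3"]
    return queries


def invalid_queries(table="myTable"):
    """Queries expected to be rejected (assumed example of a range query with ORDER BY)."""
    return [f"SELECT * FROM {table} ORDER BY clustering1 DESC"]


def page_sizes(row_count):
    """Page sizes of 1, smaller than the dataset, and greater than the dataset."""
    return sorted({1, max(1, row_count // 2), row_count + 1})
```

Each query returned by {{paging_queries()}} would then be run once per entry of
{{page_sizes(n)}}, where {{n}} is the number of rows the query is expected to return, and
each table-level test once per entry of {{table_matrix()}}.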









> Add random interleaving for flush/compaction when running CQL unit tests
> ------------------------------------------------------------------------
>
>                 Key: CASSANDRA-9161
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9161
>             Project: Cassandra
>          Issue Type: Test
>            Reporter: Sylvain Lebresne
>              Labels: retrospective_generated
>
> Most CQL tests don't bother flushing, which means that they overwhelmingly test the memtable
path and not the sstables one. A simple way to improve on that would be to make {{CQLTester}}
issue flushes and compactions randomly between statements.
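
A minimal sketch of what such interleaving could look like, written in Python for brevity
rather than as a change to {{CQLTester}}. The function name {{interleave}}, the marker
strings and the probabilities are all assumptions made for illustration. The seed parameter
is also an assumption: it is one way to make a failing run reproducible.

```python
import random


def interleave(statements, seed, flush_probability=0.3, compact_probability=0.1):
    """Return the statements with flush/compaction markers inserted between them.

    The markers are placeholders; a real harness would call its own flush and
    compaction hooks instead. The same seed always yields the same plan.
    """
    rng = random.Random(seed)
    plan = []
    for i, statement in enumerate(statements):
        plan.append(statement)
        if i == len(statements) - 1:
            break  # nothing to interleave after the last statement
        roll = rng.random()  # uniform in [0.0, 1.0)
        if roll < compact_probability:
            plan.append("<COMPACT>")
        elif roll < compact_probability + flush_probability:
            plan.append("<FLUSH>")
    return plan
```

Logging the seed with every run would let a failing interleaving be replayed exactly.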



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
