cassandra-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Sylvain Lebresne (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-12997) dtest failure in org.apache.cassandra.cql3.validation.operations.AlterTest.testDropListAndAddListWithSameName
Date Tue, 06 Dec 2016 15:28:58 GMT


Sylvain Lebresne commented on CASSANDRA-12997:

That test seem to have failed only once from CI history and I haven't been able to reproduce
in 100+ local runs, so I can theorize about what happened, but it's just that, theory.

The test insert a value for a column, drop the column, add it back (same type), and query
it. It expects the value to be {{null}} since it has remove through the drop, and that's what
it gets, except for that one run when it got the initial value back on the read.

My best theory is that it's a timestamp issue. Somehow, the initial insertion ended up with
a higher timestamp than the timestamp used for the {{DROP}}, so that on a read the old value
is considered post-{{DROP}}. Which, and I'm still in the realm of theories, could happen if
on CI nptd kicked in just at the wrong moment. It's a stretch, but as that has only even happened
once that I can tell, this could be a case of bad luck.

There is another reason why this could happen: inserts use {{ClientState.getTimestamp()}}
to generate their timestamp, which ensure that inserts coordinated by the same node always
get strictly increasing timestamps, and do so by adding "fake" microseconds when it has to
generate multiple timestamps in the same milliseconds. {{ALTER TABLE DROP}} simply relies
on {{System.currentTimeMillis()}} however, so if multiple inserts, followed by a {{DROP}}
happen in the same milliseconds, all but the first insert will actually have a higher timestamp
than the drop one. This is also a bit of a stretch because the test does only a single insert,
so while a prior test insert could interfer, that would be pretty timing. But it seems to
be what this is ultimately so ....

Anyway, the previous point is the crux of the patch I'm attaching below: it uses the {{ClientState.getTimestamp()}}
mechanism for {{ALTER TABLE DROP}} too. This ensures that the drop will always get a higher
timestamp than any previous insert (coordinated by the same node), thus avoiding both theoretical
problem above. It feels kind of the right thing to do anyway. The one annoyance is that we
have to pass the {{QueryState}} to {{announceMigration()}} which is overriden by all schema
altering statement, which is why the patch is more than 2 lines (but it's 2 lines of non automatic

| [12997|] | [utests|]
| [dtests|] |

I'll note lastly that the patch is against trunk, as was the test failure, but assuming my
theories above are indeed the reason for that one failure, this could happen on pretty much
any version. That said, it's really just to make the test happy as no-one in real life will
even run into this, so I don't see the point of taking any kind of risk.

> dtest failure in org.apache.cassandra.cql3.validation.operations.AlterTest.testDropListAndAddListWithSameName
> -------------------------------------------------------------------------------------------------------------
>                 Key: CASSANDRA-12997
>                 URL:
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Sean McCarthy
>            Assignee: Sylvain Lebresne
>              Labels: test-failure, testall
> example failure:
> {code}
> Error Message
> Invalid value for row 0 column 2 (mycollection of type list<text>), expected <null>
but got <[first element]>
> {code}{code}Stacktrace
> junit.framework.AssertionFailedError: Invalid value for row 0 column 2 (mycollection
of type list<text>), expected <null> but got <[first element]>
> 	at org.apache.cassandra.cql3.CQLTester.assertRows(
> 	at org.apache.cassandra.cql3.validation.operations.AlterTest.testDropListAndAddListWithSameName(
> {code}

This message was sent by Atlassian JIRA

View raw message