cassandra-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Sylvain Lebresne (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (CASSANDRA-6638) SSTableScanner can Skip Rows with vnodes
Date Thu, 30 Jan 2014 11:58:09 GMT

     [ https://issues.apache.org/jira/browse/CASSANDRA-6638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Sylvain Lebresne updated CASSANDRA-6638:
----------------------------------------

    Attachment: 6638.txt

Attaching simple fix (the patch includes Tyler's unit test).

For the records, the original reported of the issue, Ignace Desimpel, provided  the following
analysis on the mailing list:
{quote}
To see what is wrong, think of having 3 ranges in the list, and both the first and second
range will not produce a valid currentKey. The first time in the loop we get the first range,
and then call seekToCurrentRangeStart(). That routine doesn’t do anything in that case,
so then the first key is read from the sstable. But this first key does not match the first
range, so we loop again. We get the second range and call seekToCurrentRangeStart() again.
Again this does not do anything, leaving all file pointers. So then a new currentKey is read
from the sstable BUT that should not be the case. We should, in that case, continue to test
with the ‘old’ currentKey.
{quote}
(which is a fair description of what triggers the problem) and proposed a fix. That fix was
modifying the logic of KeyScanningIterator.computeNext(), and while the fix itself is probably
fine, it complicate the logic a bit and I think it's simpler and cleaner to just fix seekToCurrentRangeStart
to really always seek to the first key greater than the current range start (that was clearly
the intent given the comment when indexPosition = -1 in that method, but the method wrongly
assumed we were always at the beginning of the sstable). So that's what the attached patch
does. I'll note that this does mean we might re-read the same key if the sstable have no keys
for at least 2 consecutive range, but this really doesn't matter in practice so we should
stick to cleaner code.

bq. Are there non-cleanup related consequences of this issue?

No. This can only ever happen if SSTableScanner is uses with at least 2 ranges and cleanup
is the only place we do that (I double-checked). I'll also note that this could affect non-vnode
cases though it's a lot less likely, at least provided you use a random partitionner, as you
need for a sstable to have no keys for a given local range, which is possible but much less
likely without vnodes. Lastly, this might skip at most 1 row per-local-range.

> SSTableScanner can Skip Rows with vnodes
> ----------------------------------------
>
>                 Key: CASSANDRA-6638
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6638
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Tyler Hobbs
>            Assignee: Tyler Hobbs
>            Priority: Blocker
>             Fix For: 2.0.5
>
>         Attachments: 6638-repro-test.txt, 6638.txt
>
>
> CASSANDRA-2524 added multiple range support to SSTableScanner, but it looks like there
is at least one case where keys can be skipped.  This can result in cleanup removing legitimate
keys.
> See the attached patch that adds a unit test to reproduce the case.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

Mime
View raw message