cassandra-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Robert Stupp (JIRA)" <>
Subject [jira] [Updated] (CASSANDRA-13939) Mishandling of cells for removed/dropped columns when reading legacy files
Date Fri, 06 Oct 2017 12:35:00 GMT


Robert Stupp updated CASSANDRA-13939:
    Status: Ready to Commit  (was: Patch Available)

> Mishandling of cells for removed/dropped columns when reading legacy files
> --------------------------------------------------------------------------
>                 Key: CASSANDRA-13939
>                 URL:
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Local Write-Read Paths
>            Reporter: Sylvain Lebresne
>            Assignee: Sylvain Lebresne
>             Fix For: 3.0.x, 3.11.x
> The tl;dr is that there is a bug in reading legacy files that can manifests itself with
a trace looking like this:
> {noformat}
> Exception (java.lang.IllegalStateException) encountered during startup: One row required,
2 found
> java.lang.IllegalStateException: One row required, 2 found
>     at org.apache.cassandra.cql3.UntypedResultSet$
>     at org.apache.cassandra.schema.LegacySchemaMigrator.readTableTimestamp(
>     at org.apache.cassandra.schema.LegacySchemaMigrator.readTable(
>     at org.apache.cassandra.schema.LegacySchemaMigrator.lambda$readTables$7(
>     at org.apache.cassandra.schema.LegacySchemaMigrator$$Lambda$126/591203139.accept(Unknown
>     at java.util.ArrayList.forEach(
>     at org.apache.cassandra.schema.LegacySchemaMigrator.readTables(
>     at org.apache.cassandra.schema.LegacySchemaMigrator.readKeyspace(
>     at org.apache.cassandra.schema.LegacySchemaMigrator.lambda$readSchema$4(
>     at org.apache.cassandra.schema.LegacySchemaMigrator$$Lambda$123/1612073393.accept(Unknown
>     at java.util.ArrayList.forEach(
>     at org.apache.cassandra.schema.LegacySchemaMigrator.readSchema(
> {noformat}
> The reason this can happen has to do with the handling of legacy files. Legacy files
are cell based while the 3.0 storage engine is primarily row based, so we group those cells
into rows early in the deserialization process (in {{UnfilteredDeserializer.OldFormatDeserializer}}),
but in doing so, we can only consider a row finished when we've either reach the end of the
partition/file, or when we've read a cell that doesn't belong to that row.  That second case
means that when the deserializer returns a given row, the underlying file pointer may actually
not positioned at the end of that row, but rather it may be past the first cell of the next
row (which the deserializer remembers for future use). Long story short, when we try to detect
if we're logically past our current index block in {{AbstractIterator.IndexState#isPastCurrentBlock(}}),
we can't simply rely on the file pointer, which again may be a bit more advanced that we logically
are, and that's the reason for the "correction" in that method. That correction is really
just the amount of bytes remembered but not yet used in the deserializer.
> That "correction" is sometimes wrong however and that's due to the fact that in {{LegacyLayout#readLegacyAtom}},
if we get a cell for an dropped or removed cell, we ignore that cell (which, in itself, is
fine). Problem is that this skipping is done within the {{LegacyLayout#readLegacyAtom}} method
but without {{UnfilteredDeserializer.OldFormatDeserializer}} knowing about it. As such, the
size of the skipped cell ends up being accounted in the "correction" bytes for the next cell
we read. Lo and behold, if such cell for a removed/dropped column is both the last cell of
a CQL row and just before an index boundary (pretty unlikely in general btw, but definitively
possible), then the deserializer will count its size with the first cell of the next row,
which happens to also be the first cell of the next index block.  And when the code then tries
to figure out if it crossed an index boundary, it will over-correct. That is, the {{indexState.updateBlock()}}
call at the start of {{SSTableIterator.ForwardIndexedReader#computeNext}} will not work correctly.
 This can then make the code return a row that is after the requested slice end (and should
thus not be returned) because it doesn't compare that row to said requested end due to thinking
it's not on the last index block to read, even though it genuinely is.
> Anyway, the whole explanation is a tad complex, but the fix isn't: we need to move the
skipping of cells for removed/dropped column a level up so the deserializer knows about it
and don't silently count their size in the next atom size.

This message was sent by Atlassian JIRA

To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message