cassandra-commits mailing list archives

From "Ross M (JIRA)" <j...@apache.org>
Subject [jira] Reopened: (CASSANDRA-836) CommitLogSegment::seekAndWriteCommitLogHeader assumes header size doesn't change.
Date Sun, 28 Feb 2010 16:09:06 GMT

     [ https://issues.apache.org/jira/browse/CASSANDRA-836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ross M reopened CASSANDRA-836:
------------------------------


it's already a bug even if i "don't do that."

java's serialization of BitSet makes no promise about the size of the serialized form (and
it can't, since it's a variable-size object.) the code is relying on behavior that isn't promised,
and can therefore break with updates to java, a switch to another runtime, or even the addition
of more flags to the set. the code is currently lucky, probably because the bitset only holds
5 values; if it held 9, the relied-upon behavior might no longer hold.

it also prevents anyone from improving the serialization of BitSet, which is used in
a lot of places in the logs and data files (which is why i was looking at it in the first
place.)
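
To make the size argument concrete, here is a minimal sketch that measures how many bytes `ObjectOutputStream` produces for two `BitSet` instances. The class name `BitSetSizeDemo` and the helper `serializedSize` are hypothetical and are not part of Cassandra. The exact byte counts are an implementation detail of the runtime, which is the point: on the OpenJDK versions I am aware of the serialized form grows when a bit lands in a new 64-bit word, but nothing in the `BitSet` contract fixes that.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.util.BitSet;

public class BitSetSizeDemo {
    /** Returns the number of bytes java serialization produces for the given BitSet. */
    public static int serializedSize(BitSet bits) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(bits);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.size();
    }

    public static void main(String[] args) {
        // A set with a handful of low flags, similar in spirit to the header bitset.
        BitSet fewFlags = new BitSet();
        fewFlags.set(4);

        // The same kind of set after a much higher flag has been turned on.
        BitSet manyFlags = new BitSet();
        manyFlags.set(64);

        System.out.println("bit 4 set:  " + serializedSize(fewFlags) + " bytes");
        System.out.println("bit 64 set: " + serializedSize(manyFlags) + " bytes");
    }
}
```

If the two printed sizes differ on your runtime, a header rewritten in place with the larger set would no longer fit in the space reserved for the smaller one.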

> CommitLogSegment::seekAndWriteCommitLogHeader assumes header size doesn't change.
> ---------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-836
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-836
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>         Environment: n/a - all
>            Reporter: Ross M
>            Priority: Minor
>         Attachments: BitSetSerializer.java
>
>
> CommitLogSegment::seekAndWriteCommitLogHeader assumes header size doesn't grow. there
> are pieces of the header (BitSet) that are serialized with java serialization which makes
> no such promises.
> the following code:
>     /** writes header at the beginning of the file, then seeks back to current position */
>     void seekAndWriteCommitLogHeader(byte[] bytes) throws IOException
>     {
>         long currentPos = logWriter.getFilePointer();
>         logWriter.seek(0);
>         writeCommitLogHeader(bytes);
>         logWriter.seek(currentPos);
>     }
> works fine as long as the header size doesn't change, but if it grows the new header
> will overwrite the beginning of the data segment. the bit-set being written in the header
> happens to serialize to the same size, but there is no guarantee of this.
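
The failure mode described above can be reproduced outside Cassandra with a small sketch. Everything here is hypothetical: `HeaderOverwriteDemo`, the 4-byte "HDR1" header, and the "DATA" payload stand in for the real commit log layout, and `seekAndWriteHeader` only mirrors the shape of the quoted method.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class HeaderOverwriteDemo {
    /** Writes the header at offset 0, then seeks back, like the quoted method. */
    static void seekAndWriteHeader(RandomAccessFile file, byte[] header) throws IOException {
        long currentPos = file.getFilePointer();
        file.seek(0);
        file.write(header);
        file.seek(currentPos);
    }

    /** Writes "HDR1" + "DATA", rewrites the header in place, and returns the file contents. */
    public static String contentsAfterRewrite(String newHeader) {
        try {
            File tmp = File.createTempFile("commitlog-demo", ".bin");
            tmp.deleteOnExit();
            try (RandomAccessFile file = new RandomAccessFile(tmp, "rw")) {
                file.write("HDR1".getBytes(StandardCharsets.US_ASCII));
                file.write("DATA".getBytes(StandardCharsets.US_ASCII));
                seekAndWriteHeader(file, newHeader.getBytes(StandardCharsets.US_ASCII));

                byte[] all = new byte[(int) file.length()];
                file.seek(0);
                file.readFully(all);
                return new String(all, StandardCharsets.US_ASCII);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Same-size header: the data segment survives.
        System.out.println(contentsAfterRewrite("HDR2"));   // prints HDR2DATA
        // Header grew by two bytes: the first two data bytes are clobbered.
        System.out.println(contentsAfterRewrite("HDR-22")); // prints HDR-22TA
    }
}
```

Nothing fails at write time in the second case; the damage only shows up later, when the data segment is read back.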
> i found this when looking at optimizing the serialization of data to disk (thus improving
> write throughput/performance.) i removed the ObjectOutputStream serialization in
> BitSetSerializer and replaced it with a custom serialization that omits the generic java
> serialization/ObjectOutputStream stuff and writes only the "true" bits. the custom
> serialization worked fine, but broke other parts of the code when the header bitset had new
> bits turned on: the header grew, and data segment bytes were overwritten.
> the serialized version of a BitSet can grow in a similar manner, no promises of size/consistency
> are made, but with current use it luckily doesn't seem to happen.
> a good fix is unclear. without forcing the header to be a fixed/constant size in some
> manner this problem could pop up at any point. it's generally not safe to rewrite headers
> like this without custom code that ensures the size doesn't change. one fix would be to manually
> write all of the header data out (rather than relying on java serialization and serialization
> code in other parts of cassandra not to change.) another might be to pad the header
> so that the data inside can grow, but that seems fraught with (potential) problems. (i've
> played around with padding the header length, but that seems to cause other things to break,
> which i haven't been able to track down yet.)
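
As an illustration of the first option (writing the header data out manually at a constant size), here is a minimal sketch that packs the flags into a fixed 8-byte word. It is not the attached BitSetSerializer.java and not Cassandra's header format: `FixedSizeHeader`, `MAX_FLAGS`, and `HEADER_BYTES` are hypothetical names, and the sketch assumes the header never needs more than 64 flags and carries nothing else.

```java
import java.nio.ByteBuffer;
import java.util.BitSet;

public class FixedSizeHeader {
    /** Assumed upper bound on the number of flags; chosen so they fit in one long. */
    static final int MAX_FLAGS = 64;
    /** Every encoded header is exactly this many bytes, whatever bits are set. */
    static final int HEADER_BYTES = 8;

    /** Packs the set bits into a single long and returns its 8-byte encoding. */
    public static byte[] encode(BitSet flags) {
        if (flags.length() > MAX_FLAGS) {
            // Refuse to grow instead of silently spilling into the data segment.
            throw new IllegalArgumentException("flag index exceeds " + (MAX_FLAGS - 1));
        }
        long word = 0L;
        for (int i = flags.nextSetBit(0); i >= 0; i = flags.nextSetBit(i + 1)) {
            word |= 1L << i;
        }
        return ByteBuffer.allocate(HEADER_BYTES).putLong(word).array();
    }

    /** Rebuilds the BitSet from an 8-byte header produced by encode. */
    public static BitSet decode(byte[] header) {
        long word = ByteBuffer.wrap(header).getLong();
        BitSet flags = new BitSet(MAX_FLAGS);
        for (int i = 0; i < MAX_FLAGS; i++) {
            if ((word & (1L << i)) != 0) {
                flags.set(i);
            }
        }
        return flags;
    }
}
```

The trade-off is a hard cap on the number of flags: an oversized set fails loudly at encode time rather than corrupting data later, and rewriting the header in place stays safe because its length cannot change.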

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

