cassandra-commits mailing list archives

From "Paulo Motta (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-10990) Support streaming of older version sstables in 3.0
Date Fri, 05 Feb 2016 16:19:40 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-10990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15134391#comment-15134391 ]

Paulo Motta commented on CASSANDRA-10990:
-----------------------------------------

Initial version is ready for review. Feedback on approach and correctness will be greatly
appreciated.

*Patch Overview*

The patch adds support for streaming pre-3.0 sstables and a comprehensive test suite around
it. Adding support for non-static-compact tables was simple: the patch works around the lack of
a serialization header by using a header with no stats, and deserializes the clustering prefix
with the old-format deserializer while serializing in the new format.

The main challenge was to support streaming of compact static tables, because in the
new format the static columns must be the first columns in a partition, while in the previous
format they can be at any position in the partition. This means that each partition must be
traversed to search for static columns and then rewound to search for the remaining non-static
columns.
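
As a rough illustration of the two-pass traversal described above, here is a minimal sketch. It uses a toy partition encoding (a cell count followed by flag/name pairs) that is an assumption made for this example and is not Cassandra's sstable format. {{StaticFirstReorder}} and its methods are hypothetical names. The sketch marks the start of the partition, collects the static cells, resets, and then collects the remaining cells.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy model (not the sstable format): a "partition" is an int
// cell count followed by (isStatic flag, UTF name) pairs in arbitrary order.
final class StaticFirstReorder {

    // Encodes the toy partition so the example has something to read back.
    static byte[] encode(String[] names, boolean[] isStatic) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        out.writeInt(names.length);
        for (int i = 0; i < names.length; i++) {
            out.writeBoolean(isStatic[i]);
            out.writeUTF(names[i]);
        }
        out.flush();
        return bytes.toByteArray();
    }

    // Returns the cell names with all static cells first, using mark/reset
    // to traverse the same partition twice.
    static List<String> staticsFirst(byte[] partition) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(partition));
        List<String> ordered = new ArrayList<>();
        in.mark(partition.length);   // remember the start of the partition
        collect(in, true, ordered);  // first pass: static cells only
        in.reset();                  // rewind to the mark
        collect(in, false, ordered); // second pass: remaining cells
        return ordered;
    }

    private static void collect(DataInputStream in, boolean wantStatic, List<String> into)
            throws IOException {
        int cells = in.readInt();
        for (int i = 0; i < cells; i++) {
            boolean isStatic = in.readBoolean();
            String name = in.readUTF();
            if (isStatic == wantStatic) {
                into.add(name);
            }
        }
    }
}
```

The sketch relies on a source that already supports mark/reset ({{ByteArrayInputStream}}). A network stream does not, which is why some form of caching layer is needed when the data arrives over the wire.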

In order to solve this I added a new {{CachedInputStream}} that adds mark/reset functionality
to a source stream and allows multiple {{CachedInputStream}} instances with different
capacities to be cascaded cooperatively, creating an input stream cache hierarchy. For instance,
I used this feature in {{StreamDeserializer}} for pre-3.0 sstables, which uses a {{MemoryCachedInputStream}}
that falls back to a {{FileCachedInputStream}} when it runs out of capacity in memory. The
{{FileCachedInputStream}} may write a temporary buffer file to a data directory and removes
it once the file has been streamed successfully or the stream fails.
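
To make the memory-then-file fallback more concrete, here is a minimal, simplified sketch. It is not the patch's {{CachedInputStream}} implementation: the class name {{SpillingRewindableInputStream}}, its constructor parameters and the temporary file prefix are all assumptions for illustration, and it folds both cache tiers into one class instead of cascading separate streams. Bytes read after {{mark()}} are cached in a fixed-size memory buffer first and spill to a temporary file once that buffer is full; {{reset()}} replays the cached bytes before reading from the source again.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.RandomAccessFile;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch, not the patch's classes: caches bytes read after
// mark() in memory first, then in a temporary file once memory is full.
final class SpillingRewindableInputStream extends InputStream {
    private final InputStream source;
    private final byte[] memory;    // first cache tier
    private final Path spillDir;    // directory for the second tier (assumed)
    private Path spillPath;
    private RandomAccessFile spill; // created lazily on first overflow
    private boolean marked;
    private long cached;            // bytes recorded since mark()
    private long pos;               // read position within the recorded bytes

    SpillingRewindableInputStream(InputStream source, int memoryCapacity, Path spillDir) {
        this.source = source;
        this.memory = new byte[memoryCapacity];
        this.spillDir = spillDir;
    }

    boolean hasSpilled() { return spill != null; }

    @Override public boolean markSupported() { return true; }

    @Override public void mark(int readLimit) {
        if (pos < cached) {
            throw new IllegalStateException("mark() during replay is not handled in this sketch");
        }
        marked = true;
        cached = 0;
        pos = 0;
    }

    @Override public void reset() throws IOException {
        if (!marked) throw new IOException("mark() was not called");
        pos = 0; // subsequent reads replay the cache from its start
    }

    @Override public int read() throws IOException {
        if (pos < cached) return cachedByteAt(pos++); // replaying after reset()
        int b = source.read();
        if (b >= 0 && marked) {
            append(b);
            pos++;
        }
        return b;
    }

    private void append(int b) throws IOException {
        if (cached < memory.length) {
            memory[(int) cached] = (byte) b;
        } else {
            if (spill == null) {
                spillPath = Files.createTempFile(spillDir, "rewind-buffer", ".tmp");
                spill = new RandomAccessFile(spillPath.toFile(), "rw");
            }
            spill.seek(cached - memory.length);
            spill.write(b);
        }
        cached++;
    }

    private int cachedByteAt(long index) throws IOException {
        if (index < memory.length) return memory[(int) index] & 0xff;
        spill.seek(index - memory.length);
        return spill.read();
    }

    // Closes the source and removes the temporary buffer file, if any.
    @Override public void close() throws IOException {
        try {
            source.close();
        } finally {
            if (spill != null) {
                spill.close();
                Files.deleteIfExists(spillPath);
            }
        }
    }
}
```

The sketch reads and writes one byte at a time for clarity. A real implementation would presumably buffer bulk reads, and would need its own policy for when the mark can be dropped so the cache stops growing.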

This approach allows us to use the {{OldFormatDeserializer}} transparently, so the same code
path used for reading pre-3.0 sstables is also used to stream them. Note that the {{CachedInputStream}}
is only used to stream pre-3.0 sstables in order to provide rewind functionality and will
not affect existing behavior.

Please note that performance was not the objective here; the goal was mainly to support streaming
of pre-3.0 sstables. Compact static tables may suffer a slight performance hit due to buffer
copying and rewinding, but tables that are not compact static will not have their performance
affected, since the stream cache will not be used for them.

*Tests*

* *Unit tests*: Extended {{LegacySSTableTest}} to test streaming of legacy compact sstables
from the jb version onwards.
** Added a comprehensive test suite for the different {{CachedInputStream}} variants in {{RewindableDataInputStreamPlusTest}}
* *SSTable loader dtests*: Extended {{sstable_generation_loading_test}} to load 2.1
(ka) sstables via sstableloader with different compression settings.
* *Upgrade dtests*: Extended CASSANDRA-10563 upgrade dtests to bootstrap soon after upgrading,
to test bootstrap streaming of legacy sstables.

*TODO*

* Cleanup of leftover buffer files on startup.
* Improve documentation of {{CachedInputStream}}, {{MemoryCachedInputStream}} and {{FileCachedInputStream}}
* Make max memory buffer size a system property and change it on dtests
* {{LegacySSTableTest}} passes when executed individually but fails when executed as part of a suite,
probably because of leftovers from a previous test that need to be cleaned up.
* Add la sstables to {{sstable_generation_loading_test}}
* Fix {{upgrade_8099_test.py:TestBootstrapAfterUpgrade.upgrade_with_wide_partition_test}}
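
For the TODO item about making the max memory buffer size a system property, one possible shape is sketched below. The property name, the default value and the class name {{StreamBufferConfig}} are all assumptions for illustration; the patch does not define them.

```java
// Hypothetical sketch: the property name and default below are assumptions,
// not values defined by the patch.
final class StreamBufferConfig {
    static final String MAX_MEMORY_BUFFER_PROPERTY = "example.streaming.max_memory_buffer_bytes";
    static final int DEFAULT_MAX_MEMORY_BUFFER_BYTES = 128 * 1024;

    // Reads the limit from a JVM system property, falling back to the default
    // when the property is unset or not a valid integer.
    static int maxMemoryBufferBytes() {
        int configured = Integer.getInteger(MAX_MEMORY_BUFFER_PROPERTY, DEFAULT_MAX_MEMORY_BUFFER_BYTES);
        if (configured <= 0) {
            throw new IllegalArgumentException(MAX_MEMORY_BUFFER_PROPERTY + " must be positive, got " + configured);
        }
        return configured;
    }
}
```

With something like this, a dtest could start the node with a small {{-D}} value for the property so that the file fallback is exercised without needing very large partitions.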

||3.0||dtest||
|[branch|https://github.com/apache/cassandra/compare/cassandra-3.0...pauloricardomg:10990]|[branch|https://github.com/riptano/cassandra-dtest/compare/master...pauloricardomg:10990]|
|[testall|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-10990-testall/lastCompletedBuild/testReport/]|
|[dtest|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-10990-dtest/lastCompletedBuild/testReport/]|

[~philipthompson] when you have time, could you please set up a custom dtest run with the dtest
branch above? Thanks!

> Support streaming of older version sstables in 3.0
> --------------------------------------------------
>
>                 Key: CASSANDRA-10990
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10990
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Streaming and Messaging
>            Reporter: Jeremy Hanna
>            Assignee: Paulo Motta
>
> In 2.0 we introduced support for streaming older versioned sstables (CASSANDRA-5772).
> In 3.0, because of the rewrite of the storage layer, this became no longer supported. So
> currently, while 3.0 can read sstables in the 2.1/2.2 format, it cannot stream the older versioned
> sstables. We should do some work to make this still possible to be consistent with what CASSANDRA-5772
> provided.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
