Date: Fri, 5 Feb 2016 16:19:40 +0000 (UTC)
From: "Paulo Motta (JIRA)"
To: commits@cassandra.apache.org
Reply-To: dev@cassandra.apache.org
Subject: [jira] [Commented] (CASSANDRA-10990) Support streaming of older version sstables in 3.0

[ https://issues.apache.org/jira/browse/CASSANDRA-10990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15134391#comment-15134391 ]

Paulo Motta commented on CASSANDRA-10990:
-----------------------------------------

Initial version is ready for review. Feedback on approach and correctness will be greatly appreciated.

*Patch Overview*

The patch adds support for streaming pre-3.0 sstables and a comprehensive test suite around it.
Adding support for non-static-compact tables was simple: the patch works around the lack of a serialization header by using a header with no stats, and deserializes the clustering prefix with the old format deserializer while serializing in the new format.

The main challenge was supporting streaming of compact static tables, because in the new format the static columns must be the first columns in a partition, while in the previous format they can be in any position of the partition. This means that each partition must be traversed to search for static columns and then rewound to search for the remaining non-static columns.

To solve this I added a new {{CachedInputStream}} that adds mark/reset functionality to a source stream and allows multiple {{CachedInputStream}} instances with different capacities to be cascaded cooperatively into an input stream cache hierarchy. For instance, the {{StreamDeserializer}} for pre-3.0 sstables uses a {{MemoryCachedInputStream}} that falls back to a {{FileCachedInputStream}} when it runs out of capacity in memory. The {{FileCachedInputStream}} may write a temporary buffer file to a data directory and removes it once the file is successfully streamed or if streaming fails.

This approach allows us to use the {{OldFormatDeserializer}} transparently, so the same code path used for reading pre-3.0 sstables is used to stream them. Note that the {{CachedInputStream}} is only used to stream pre-3.0 sstables, in order to provide rewind functionality, and will not affect existing behavior.

Please note that performance was not the objective here; the goal was mainly to support streaming of pre-3.0 sstables. Compact static tables may suffer a slight performance hit due to buffer copying and rewinding, but tables that are not compact static will not have their performance affected, since the stream cache will not be used.

*Tests*

* *Unit tests*: Extended {{LegacySSTableTest}} to test streaming of legacy compact sstables from the jb version onward.
** Added a comprehensive test suite for the different {{CachedInputStream}} variants in {{RewindableDataInputStreamPlusTest}}
* *SSTable loader dtests*: Extended {{sstable_generation_loading_test}} to sstableload 2.1 (ka) sstables with different compression settings.
* *Upgrade dtests*: Extended CASSANDRA-10563 upgrade dtests to bootstrap soon after upgrading, to test bootstrap streaming of legacy sstables.

*TODO*

* Clean up leftover buffer files on startup.
* Improve documentation of {{CachedInputStream}}, {{MemoryCachedInputStream}} and {{FileCachedInputStream}}
* Make max memory buffer size a system property and change it in dtests
* {{LegacySSTableTest}} passes when executed individually but fails when executed in a suite, probably because of leftovers from a previous test that need to be cleaned up.
* Add la sstables to {{sstable_generation_loading_test}}
* Fix {{upgrade_8099_test.py:TestBootstrapAfterUpgrade.upgrade_with_wide_partition_test}}

||3.0||dtest||
|[branch|https://github.com/apache/cassandra/compare/cassandra-3.0...pauloricardomg:10990]|[branch|https://github.com/riptano/cassandra-dtest/compare/master...pauloricardomg:10990]|
|[testall|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-10990-testall/lastCompletedBuild/testReport/]|
|[dtest|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-10990-dtest/lastCompletedBuild/testReport/]|

[~philipthompson] when you have time, could you please set up a custom dtest run with the dtest branch above? Thanks!

> Support streaming of older version sstables in 3.0
> --------------------------------------------------
>
> Key: CASSANDRA-10990
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10990
> Project: Cassandra
> Issue Type: Bug
> Components: Streaming and Messaging
> Reporter: Jeremy Hanna
> Assignee: Paulo Motta
>
> In 2.0 we introduced support for streaming older versioned sstables (CASSANDRA-5772).
> In 3.0, because of the rewrite of the storage layer, this is no longer supported. So currently, while 3.0 can read sstables in the 2.1/2.2 format, it cannot stream the older versioned sstables. We should do some work to keep this possible, consistent with what CASSANDRA-5772 provided.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
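
For readers who want a concrete picture of the mark/reset caching described in the patch overview, here is a minimal sketch. It is not the patch's {{CachedInputStream}}: the class name {{RewindableStreamSketch}}, its constructor parameter, and the temp file prefix are all hypothetical, and the two tiers (memory first, then a temporary file) are folded into one class for brevity instead of being cascaded. It also assumes {{mark()}} is never called in the middle of a replay.

```java
import java.io.*;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical sketch: mark/reset over any stream, memory cache first, temp file on overflow. */
final class RewindableStreamSketch extends InputStream {
    private final InputStream source;
    private final byte[] mem;        // first tier: in-memory cache
    private int memLen;              // bytes cached in memory
    private Path spillPath;          // second tier: temp file, created lazily
    private OutputStream spillOut;
    private InputStream spillIn;     // open only while replaying spilled bytes
    private long spillLen;           // bytes cached in the temp file
    private long pos;                // read position within the cache
    private boolean marked;

    RewindableStreamSketch(InputStream source, int memCapacity) {
        this.source = source;
        this.mem = new byte[memCapacity];
    }

    @Override public boolean markSupported() { return true; }

    /** Starts caching from the current position; readLimit is ignored in this sketch. */
    @Override public synchronized void mark(int readLimit) {
        if (pos < memLen + spillLen)
            throw new IllegalStateException("mark() during replay is not handled in this sketch");
        try { dropCache(); } catch (IOException e) { throw new UncheckedIOException(e); }
        marked = true;
    }

    /** Rewinds to the mark; later reads replay cached bytes before touching the source again. */
    @Override public synchronized void reset() throws IOException {
        if (!marked) throw new IOException("no mark set");
        if (spillIn != null) { spillIn.close(); spillIn = null; }
        pos = 0;
    }

    @Override public int read() throws IOException {
        if (pos < memLen + spillLen) {                 // replay phase
            int b;
            if (pos < memLen) {
                b = mem[(int) pos] & 0xFF;
            } else {
                if (spillIn == null) {                 // first spilled byte since reset()
                    spillOut.flush();
                    spillIn = Files.newInputStream(spillPath);
                }
                b = spillIn.read();
            }
            pos++;
            return b;
        }
        int b = source.read();
        if (b >= 0 && marked) { cache(b); pos++; }     // keep caching while a mark is active
        return b;
    }

    private void cache(int b) throws IOException {
        if (memLen < mem.length) { mem[memLen++] = (byte) b; return; }
        if (spillOut == null) {                        // memory tier is full: fall back to a file
            spillPath = Files.createTempFile("rewind-sketch", ".buf");
            spillOut = new BufferedOutputStream(Files.newOutputStream(spillPath));
        }
        spillOut.write(b);
        spillLen++;
    }

    private void dropCache() throws IOException {
        if (spillIn != null) { spillIn.close(); spillIn = null; }
        if (spillOut != null) { spillOut.close(); spillOut = null; }
        if (spillPath != null) { Files.deleteIfExists(spillPath); spillPath = null; }
        memLen = 0; spillLen = 0; pos = 0;
    }

    /** Removes the temp file whether or not the read succeeded. */
    @Override public void close() throws IOException {
        try { dropCache(); } finally { source.close(); }
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "abcdefgh".getBytes(java.nio.charset.StandardCharsets.US_ASCII);
        try (RewindableStreamSketch in = new RewindableStreamSketch(new ByteArrayInputStream(data), 3)) {
            in.mark(Integer.MAX_VALUE);
            for (int i = 0; i < 6; i++) in.read();     // scan ahead, overflowing the 3-byte memory tier
            in.reset();                                // rewind to the start of the "partition"
            System.out.println((char) in.read());      // replayed from the cache
        }
    }
}
```

In the streaming case, the scan-ahead loop in {{main}} stands in for the search for static columns, and {{reset()}} is the rewind before the remaining non-static columns are read. Deleting the temp file in {{close()}} mirrors the cleanup on success or failure mentioned in the overview; it does not cover files left behind by a crash, which is the first TODO item.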