cassandra-commits mailing list archives

From "Paulo Motta (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (CASSANDRA-10990) Support streaming of older version sstables in 3.0
Date Wed, 10 Feb 2016 21:34:18 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-10990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15141660#comment-15141660
] 

Paulo Motta edited comment on CASSANDRA-10990 at 2/10/16 9:33 PM:
------------------------------------------------------------------

Thanks for the comments [~yukim].

bq. What's the difference between MemoryCachedInputStream and BufferedInputStream? 

The main difference between {{MemoryCachedInputStream}} and {{BufferedInputStream}} is that
the former has the ability to mark/reset a parent/source stream when it runs out of capacity
without losing its mark state, allowing us to cascade a {{FileCachedInputStream}} with a {{MemoryCachedInputStream}}
to provide a multi-tiered cached input stream.
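
As a rough illustration of the cascading idea (this is not the code from the patch), the sketch below shows a memory tier that caches bytes after {{mark}} and, once its capacity is exhausted, marks its source stream so the source takes over caching. The class name {{MemoryTier}}, the helper {{replay}}, and the use of a {{BufferedInputStream}} as a stand-in for the file-backed tier are all assumptions made for the example.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

class TieredMarkSketch {
    /** Hypothetical memory tier: caches up to `capacity` bytes after mark(), then marks its source. */
    static class MemoryTier extends FilterInputStream {
        private final byte[] cache;
        private int cached;          // number of valid bytes in the cache
        private int replayPos;       // next cached byte to replay (== cached when not replaying)
        private boolean marked;
        private boolean overflowed;  // true once the source was marked to take over caching

        MemoryTier(InputStream source, int capacity) {
            super(source);
            if (!source.markSupported())
                throw new IllegalArgumentException("source must support mark/reset");
            this.cache = new byte[capacity];
        }

        @Override public boolean markSupported() { return true; }

        @Override public void mark(int readLimit) {
            // Simplification: re-marking in the middle of a replay is not handled here.
            if (replayPos < cached)
                throw new IllegalStateException("mark during replay is not supported in this sketch");
            marked = true; cached = 0; replayPos = 0; overflowed = false;
        }

        @Override public void reset() throws IOException {
            if (!marked) throw new IOException("stream was not marked");
            if (overflowed) in.reset();  // rewind the lower tier to where the memory cache ended
            replayPos = 0;               // the memory cache is replayed first
        }

        @Override public int read() throws IOException {
            if (replayPos < cached) return cache[replayPos++] & 0xFF;
            if (marked && !overflowed && cached == cache.length) {
                in.mark(Integer.MAX_VALUE);  // out of capacity: hand caching over to the source
                overflowed = true;
            }
            int b = in.read();
            if (b >= 0 && marked && !overflowed) {
                cache[cached++] = (byte) b;
                replayPos = cached;
            }
            return b;
        }

        // FilterInputStream forwards these straight to the source, so route them through read().
        @Override public int read(byte[] buf, int off, int len) throws IOException {
            if (len == 0) return 0;
            int n = 0;
            for (; n < len; n++) {
                int c = read();
                if (c < 0) break;
                buf[off + n] = (byte) c;
            }
            return n == 0 ? -1 : n;
        }

        @Override public long skip(long n) throws IOException {
            long skipped = 0;
            while (skipped < n && read() >= 0) skipped++;
            return skipped;
        }
    }

    /** Marks, consumes `consumeBeforeReset` bytes, resets, then returns everything read after the reset. */
    static String replay(String text, int capacity, int consumeBeforeReset) throws IOException {
        byte[] data = text.getBytes(StandardCharsets.US_ASCII);
        // Stand-in for a file-backed tier; any mark-supporting stream would do for the sketch.
        InputStream lowerTier = new BufferedInputStream(new ByteArrayInputStream(data));
        MemoryTier in = new MemoryTier(lowerTier, capacity);
        in.mark(Integer.MAX_VALUE);
        in.skip(consumeBeforeReset);
        in.reset();
        StringBuilder sb = new StringBuilder();
        for (int b; (b = in.read()) >= 0; ) sb.append((char) b);
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        System.out.println(replay("abcdefghij", 4, 7));
    }
}
```

With a capacity of 4 and 7 bytes consumed before the reset, the first 4 bytes are replayed from memory and the remaining bytes from the lower tier, so the mark survives the overflow.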

Another, less relevant difference is that {{BufferedInputStream}} always does buffered reads
of up to the capacity of its buffer, while {{MemoryCachedInputStream}} only buffers reads while
it is marked, and only the bytes actually consumed via its {{read}}/{{skip}} methods.
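
The eager behaviour of {{BufferedInputStream}} can be observed with a small experiment. The sketch below wraps the source in a counting stream ({{CountingInputStream}} and {{bytesPulledForOneRead}} are names invented for this example) and checks how many bytes are pulled from the source when the caller reads a single byte.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

class EagerReadDemo {
    /** Counts how many bytes have been pulled from the wrapped stream. */
    static class CountingInputStream extends FilterInputStream {
        long count;

        CountingInputStream(InputStream in) { super(in); }

        @Override public int read() throws IOException {
            int b = super.read();
            if (b >= 0) count++;
            return b;
        }

        @Override public int read(byte[] buf, int off, int len) throws IOException {
            int n = super.read(buf, off, len);
            if (n > 0) count += n;
            return n;
        }
    }

    /** Reads one byte through a BufferedInputStream and reports how many bytes left the source. */
    static long bytesPulledForOneRead(int sourceSize, int bufferSize) throws IOException {
        CountingInputStream source =
            new CountingInputStream(new ByteArrayInputStream(new byte[sourceSize]));
        InputStream buffered = new BufferedInputStream(source, bufferSize);
        buffered.read();  // the caller consumes a single byte
        return source.count;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(bytesPulledForOneRead(100, 64));
    }
}
```

With a 100-byte source and a 64-byte buffer, one {{read()}} pulls 64 bytes from the source, regardless of whether the stream was marked.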

bq. Why can't we use the latter? 

I tried extending {{BufferedInputStream}} to add the ability to mark a parent stream when
it runs out of capacity, but that involved reimplementing and/or changing most of its methods
since {{BufferedInputStream}} always reads from its internal buffer and re-fills it when necessary
and most of its methods rely on that logic. Reading from a parent stream when the buffer is
full would break this assumption, which would require a significant refactor of most of its
methods. I'm open to suggestions if you see a way of easily adapting {{BufferedInputStream}}
to fulfil that requirement.

bq. {{MemoryCachedInputStream}} uses the default {{ByteArrayOutputStream}} constructor, which has
an initial size of only 32 bytes. Isn't this too small to use for a cache?

Probably. I will try to find a better value for this. Do you happen to remember whether there is a
way to retrieve the average partition size for a given table? I remember seeing something
along those lines, but I'm not sure where it is.
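
For reference, the initial capacity can be set through the {{ByteArrayOutputStream(int size)}} constructor. The sketch below uses a small subclass ({{Probe}}, a name made up for this example) to expose the protected {{buf}} field and compare the default capacity with an explicitly sized one. The value 4096 is an arbitrary placeholder, not a recommendation.

```java
import java.io.ByteArrayOutputStream;

class InitialCapacityDemo {
    /** Exposes the length of the protected backing array for inspection. */
    static class Probe extends ByteArrayOutputStream {
        Probe() { super(); }
        Probe(int size) { super(size); }
        int capacity() { return buf.length; }
    }

    /** Capacity of the backing array when the no-argument constructor is used. */
    static int defaultCapacity() {
        return new Probe().capacity();
    }

    /** Capacity of the backing array after writing `bytes` bytes into a stream sized `initial`. */
    static int capacityAfterWriting(int initial, int bytes) {
        Probe p = new Probe(initial);
        p.write(new byte[bytes], 0, bytes);
        return p.capacity();
    }

    public static void main(String[] args) {
        System.out.println(defaultCapacity());
        System.out.println(capacityAfterWriting(4096, 100));  // 4096 is a placeholder size
    }
}
```

A buffer sized up front avoids repeated array growth while the cache fills. How the array grows once the initial capacity is exceeded is an implementation detail of the JDK.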

I will start work on the remaining TODO points and review comments. Please let me know if
you have something to add.


was (Author: pauloricardomg):
Thanks for the comments.

bq. What's the difference between MemoryCachedInputStream and BufferedInputStream? Why can't
we use the latter? 

The main difference between {{MemoryCachedInputStream}} and {{BufferedInputStream}} is that
the former has the ability to mark/reset a parent/source stream when it runs out of capacity
without losing its mark state, allowing us to cascade a {{FileCachedInputStream}} with a {{MemoryCachedInputStream}}
to provide a multi-tiered cached input stream. 


> Support streaming of older version sstables in 3.0
> --------------------------------------------------
>
>                 Key: CASSANDRA-10990
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10990
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Streaming and Messaging
>            Reporter: Jeremy Hanna
>            Assignee: Paulo Motta
>
> In 2.0 we introduced support for streaming older versioned sstables (CASSANDRA-5772).
> In 3.0, because of the rewrite of the storage layer, this is no longer supported. So
> currently, while 3.0 can read sstables in the 2.1/2.2 format, it cannot stream the older versioned
> sstables. We should do some work to make this possible again, to be consistent with what CASSANDRA-5772
> provided.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
