cassandra-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Yuki Morishita (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-4297) Use java NIO as much as possible when streaming compressed SSTables
Date Tue, 12 Jun 2012 19:09:44 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-4297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13293852#comment-13293852
] 

Yuki Morishita commented on CASSANDRA-4297:
-------------------------------------------


V2 attached based on the review + some test related change.

bq. Would prefer to have CDIS just implement IS, and let callers wrap in DIS when desired,
similar to how we use SnappyInputStream in IncomingTcpConnection, or LZFInputStream in ISR

CDIS now implements InputStream only and renamed to CompressedInputStream.

bq. Why the changes to OutboundTcpConnection?

The changes are made in order to obtain nio.SocketChannel, socket has to be created using
SocketChannel.open.

bq. Re MS changes: when would header.file be null?

When a node requests range but target node doesn't have corresponding data. I reverted the
change in MS to send at least send streaming header when header.file is null. It seems redundant
but for now, it's necessary to terminate stream session of requesting node.

bq. Chunk[] sort can use Guava Longs.compare

done.

bq. I suggest adding a comment to explain why sort is necessary (b/c ranges are from replication
strategy, so may not be sorted?) Instead of using Set + copy into array, why not use an ArrayList
+ trimToSize()

The reason why I use Set here is to eliminate duplicate chunks. Given two different file section
can be mapped to just one chunk.

bq. is the FST comment // TODO just use a raw RandomAccessFile since we're managing our own
buffer here obsolete? is the CompressedRandomAccessReader path used at all in FST anymore?

I removed CRAR from FST in v2. Even if nio is not available (in case of inter-node SSL), streaming
uses CompressedFileStreamTask with socket's InputStream to transfer file directly.

{quote}
Nit: avoid double negation in if statements with else clauses
Nit: suggest moving serialization code for Chunk and CompressionParameters into ChunkSerializer
and ChunkParametersSerializer classes, respectively, just to make the code discoverable for
re-use later
{quote}

done.

bq. Should we make nio transfer the default for uncompressed sstables as well, and add an
option to enable compression? Alternatively, now that compression is the default for new sstables,
I'd be okay with removing LZF stream compression entirely

I don't do any benchmark, but I think always using LZF compression is fine when transferring
uncompressed data.

bq. Does this over-transfer data on chunk boundaries? Put another way, do we stream data that
doesn't actually belong on the target node? (I'm okay with this, just want to be clear about
what's happening.)

Source node can send unrelated range of data inside chunk, but receiving node ignores (or
skips) that part when reading from socket, so, the answer is no.

                
> Use java NIO as much as possible when streaming compressed SSTables
> -------------------------------------------------------------------
>
>                 Key: CASSANDRA-4297
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4297
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Yuki Morishita
>            Assignee: Yuki Morishita
>            Priority: Minor
>              Labels: streaming
>             Fix For: 1.2
>
>         Attachments: 4297-v2.txt, 4297.txt
>
>
> Back in 0.8, streaming uses java NIO (FileChannel#transferTo/transferFrom) to perform
zero copy file transfer between nodes. Since 1.0, in order to add new features like sstable
compression and internode encryption we had to switch to java IO Input/OutputStreams. What
we currently do to transfer compressed SSTable is, in source node, 1) decompress chunk in
SSTable, 2) compress using LZF for network, and in destination node, 3) decompress using LZF
as reading from socket, 4) compress for SSTable on disk.
> Now, 1.1 comes out with SSTable compression turned on by default. It is reasonable to
transfer compressed file as is using NIO instead of decompress/compress in source node.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

Mime
View raw message