cassandra-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Jonathan Ellis (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-4297) Use java NIO as much as possible when streaming compressed SSTables
Date Wed, 06 Jun 2012 21:03:23 GMT


Jonathan Ellis commented on CASSANDRA-4297:


- Would prefer to have CDIS just implement IS, and let callers wrap in DIS when desired, similar
to how we use SnappyInputStream in IncomingTcpConnection, or LZFInputStream in ISR
- Why the changes to OutboundTcpConnection?
- Re MS changes: when would header.file be null?
- Chunk[] sort can use Guava
- I suggest adding a comment to explain why sort is necessary (b/c ranges are from replication
strategy, so may not be sorted?)
- Instead of using Set + copy into array, why not use an ArrayList + trimToSize()
- is the FST comment {{// TODO just use a raw RandomAccessFile since we're managing our own
buffer here}} obsolete?
- is the CompressedRandomAccessReader path used at all in FST anymore?
- Nit: avoid double negation in if statements with else clauses, e.g. instead of
.           if (remoteFile.compressionInfo != null)
                dis = new CompressedDataInputStream(socket.getInputStream(), remoteFile.compressionInfo);
                dis = new DataInputStream(new LZFInputStream(socket.getInputStream()));
[           if (remoteFile.compressionInfo == null)
                dis = new DataInputStream(new LZFInputStream(socket.getInputStream()));
                dis = new CompressedDataInputStream(socket.getInputStream(), remoteFile.compressionInfo);
- Nit: suggest moving serialization code for Chunk and CompressionParameters into ChunkSerializer
and ChunkParametersSerializer classes, respectively, just to make the code discoverable for
re-use later

At a higher level,
- Should we make nio transfer the default for uncompressed sstables as well, and add an option
to enable compression?  Alternatively, now that compression is the default for new sstables,
I'd be okay with removing LZF stream compression entirely
- Does this over-transfer data on chunk boundaries?  Put another way, do we stream data that
doesn't actually belong on the target node?  (I'm okay with this, just want to be clear about
what's happening.)
> Use java NIO as much as possible when streaming compressed SSTables
> -------------------------------------------------------------------
>                 Key: CASSANDRA-4297
>                 URL:
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Yuki Morishita
>            Assignee: Yuki Morishita
>            Priority: Minor
>              Labels: streaming
>             Fix For: 1.2
>         Attachments: 4297.txt
> Back in 0.8, streaming uses java NIO (FileChannel#transferTo/transferFrom) to perform
zero copy file transfer between nodes. Since 1.0, in order to add new features like sstable
compression and internode encryption we had to switch to java IO Input/OutputStreams. What
we currently do to transfer compressed SSTable is, in source node, 1) decompress chunk in
SSTable, 2) compress using LZF for network, and in destination node, 3) decompress using LZF
as reading from socket, 4) compress for SSTable on disk.
> Now, 1.1 comes out with SSTable compression turned on by default. It is reasonable to
transfer compressed file as is using NIO instead of decompress/compress in source node.

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators:!default.jspa
For more information on JIRA, see:


View raw message