cassandra-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Branimir Lambov (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-8464) Support direct buffer decompression for reads
Date Wed, 17 Dec 2014 15:46:14 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-8464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14250014#comment-14250014
] 

Branimir Lambov commented on CASSANDRA-8464:
--------------------------------------------

Very impressive performance gain. My comments:
- The patch is in effect introducing a new {{CompressedRandomAccessReader}} implementation
that uses memory mapping, in addition to the existing one that uses on-heap buffers. Rather
than separating the two using multiple if/else checks inside the code, wouldn't this be structurally
clearer if we created separate subclasses for the two? The latter is also more efficient as
it replaces ifs with virtual calls which the JIT is better equipped to handle; the JRE will
never see the irrelevant code, and space will not be wasted for irrelevant fields.
- Will the CRAR be used with >=2G files very often? If the <2G case is predominant it
would make sense to further separate the code in a <2G-optimized class with single mmap
buffer in addition to the current {{TreeMap}}-based catch-all.
- I'd prefer not to refer to {{sub.nio.ch.DirectBuffer}} throughout code; its references should
be confined to the {{FileUtils}} class, for example by changing {{FileUtils.clean()}} to take
a {{ByteBuffer}} argument and do the conversion or skip cleaning for non-direct ones. Maybe
we should also move the {{isCleanerAvailable()}} check into it (it should be easily optimized
away by the JIT) and make cleaning a single call rather than the isDirect, isCleanerAvailable,
cast sequence it is now.
- Nit on {{FileUtils.clean()}} uses: since {{isCleanerAvailable()}} does not change value,
it should be the first thing tested in all ifs with more than one condition. This makes the
JIT's job easier.
- {{CRAR.allocateBuffer}}: for Snappy {{uncompress(ByteBuffer)}} both buffers need to be direct.
You could perhaps revive parts of CASSANDRA-6762 to deal with this (and support {{DeflateCompressor}}).
- CRAR static init: {{FBUtilities.supportsDirectChecksum()}} could return true initially,
but revert to false at the first invoke attempt. The choice for useMmap is made before that
happens. Perhaps we could do a test invoke in the static init instead of letting the value
change later?
- {{FBUtilities.directCheckSum}}: Does the checksum-on-ByteBuffer trick work with non-Oracle
JVMs? Have we tested what happens on one?
- {{FBUtilities.directCheckSum}}: In the fallback case, are we sure we never change buffers'
byte order? ({{getInt()}} might return swapped bytes if not) A chunked loop as done in CASSANDRA-6762
is safer and could be faster. (If you worry about the allocation, we could probably provide
a thread-local scratch array in {{FBUtilities}} or similar.)
- {{LZ4Compressor::uncompress}}: No need for {{hasArray()}} checks, decompressor will do this
if it helps (it doesn't normally). You shouldn't need to duplicate any of the buffers or copy
into byte array, just use buffer's {{get(position + ...)}}.
- The new compressor and checksum functionality should be unit-tested, as well as all fallbacks;
CASSANDRA-6762 has some tests you could reuse.


> Support direct buffer decompression for reads
> ---------------------------------------------
>
>                 Key: CASSANDRA-8464
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8464
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: T Jake Luciani
>            Assignee: T Jake Luciani
>              Labels: performance
>             Fix For: 3.0
>
>
> Currently when we read a compressed sstable we copy the data on heap then send it to
be de-compressed to another on heap buffer (albeit pooled).
> But now both snappy and lz4 (with CASSANDRA-7039) allow decompression of direct byte
buffers.   This lets us mmap the data and decompress completely off heap (and avoids moving
bytes over JNI).
> One issue is performing the checksum offheap but the Adler32 does support in java 8 (it's
also in java 7 but marked private?!)
> This change yields a > 10% boost in read performance on cstar.  Locally I see upto
30% improvement.
> http://cstar.datastax.com/graph?stats=5ebcdd70-816b-11e4-aed6-42010af0688f&metric=op_rate&operation=2_read&smoothing=1&show_aggregates=true&xmin=0&xmax=200.09&ymin=0&ymax=135908.3



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message