cassandra-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Benedict (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-8630) Faster sequential IO (on compaction, streaming, etc)
Date Mon, 27 Jul 2015 09:19:06 GMT


Benedict commented on CASSANDRA-8630:

I think we can avoid static methods by making {{MemoryInputStream}} extend {{ByteBufferInput}}
or {{NIODataInputStream}} (this is actually pretty easy, since we can construct {{ByteBuffer}}
from a {{Memory}} instance at very low cost, and can "refill"" that {{ByteBuffer}} on a {{readNext}}.

I think this should ideally be based on {{NIODataInputStream}} if that's not too challenging
(which I hope it should not be), so that we can keep one main implementation of {{DataInputPlus}}.
And we can indeed drop {{AbstractDataInput}} at that point.
A somewhat-circular buffer would be ideal. However we cannot currently have > 64Kb buffers,
so for reads using 64Kb already we would have to shrink our read size or grow our max buffer
size. For smaller buffers, we will still have to shuffle bytes around and have quite a bit
of wasted allocation (for this to work, our reads of objects much smaller than 4Kb would still
need 8Kb buffers allocated).

I wonder if for primitive reads we shouldn't just have a slow path (like we've now opted for
with vints) that reads byte-by-byte at a boundary crossing (returning a long that can be cast).
If we force the method call to _not_ be inlined, the cost should be imperceptible with the
help of branch prediction, since it should almost never be hit (given we probabilistically
guarantee covering the entire range in our buffer), and will not pollute the icache. This
has the added bonus of being much easier to understand, and probably easier to implement safely.

However if you think you can deliver a neat circular-buffer approach, feel free to give it
a try.

> Faster sequential IO (on compaction, streaming, etc)
> ----------------------------------------------------
>                 Key: CASSANDRA-8630
>                 URL:
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core, Tools
>            Reporter: Oleg Anastasyev
>            Assignee: Stefania
>              Labels: compaction, performance
>             Fix For: 3.x
>         Attachments: 8630-FasterSequencialReadsAndWrites.txt, cpu_load.png, flight_recorder_001_files.tar.gz
> When node is doing a lot of sequencial IO (streaming, compacting, etc) a lot of CPU is
lost in calls to RAF's int read() and DataOutputStream's write(int).
> This is because default implementations of readShort,readLong, etc as well as their matching
write* are implemented with numerous calls of byte by byte read and write. 
> This makes a lot of syscalls as well.
> A quick microbench shows than just reimplementation of these methods in either way gives
8x speed increase.
> A patch attached implements<Type> and SequencialWriter.write<Type>
methods in more efficient way.
> I also eliminated some extra byte copies in CompositeType.split and ColumnNameHelper.maxComponents,
which were on my profiler's hotspot method list during tests.
> A stress tests on my laptop show that this patch makes compaction 25-30% faster  on uncompressed
sstables and 15% faster for compressed ones.
> A deployment to production shows much less CPU load for compaction. 
> (I attached a cpu load graph from one of our production, orange is niced CPU load - i.e.
compaction; yellow is user - i.e. not compaction related tasks)

This message was sent by Atlassian JIRA

View raw message