cassandra-commits mailing list archives

From "Benedict (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (CASSANDRA-8630) Faster sequential IO (on compaction, streaming, etc)
Date Wed, 05 Aug 2015 10:34:06 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-8630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14655149#comment-14655149
] 

Benedict edited comment on CASSANDRA-8630 at 8/5/15 10:33 AM:
--------------------------------------------------------------

bq. For the fast path, the built-in BB methods should still be faster, right?

Right.

bq. readByte() would result in one unsafe get call per byte.

The unsafe calls here are all intrinsics, but still: even with fully inlined method calls
and an unrolled loop we're talking something like 24x the work. On top of that come the
virtual invocation costs. I'm not certain how those behave for a sequence of 8 identical
calls; I would hope some of the method call burden is shared through loop unrolling, but I
don't count on it. So we are probably at 100x+, using rigorous finger-in-air maths.
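
To make the comparison concrete, here is a minimal sketch of the two paths being discussed.
It is not the code from the patch: the class and method names are hypothetical, and a plain
ByteBuffer stands in for whatever the reader actually wraps. The byte-wise variant issues
eight single-byte gets plus shifts and ORs, while the bulk variant is one built-in getLong().

```java
import java.nio.ByteBuffer;

public class ReadLongSketch {
    // Hypothetical slow path: assemble a big-endian long from eight
    // single-byte reads, the way a readByte()-based readLong would.
    static long readLongByteWise(ByteBuffer buf) {
        long v = 0;
        for (int i = 0; i < 8; i++) {
            v = (v << 8) | (buf.get() & 0xFFL);
        }
        return v;
    }

    // Fast path: one bulk read using the built-in ByteBuffer method.
    // ByteBuffer defaults to big-endian, so both paths agree.
    static long readLongBulk(ByteBuffer buf) {
        return buf.getLong();
    }

    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.allocate(8);
        buf.putLong(0x0102030405060708L);
        buf.flip();
        // duplicate() gives each call its own position over the same bytes.
        long slow = readLongByteWise(buf.duplicate());
        long fast = readLongBulk(buf.duplicate());
        System.out.println(slow == fast);
    }
}
```

Both methods return the same value; the difference is purely in how many calls and
operations it takes to get there.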

bq. Do we care about little-endian ordering as well

Good point. I think we may actually depend on this in MemoryInputStream for some classes that
were persisted in weird byte order. However, I'm tempted to say we should start pushing this
upstream to the places that care, as there are few of them and we basically consider them all
broken. It's not the first time this has caused annoyance. I'd be tempted to just forbid it in
the stream itself and patch up the few places that need it.
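
As a rough illustration of what "patching up the few places that need it" could look like,
the sketch below keeps the stream strictly big-endian and has the caller fix up the byte
order itself. The class and method names are hypothetical and a ByteBuffer again stands in
for the input stream; only Long.reverseBytes and ByteBuffer.order are real JDK calls.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class LittleEndianAtCallSite {
    // Hypothetical caller that knows its field was persisted little-endian:
    // read big-endian as usual, then swap the bytes locally.
    static long readLegacyLong(ByteBuffer bigEndianBuf) {
        return Long.reverseBytes(bigEndianBuf.getLong());
    }

    // Alternative: read through a little-endian view of the same bytes.
    // The view has its own position, so the original buffer is not advanced.
    static long readLegacyLongViaView(ByteBuffer buf) {
        return buf.duplicate().order(ByteOrder.LITTLE_ENDIAN).getLong();
    }

    public static void main(String[] args) {
        // 0x0102030405060708 laid out least-significant byte first.
        ByteBuffer buf = ByteBuffer.wrap(new byte[] {8, 7, 6, 5, 4, 3, 2, 1});
        long viaView = readLegacyLongViaView(buf);
        long viaSwap = readLegacyLong(buf);
        System.out.println(viaView == viaSwap);
    }
}
```

Either way the byte-order knowledge lives with the class that was persisted oddly, and the
shared read path only has to support one ordering.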


> Faster sequential IO (on compaction, streaming, etc)
> ----------------------------------------------------
>
>                 Key: CASSANDRA-8630
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8630
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core, Tools
>            Reporter: Oleg Anastasyev
>            Assignee: Stefania
>              Labels: compaction, performance
>             Fix For: 3.x
>
>         Attachments: 8630-FasterSequencialReadsAndWrites.txt, cpu_load.png, flight_recorder_001_files.tar.gz
>
>
> When a node is doing a lot of sequential IO (streaming, compacting, etc.), a lot of CPU is lost in calls to RAF's int read() and DataOutputStream's write(int).
> This is because the default implementations of readShort, readLong, etc., as well as their matching write* methods, are implemented with numerous byte-by-byte read and write calls.
> This makes a lot of syscalls as well.
> A quick microbenchmark shows that just reimplementing these methods either way gives an 8x speed increase.
> The attached patch implements the RandomAccessReader.read<Type> and SequentialWriter.write<Type> methods in a more efficient way.
> I also eliminated some extra byte copies in CompositeType.split and ColumnNameHelper.maxComponents, which were on my profiler's hotspot method list during tests.
> A stress test on my laptop shows that this patch makes compaction 25-30% faster on uncompressed sstables and 15% faster on compressed ones.
> A deployment to production shows much less CPU load for compaction.
> (I attached a CPU load graph from production; orange is niced CPU load, i.e. compaction; yellow is user, i.e. tasks not related to compaction.)
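
The write side of the description can be illustrated with a small sketch. This is not the
attached patch: the class, the counting stream, and both methods are hypothetical, and the
call counter is only a stand-in for the per-call cost (syscalls, on an unbuffered file).
Writing a long one byte at a time reaches the underlying stream eight times, while encoding
it into a scratch array first reaches it once.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.ByteBuffer;

public class WriteLongSketch {
    // Hypothetical sink that counts how many write calls reach it.
    static final class CountingStream extends OutputStream {
        final ByteArrayOutputStream sink = new ByteArrayOutputStream();
        int calls = 0;

        @Override public void write(int b) { calls++; sink.write(b); }

        @Override public void write(byte[] b, int off, int len) {
            calls++;
            sink.write(b, off, len);
        }
    }

    // Byte-by-byte: eight write(int) calls, most significant byte first.
    static void writeLongByteWise(OutputStream out, long v) throws IOException {
        for (int shift = 56; shift >= 0; shift -= 8) {
            out.write((int) (v >>> shift) & 0xFF);
        }
    }

    // Bulk: encode into an 8-byte array (big-endian), then one write call.
    static void writeLongBulk(OutputStream out, long v) throws IOException {
        byte[] scratch = ByteBuffer.allocate(8).putLong(v).array();
        out.write(scratch, 0, 8);
    }

    public static void main(String[] args) throws IOException {
        CountingStream a = new CountingStream();
        CountingStream b = new CountingStream();
        writeLongByteWise(a, 42L);
        writeLongBulk(b, 42L);
        System.out.println(a.calls + " calls vs " + b.calls + " call");
    }
}
```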



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
