cassandra-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Benedict (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-8630) Faster sequential IO (on compaction, streaming, etc)
Date Fri, 07 Aug 2015 09:40:47 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-8630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14661560#comment-14661560
] 

Benedict commented on CASSANDRA-8630:
-------------------------------------

It would be better, from that perspective to use native order. However:

# Swapping byte order is a single cycle instruction, so is probably not a big deal, given
how inefficient we are otherwise right now
# We would have to require that users have nodes all sharing the same architecture (or at
least endianness), which may or may not be limiting. Or we could pick the common platform
endianness and go little endian
# We would have to support both at least during upgrade

All told I think it's easier to stick with big endian format as is Java standard and standard
across our codebase. But I'm not opposed to the alternative of providing a patch to support
both, followed by an upgrade period and dropping support for big endian in 4.0



> Faster sequential IO (on compaction, streaming, etc)
> ----------------------------------------------------
>
>                 Key: CASSANDRA-8630
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8630
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core, Tools
>            Reporter: Oleg Anastasyev
>            Assignee: Stefania
>              Labels: compaction, performance
>             Fix For: 3.x
>
>         Attachments: 8630-FasterSequencialReadsAndWrites.txt, cpu_load.png, flight_recorder_001_files.tar.gz
>
>
> When node is doing a lot of sequencial IO (streaming, compacting, etc) a lot of CPU is
lost in calls to RAF's int read() and DataOutputStream's write(int).
> This is because default implementations of readShort,readLong, etc as well as their matching
write* are implemented with numerous calls of byte by byte read and write. 
> This makes a lot of syscalls as well.
> A quick microbench shows than just reimplementation of these methods in either way gives
8x speed increase.
> A patch attached implements RandomAccessReader.read<Type> and SequencialWriter.write<Type>
methods in more efficient way.
> I also eliminated some extra byte copies in CompositeType.split and ColumnNameHelper.maxComponents,
which were on my profiler's hotspot method list during tests.
> A stress tests on my laptop show that this patch makes compaction 25-30% faster  on uncompressed
sstables and 15% faster for compressed ones.
> A deployment to production shows much less CPU load for compaction. 
> (I attached a cpu load graph from one of our production, orange is niced CPU load - i.e.
compaction; yellow is user - i.e. not compaction related tasks)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message