cassandra-commits mailing list archives

From "Benedict (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (CASSANDRA-8630) Faster sequential IO (on compaction, streaming, etc)
Date Wed, 19 Aug 2015 08:10:48 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-8630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14702681#comment-14702681 ]

Benedict edited comment on CASSANDRA-8630 at 8/19/15 8:10 AM:
--------------------------------------------------------------

bq. RateLimiter is not a final class. 

I think Ariel was suggesting a new class that explicitly performs no work. However, since
we use this class more often for reads than for compaction, I would prefer we stick with
the more performant option of a simple null check. Using a full-fat RateLimiter is certainly
more expensive than that.
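The two options under discussion can be sketched roughly as follows (illustrative only, not the actual Cassandra code; the class and method names here are stand-ins):

```java
// Sketch of the trade-off: a no-op limiter subclass still pays a virtual
// call per invocation, whereas a null check is essentially free on the
// hot read path. RateLimiter here is a minimal stand-in for Guava's class.
public class LimiterSketch {
    static class RateLimiter {
        void acquire(int permits) { /* the real impl sleeps to enforce a rate */ }
    }

    // Option A: an explicit no-op subclass, as suggested.
    static final RateLimiter NO_OP = new RateLimiter() {
        @Override
        void acquire(int permits) { /* do nothing */ }
    };

    // Option B: a null check on the hot path (the option preferred above).
    static long read(RateLimiter limiter, int bytes) {
        if (limiter != null)        // cheaper than calling into a full-fat limiter
            limiter.acquire(bytes);
        return bytes;               // stand-in for the actual read work
    }

    public static void main(String[] args) {
        System.out.println(read(null, 4096));   // unthrottled read
        System.out.println(read(NO_OP, 4096));  // throttled path, no-op limiter
    }
}
```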

bq. Have a look at MmappedSegmentedFile.Builder.addPotentialBoundary() and createSegments()

I should have written a bit about this before work started: my expectation is that this can
all be completely removed. The reason for it was that we treated each mmap file segment as
completely distinct, so each partition had to end before a 2G boundary (so we could map it
in its entirety), or we had to fall back to a non-mmap segment. That's no longer the case,
since we simply rebuffer, so we can safely eliminate all of the mess with segment boundaries
and just map in increments of 2G (or, frankly, whatever size we like; it might be nice to do
it exactly once when we "early open" so that we do not remap the same regions multiple times).
At the same time we can eliminate the idea of multiple segments; we should always have just
one segment. Given this, we should also consider renaming them, since they're no longer
"segments" - they cover the whole file.

(Caveat: I haven't reviewed the code directly, I'm just going off the comments)
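For illustration, mapping a file in fixed ~2G increments (with boundary-crossing reads handled by rebuffering, not by per-partition boundaries) might look like this. This is a hypothetical helper, not the MmappedSegmentedFile.Builder code:

```java
// Sketch: map a whole file as fixed-size regions. A single FileChannel.map()
// call is limited to Integer.MAX_VALUE bytes, which is why ~2G regions are
// needed at all; a read spanning two regions would simply rebuffer.
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;

public class MapWholeFile {
    static final long REGION_SIZE = Integer.MAX_VALUE; // ~2G per mapped region

    static List<MappedByteBuffer> mapRegions(Path file) throws IOException {
        List<MappedByteBuffer> regions = new ArrayList<>();
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            long size = channel.size();
            for (long offset = 0; offset < size; offset += REGION_SIZE) {
                long length = Math.min(REGION_SIZE, size - offset);
                regions.add(channel.map(FileChannel.MapMode.READ_ONLY, offset, length));
            }
        }
        return regions;
    }
}
```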


> Faster sequential IO (on compaction, streaming, etc)
> ----------------------------------------------------
>
>                 Key: CASSANDRA-8630
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8630
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core, Tools
>            Reporter: Oleg Anastasyev
>            Assignee: Stefania
>              Labels: compaction, performance
>             Fix For: 3.x
>
>         Attachments: 8630-FasterSequencialReadsAndWrites.txt, cpu_load.png, flight_recorder_001_files.tar.gz,
flight_recorder_002_files.tar.gz, mmaped_uncomp_hotspot.png
>
>
> When a node is doing a lot of sequential IO (streaming, compacting, etc.), a lot of CPU
is lost in calls to RAF's int read() and DataOutputStream's write(int).
> This is because the default implementations of readShort, readLong, etc., as well as their
matching write* counterparts, are built from numerous byte-by-byte reads and writes.
> This also incurs a lot of syscalls.
> A quick microbenchmark shows that just reimplementing these methods gives an 8x speed
increase.
> The attached patch implements the RandomAccessReader.read<Type> and SequentialWriter.write<Type>
methods in a more efficient way.
> I also eliminated some extra byte copies in CompositeType.split and ColumnNameHelper.maxComponents,
which were on my profiler's hotspot method list during tests.
> Stress tests on my laptop show that this patch makes compaction 25-30% faster on uncompressed
sstables and 15% faster on compressed ones.
> A deployment to production shows much lower CPU load for compaction.
> (I attached a CPU load graph from one of our production nodes; orange is niced CPU load,
i.e. compaction; yellow is user, i.e. tasks not related to compaction.)
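The byte-by-byte overhead the report describes can be sketched as follows. This is illustrative only, not the attached patch: DataInputStream.readLong() assembles a value from eight single-byte reads, whereas reading an 8-byte chunk in one call and decoding it in memory avoids the per-call (and, on an unbuffered stream, per-syscall) overhead.

```java
// Sketch: the slow byte-by-byte decode vs. a single bulk read of 8 bytes.
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.nio.ByteBuffer;

public class ReadLongSketch {
    // Slow path: eight read() calls per long, as in the default implementation.
    static long readLongByteByByte(DataInputStream in) throws IOException {
        long v = 0;
        for (int i = 0; i < 8; i++)
            v = (v << 8) | (in.read() & 0xFF);
        return v;
    }

    // Fast path: one bulk read, then decode from the in-memory buffer.
    static long readLongBulk(DataInputStream in) throws IOException {
        byte[] buf = new byte[8];
        in.readFully(buf);
        return ByteBuffer.wrap(buf).getLong();
    }

    public static void main(String[] args) throws IOException {
        byte[] data = ByteBuffer.allocate(8).putLong(0x0102030405060708L).array();
        long a = readLongByteByByte(new DataInputStream(new ByteArrayInputStream(data)));
        long b = readLongBulk(new DataInputStream(new ByteArrayInputStream(data)));
        System.out.println(a == b); // both decode the same big-endian value
    }
}
```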



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
