cassandra-commits mailing list archives

From "Ariel Weisberg (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-7404) Use direct i/o for sequential operations (compaction/streaming)
Date Mon, 10 Nov 2014 14:57:35 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-7404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14204853#comment-14204853 ]

Ariel Weisberg commented on CASSANDRA-7404:
-------------------------------------------

It should be possible to do this and emit roughly the same patterns of I/O and memory usage
as going through the page cache. It's a little tricky in the case where many thousands of
tables have to be merged at once, because memory has to be pooled to service reads for each
file. It looks like the current code allocates 64 KB of on-heap memory per file. When I asked,
the worst case is something like reading 32k files concurrently, and at 64 KB per file that is
2 gigabytes of on-heap memory. If I bump that up to 2 megabytes per file to emit the right
size I/Os for spinning disk, it will be a problem.
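
Spelling out that worst-case arithmetic: 32,768 files at 64 KB each is roughly 2 GB of heap,
while the same file count at 2 MB per file would be roughly 64 GB, which clearly does not fit.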

If I start shrinking I/Os I think performance will degrade, because the page cache behaves
slightly differently. During compaction I'll bet some files are hotter than others, and the
kernel will perform read-ahead normally even when thousands of files are read concurrently,
because the in-memory data is a cache and can be re-flowed to fit the usage. When someone is
merging that many files it is probably not a great time to find out that performance degrades,
so testing the worst case has to be on the to-do list.

I am going to start by implementing a FileChannel wrapper that switches the file descriptor
to O_DIRECT and retains an internal buffer to service reads, so I can control I/O size and
alignment. Then I'm going to work on making sure the buffer sizing degrades gracefully when
thousands of these wrappers are instantiated, so memory usage is similar to the existing
implementation.
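
For concreteness, a minimal sketch of the kind of wrapper described above (not the actual
patch): it assumes a JDK that exposes ExtendedOpenOption.DIRECT and ByteBuffer.alignedSlice
(JDK 10+), whereas the real implementation would more likely flip O_DIRECT on the fd via JNA.
The class name, field names, and the 4 KB block size are illustrative.

import com.sun.nio.file.ExtendedOpenOption;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical sketch: serve arbitrary-sized reads out of a block-aligned staging
// buffer so every read that hits the device is aligned and of a controlled size.
final class DirectReader implements AutoCloseable
{
    private static final int BLOCK = 4096;    // assumed file system block size

    private final FileChannel channel;
    private final ByteBuffer staging;         // aligned scratch buffer
    private final int chunkSize;              // bytes fetched per device read
    private long stagingOffset = -1;          // file offset of staging[0]; -1 = empty

    DirectReader(Path path, int chunkSize) throws IOException
    {
        assert chunkSize % BLOCK == 0;
        // ExtendedOpenOption.DIRECT needs JDK 10+; a 2014-era patch would more
        // likely set O_DIRECT on the file descriptor through JNA instead.
        this.channel = FileChannel.open(path, StandardOpenOption.READ, ExtendedOpenOption.DIRECT);
        this.staging = ByteBuffer.allocateDirect(chunkSize + 2 * BLOCK).alignedSlice(BLOCK);
        this.chunkSize = chunkSize;
    }

    /** Copies up to dst.remaining() bytes starting at the given file position into dst. */
    int read(ByteBuffer dst, long position) throws IOException
    {
        if (position >= channel.size())
            return -1;

        // Refill the staging buffer if the requested offset is not cached in it.
        if (stagingOffset < 0 || position < stagingOffset || position >= stagingOffset + staging.limit())
        {
            long alignedPos = (position / BLOCK) * BLOCK;      // aligned file offset
            staging.clear();
            staging.limit(chunkSize);                          // aligned read length
            int n = channel.read(staging, alignedPos);         // short read only at EOF
            if (n <= 0)
                return -1;
            staging.flip();
            stagingOffset = alignedPos;
        }

        int start = (int) (position - stagingOffset);
        int length = Math.min(dst.remaining(), staging.limit() - start);
        ByteBuffer view = staging.duplicate();
        view.position(start).limit(start + length);
        dst.put(view);
        return length;
    }

    @Override
    public void close() throws IOException
    {
        channel.close();
    }
}

The point of the staging buffer is that the I/O size and alignment seen by the device are
decoupled from whatever read sizes callers ask for, and chunkSize becomes the single knob to
shrink when thousands of these wrappers are alive at once.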

If I can hit the compaction throughput targets at that point I'll stop; otherwise it might
be helpful to double buffer and have a separate thread prefetch, so that compute and I/O can
overlap.
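
If it comes to that, the double-buffering idea would look roughly like the sketch below; the
ChunkSource interface and the DoubleBufferedReader name are made up here, with ChunkSource
standing in for the O_DIRECT-backed reader above.

import java.nio.ByteBuffer;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical skeleton: while the caller consumes the chunk most recently returned,
// a single background thread prefetches the next chunk so compute and I/O overlap.
final class DoubleBufferedReader implements AutoCloseable
{
    /** Stand-in for the O_DIRECT-backed reader; fills dst at offset, returns bytes or -1 at EOF. */
    interface ChunkSource
    {
        int read(ByteBuffer dst, long offset) throws Exception;
    }

    private final ChunkSource source;
    private final ExecutorService prefetcher = Executors.newSingleThreadExecutor();
    private ByteBuffer front;                 // chunk handed to the caller
    private ByteBuffer back;                  // chunk being prefetched
    private Future<Integer> inFlight;         // read currently filling `back`
    private long nextOffset;                  // file offset of the next prefetch

    DoubleBufferedReader(ChunkSource source, int chunkSize)
    {
        this.source = source;
        this.front = ByteBuffer.allocateDirect(chunkSize);
        this.back = ByteBuffer.allocateDirect(chunkSize);
        this.inFlight = schedule(back, 0);    // prime the first chunk
        this.nextOffset = chunkSize;
    }

    private Future<Integer> schedule(ByteBuffer dst, long offset)
    {
        return prefetcher.submit(() -> {
            dst.clear();
            int n = source.read(dst, offset); // a short read is treated as the final chunk
            dst.flip();
            return n;
        });
    }

    /**
     * Blocks until the prefetched chunk is ready, returns it, and starts prefetching the
     * next one. The caller must be done with the previously returned chunk by this point.
     */
    ByteBuffer nextChunk() throws Exception
    {
        if (inFlight.get() < 0)               // wait for the prefetch; -1 means EOF
            return null;
        ByteBuffer ready = back;              // swap front and back buffers
        back = front;
        front = ready;
        inFlight = schedule(back, nextOffset);  // overlap the next read with compute
        nextOffset += back.capacity();
        return front;
    }

    @Override
    public void close()
    {
        prefetcher.shutdownNow();
    }
}

While the merge works on the chunk returned by nextChunk(), the prefetch thread is already
filling the other buffer, so the device stays busy during the compute phase.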


> Use direct i/o for sequential operations (compaction/streaming)
> ---------------------------------------------------------------
>
>                 Key: CASSANDRA-7404
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7404
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Jason Brown
>            Assignee: Ariel Weisberg
>              Labels: performance
>             Fix For: 3.0
>
>
> Investigate using Linux's direct i/o for operations where we read sequentially through
> a file (repair and bootstrap streaming, compaction reads, and so on). Direct i/o does not
> go through the kernel page cache, so it should leave the hot cache pages used for live reads
> unaffected.
> Note: by using direct i/o, we will probably take a performance hit on reading the file
> we're sequentially scanning through (that is, compactions may get slower), but the goal of
> this ticket is to limit the impact of these background tasks on the main read/write functionality.
> Of course, I'll measure any perf hit that is incurred, and see if there are any mechanisms to
> mitigate it.



