cassandra-commits mailing list archives

From "Peter Schuller (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-1902) Migrate cached pages during compaction
Date Wed, 30 Mar 2011 00:02:06 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-1902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13012783#comment-13012783 ]

Peter Schuller commented on CASSANDRA-1902:
-------------------------------------------

Catching up on the ticket history and the latest version of the patch, a few observations based
on the history and patch themselves (I have not tested or benchmarked anything):

With respect to avoiding waiting on GC: the munmap() is still in finalize(), so we're still
waiting on GC, right? Just not on every possible ByteBuffer (only on the MappedFileSegment
itself).

BufferedSegmentedFile.tryPreserveFilePageCache() is doing a tryPreserveCacheRegion() for every
page considered hot. The first thing to be aware of, then, is that this translates into a
posix_fadvise() syscall for every page, even when all or almost all pages are in fact already in
memory. This may be acceptable, but keep in mind that the use-cases where all or almost all pages
are in cache are likely to be the ones that are CPU-bound rather than disk-bound.
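
To make the per-page cost concrete, here is a minimal C sketch of the pattern as I read it
(page_is_hot() and the 4 KiB page size are my stand-ins for illustration, not names from the
patch):

{code}
#include <fcntl.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SIZE 4096  /* assumed; the real value comes from sysconf(_SC_PAGESIZE) */

/* hypothetical stand-in for the patch's per-page hotness test */
extern bool page_is_hot(uint64_t page_index);

/* One syscall per hot page: this is the cost pattern being discussed.
 * Even when every page is already resident, each WILLNEED is still a
 * full user/kernel round trip. */
static void preserve_per_page(int fd, uint64_t num_pages)
{
    for (uint64_t i = 0; i < num_pages; i++) {
        if (page_is_hot(i))
            posix_fadvise(fd, (off_t)(i * PAGE_SIZE), PAGE_SIZE,
                          POSIX_FADV_WILLNEED);
    }
}
{code}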

The bigger issue with the same code is that, in the case of the large column families that we're
trying to optimize for, unless I am missing something the preservation process is expected
to be entirely seek-bound for sparsely hot sstables. In the best case, for mostly-hot sstables,
it might not be seek-bound provided that pre-fetching and/or read-ahead and/or linear access
detection is working well, but that seems very dependent on system details and the type of
load the system is under (probably less likely to work well under high "live" read I/O loads).
In the non-best case (sparsely hot), it should most definitely be entirely seek-bound.

fadvising entire regions at once instead of once per page might improve that, but I still
think the better solution is to just not DONTNEED hot data to begin with (subject to potential
limitations to avoid too frequent DONTNEEDs).
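
For illustration, a hedged sketch of the region-at-once alternative, reusing the hypothetical
page_is_hot() and PAGE_SIZE declarations from the sketch above: coalesce adjacent hot pages
into runs and issue one fadvise per run, turning O(hot pages) syscalls into O(hot runs):

{code}
/* Assumes page_is_hot() and PAGE_SIZE as declared in the previous sketch. */
static void preserve_coalesced(int fd, uint64_t num_pages)
{
    uint64_t run_start = 0;
    bool in_run = false;

    for (uint64_t i = 0; i <= num_pages; i++) {
        bool hot = (i < num_pages) && page_is_hot(i);
        if (hot && !in_run) {
            run_start = i;            /* open a new run of hot pages */
            in_run = true;
        } else if (!hot && in_run) {
            /* close the run with a single syscall covering it all */
            posix_fadvise(fd, (off_t)(run_start * PAGE_SIZE),
                          (off_t)((i - run_start) * PAGE_SIZE),
                          POSIX_FADV_WILLNEED);
            in_run = false;
        }
    }
}
{code}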

Note: the original motivation for avoiding frequent DONTNEEDs was the performance cost of the
syscall. But in this case we're taking a "one syscall per page" hit anyway with the WILLNEED:s.
In fact, for a very hot sstable (where CPU efficiency matters more than for a cold sstable,
where disk I/O matters more), the WILLNEED:s should be more numerous than the DONTNEED:s would
have been had they been "fragmented" according to a hotness map. (For example, a fully hot
1 GiB sstable at 4 KiB pages means roughly 260,000 WILLNEED calls, while a hotness-fragmented
DONTNEED pass over the same sstable would issue none at all.)

Disregarding the CPU efficiency concerns though, the primary concern I'd have is the WILLNEED
calls. Again, I haven't tested to make sure I'm not mis-reading it, but this should mean that
all compactions of actively used sstables will end, after the streaming I/O, with lots of
seek-bound reads to fulfill the WILLNEED:s. This can take a lot of time and be expensive in
terms of the amount of "disk time" spent (relative to a rate-limited compaction process),
and it also violates the otherwise preserved rule that "the only seek-bound I/O is live reads;
all other I/O is sequential".

Also: if WILLNEED blocks until the data has been read, the impact on live traffic should be
limited, but on the other hand the preservation pass's latency would be high under read load.
If WILLNEED doesn't block, throughput has a chance of being reasonable by maintaining some
queue depth, but on the other hand it could severely affect live reads. (I don't know which
is true; I should check, but I haven't yet.)
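
If someone does want to check, one approach would be to sample page residency with mincore()
(Linux/BSD, not strictly POSIX) before and after the WILLNEED pass, and time the pass itself;
a minimal sketch, with error handling mostly elided:

{code}
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Count how many pages of a file are currently resident in the page
 * cache. Comparing this before and after the WILLNEED pass (and timing
 * the pass) would show whether WILLNEED blocked until the reads
 * completed or merely queued read-ahead. mmap() itself does not fault
 * the pages in; mincore() just reports residency for the mapping. */
static long resident_pages(int fd, size_t length)
{
    long page = sysconf(_SC_PAGESIZE);
    size_t pages = (length + page - 1) / page;
    void *addr = mmap(NULL, length, PROT_READ, MAP_SHARED, fd, 0);
    if (addr == MAP_FAILED)
        return -1;

    unsigned char *vec = malloc(pages);
    long resident = 0;
    if (vec && mincore(addr, length, vec) == 0)
        for (size_t i = 0; i < pages; i++)
            resident += vec[i] & 1;  /* low bit set = page resident */

    free(vec);
    munmap(addr, length);
    return resident;
}
{code}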

Minor nit: Seemingly truncated doc string for SegmentedFile.complete().

Minor suggestion: should isRangeInCache() be renamed to wasRangeInCache() to reflect the fact
that it does not represent current status? This is not just an implementation detail: if it
did reflect current reality, the caller would be incorrect. The per-column test would constantly
yield false positives for being in cache, due to (1) the column just having been serialized,
which would be easily fixable, but also (2) previous columns on the same page, which is more
difficult to fix than moving a line of code.



> Migrate cached pages during compaction 
> ---------------------------------------
>
>                 Key: CASSANDRA-1902
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1902
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 0.7.1
>            Reporter: T Jake Luciani
>            Assignee: T Jake Luciani
>             Fix For: 0.7.5, 0.8
>
>         Attachments: 0001-CASSANDRA-1902-cache-migration-impl-with-config-option.txt,
> 1902-formatted.txt, 1902-per-column-migration-rebase2.txt, 1902-per-column-migration.txt,
> CASSANDRA-1902-v3.patch, CASSANDRA-1902-v4.patch, CASSANDRA-1902-v5.patch
>
>   Original Estimate: 32h
>          Time Spent: 56h
>  Remaining Estimate: 0h
>
> Post CASSANDRA-1470 there is an opportunity to migrate cached pages from a pre-compacted
> CF during the compaction process. This is now important since CASSANDRA-1470 caches
> effectively nothing.
> For example, an active CF being compacted hurts reads since nothing is cached in the new
> SSTable.
> The purpose of this ticket then is to make sure SOME data is cached from active CFs.
> This can be done by monitoring which old SSTables are in the page cache and caching active
> rows in the new SSTable.
> A simpler yet similar approach is described here: http://insights.oetiker.ch/linux/fadvise/

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
