cassandra-commits mailing list archives

From "Chris Goffinet (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-1902) Migrate cached pages during compaction
Date Fri, 06 May 2011 20:37:03 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-1902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13030125#comment-13030125 ]

Chris Goffinet commented on CASSANDRA-1902:
-------------------------------------------

Peter,

In the modifications I did for 1902, when we open up SSTable files for compaction, I run mincore
across the file to determine which pages are hot already using the hooks in 1902. I use this
information in rebuffer() to call DONTNEED, after each read() call, on pages that were cold to
begin with. The kernel ignores a percentage of those hints. Since such a page is then 'hot',
when we mark the hot pages for the new SSTable, we migrate more than we need to. For example,
we ran a test where we made sure that on memtable flush we called DONTNEED on the entire
file. We verified the flushed files were not in cache. Then when compaction
kicked in, since all pages were cold, we should have new SSTables that are not in cache. What
we observed was that the final file, after a large series of flushes + compaction, ended up
being 50% in page cache over a long period of time, even though we purposely told the OS we
didn't want the pages in cache (as we read them).
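
To make the bookkeeping above concrete, here is a rough sketch in Python rather than the
actual patch code. It assumes the residency snapshot is already available as one boolean
per page (standing in for what mincore reports) and a fixed 4096-byte page size; the
function name and parameters are made up for illustration. It only computes which byte
ranges of the span just read were cold up front, which are the ranges DONTNEED would be
called on.

```python
PAGE_SIZE = 4096  # assumed page size; a real implementation would query the OS


def cold_ranges(resident, offset, length, page_size=PAGE_SIZE):
    """Return (offset, length) byte ranges covering pages that were cold in
    the snapshot and overlap the span [offset, offset + length) just read.

    `resident` is a list of booleans, one per page, standing in for the
    per-page result that mincore reports for the file (hypothetical input).
    """
    if length <= 0:
        return []
    first = offset // page_size
    # Last page touched by the read, clamped to the pages in the snapshot.
    last = min((offset + length - 1) // page_size, len(resident) - 1)
    ranges = []
    run_start = None  # first page of the current run of cold pages
    for page in range(first, last + 1):
        if not resident[page]:
            if run_start is None:
                run_start = page
        elif run_start is not None:
            # A hot page ends the current cold run.
            ranges.append((run_start * page_size, (page - run_start) * page_size))
            run_start = None
    if run_start is not None:
        # The span ended while still inside a cold run.
        ranges.append((run_start * page_size, (last + 1 - run_start) * page_size))
    return ranges
```

On Linux, each returned pair could then be handed to something like
os.posix_fadvise(fd, off, length, os.POSIX_FADV_DONTNEED). Whether the kernel honors
that hint is exactly the open question here.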

Jake:

So the problem with that approach is that, as we read data from disk, we still need to make
sure a page that is cold stays cold. Keeping statistics helps us avoid migrating pages that
were cold to hot, but since we still have to read the file during compaction, we still need
to call DONTNEED on pages that were cold to begin with. That is what is causing the issue:
we know a page is cold up front, but the kernel is not respecting that DONTNEED. I thought
it might be related to readahead, so I made sure to fadvise with FADV_RANDOM, but that
wasn't the issue either.

Jonathan:

Yeah, we run with CASSANDRA-2156, and that's helped us a lot with performance consistency.
We have certain workloads that need to read recently written data, so we disabled calling
posix_fadvise(fd, 0, 0) during memtable flushes. We actually found that writing new data and
just letting the kernel manage the pages worked better than the 1902 solution, because we
were calling WILLNEED on pages that were never being read to begin with.

One last thing to try would be keeping track of the last (offset, length), so that after each
read() I call fadvise on the previous pair instead of on what I just read. The ignoring of
hints might be related to the current refcount. I will try this out tonight and update the
ticket.
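
A minimal sketch of that "advise on the previous pair" idea, in Python rather than the
Cassandra code. The class name, the injectable advise callback, and the finish() method
are hypothetical and only there for illustration. For brevity it advises on every span,
where the real change would only touch pages known to be cold. It also skips zero-length
spans, since a length of 0 passed to posix_fadvise means "through the end of the file".

```python
import os


class LaggingAdviser:
    """Hypothetical helper: after each read, advise on the span read
    *before* it rather than on the span just read."""

    def __init__(self, fd, advise=None):
        self.fd = fd
        self.prev = None  # last (offset, length) not yet advised on
        # Default to POSIX_FADV_DONTNEED where the platform provides it.
        self.advise = advise or self._dontneed

    @staticmethod
    def _dontneed(fd, offset, length):
        if hasattr(os, "posix_fadvise"):
            os.posix_fadvise(fd, offset, length, os.POSIX_FADV_DONTNEED)

    def read(self, offset, length):
        os.lseek(self.fd, offset, os.SEEK_SET)
        data = os.read(self.fd, length)
        if self.prev is not None:
            # Advise on the previous span only now that we have moved past it.
            self.advise(self.fd, *self.prev)
        # Never remember an empty span: length 0 would mean "to end of file".
        self.prev = (offset, len(data)) if data else None
        return data

    def finish(self):
        """Advise on the final span, which no later read will cover."""
        if self.prev is not None:
            self.advise(self.fd, *self.prev)
            self.prev = None
```

The one-read lag is the whole point of the experiment: by the time the hint is issued,
the reader is no longer holding the pages it refers to. Whether that changes how the
kernel treats the hint is untested here.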

> Migrate cached pages during compaction 
> ---------------------------------------
>
>                 Key: CASSANDRA-1902
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1902
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 0.7.1
>            Reporter: T Jake Luciani
>            Assignee: Pavel Yaskevich
>             Fix For: 1.0
>
>         Attachments: 0001-CASSANDRA-1902-cache-migration-impl-with-config-option.txt,
1902-BufferedSegmentedFile-logandsleep.txt, 1902-formatted.txt, 1902-per-column-migration-rebase2.txt,
1902-per-column-migration.txt, CASSANDRA-1902-v10-trunk-rebased.patch, CASSANDRA-1902-v3.patch,
CASSANDRA-1902-v4.patch, CASSANDRA-1902-v5.patch, CASSANDRA-1902-v6.patch, CASSANDRA-1902-v7.patch,
CASSANDRA-1902-v8.patch, CASSANDRA-1902-v9-trunk-rebased.patch, CASSANDRA-1902-v9-trunk-with-jmx.patch,
CASSANDRA-1902-v9-trunk.patch, CASSANDRA-1902-v9.patch
>
>   Original Estimate: 32h
>          Time Spent: 56h
>  Remaining Estimate: 0h
>
> Post CASSANDRA-1470 there is an opportunity to migrate cached pages from a pre-compacted
CF during the compaction process.  This is now important since CASSANDRA-1470 caches effectively
nothing.  
> For example an active CF being compacted hurts reads since nothing is cached in the new
SSTable. 
> The purpose of this ticket then is to make sure SOME data is cached from active CFs.
This can be done by monitoring which Old SSTables are in the page cache and caching active
rows in the New SSTable.
> A simpler yet similar approach is described here: http://insights.oetiker.ch/linux/fadvise/

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
