cassandra-commits mailing list archives

From "Peter Schuller (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-1902) Migrate cached pages during compaction
Date Wed, 30 Mar 2011 21:19:06 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-1902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13013659#comment-13013659 ]

Peter Schuller commented on CASSANDRA-1902:
-------------------------------------------

Regarding drop caches: Right, I don't remember whether the echo blocks until eviction is
complete or not (in cases where it is slow, it should be CPU bound though and not imply
I/O). But I made sure that: (1) the echo terminated, (2) I had iostat running, (3) I waited
for the flurry of I/O that you suddenly see on a *complete* buffer cache drop (it is
generated by background operations on any modern machine) to complete, (4) I saw the device
go idle, and (5) only then did Cassandra begin the pre-population.

So hopefully that part of the test should be kosher.
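
For illustration, here is a rough sketch of that sequence as a script (assuming Linux, root
privileges, and the sysstat iostat; the interval/count values are arbitrary placeholders, not
what I actually ran):

    import subprocess

    # flush dirty pages first so the drop itself does not trigger writeback
    subprocess.run(["sync"], check=True)

    # writing "3" drops the page cache plus dentries/inodes, i.e. a *complete* drop
    with open("/proc/sys/vm/drop_caches", "w") as f:
        f.write("3\n")

    # watch the device until the post-drop flurry of background I/O goes idle;
    # 12 samples of 5 seconds each is an arbitrary observation window
    subprocess.run(["iostat", "-k", "-x", "5", "12"], check=True)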

Regarding mere mortals ;) Sorry. Is it the iostat stuff that is unclear? I'm looking at (in
a monospaced font btw, for alignment...) the avgqu-sz column, which indicates the average number
of outstanding I/O requests over the sampling duration (1 second in this case). This is effectively
the "queue depth".

There are usually two main interesting things about "iostat -k -x 1" (-x being key). One is
utilization, which shows the percentage of time there was *any* outstanding I/O request to
the device. (But one has to interpret it in context; for example, a RAID0 of 10 disks can be
at "100%" utilization yet be only 10% saturated.) The other is the average queue size, which
is a more direct indication of how many concurrent requests are being serviced.

In the case of the 10 disk RAID0, 100% utilization with an average queue size of 1 would mean
roughly 10% saturation of underlying disks. 100% utilization with an average queue size of
5 would mean roughly 50% saturation of underlying disks.
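
To put that arithmetic in one place (the formula is just my back-of-the-envelope model of the
above, not an exact queueing result):

    # rough saturation estimate for a striped array: utilization only says
    # *some* request was outstanding, so spread the queue across the disks
    def approx_saturation(utilization, avgqu_sz, n_disks):
        return utilization * min(avgqu_sz / float(n_disks), 1.0)

    print(approx_saturation(1.0, 1, 10))  # 100% util, queue 1 -> 0.1 (~10%)
    print(approx_saturation(1.0, 5, 10))  # 100% util, queue 5 -> 0.5 (~50%)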

The other relevance of the average queue size is to latency. Disregarding any relative
prioritization going on, if the average number of outstanding requests is, say, 10, any single
request will typically have to wait for 10 other requests to be serviced first. (But again,
that has to be interpreted in context; if you have 10 constituent disks in a RAID0, that
10 is effectively 1 for latency purposes.)
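
The same per-disk scaling, applied to the wait-behind estimate (same assumptions as above):

    # how many requests a new arrival effectively waits behind: the
    # device-level queue is spread across the constituent disks
    def effective_wait_ahead(avgqu_sz, n_disks=1):
        return avgqu_sz / float(n_disks)

    print(effective_wait_ahead(10))      # single disk: waits behind ~10
    print(effective_wait_ahead(10, 10))  # 10-disk RAID0: effectively ~1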

So, when judging the expected effects on the latency (and throughput) of "live reads", it's
interesting to look at these values.

In particular, consider the simple case of a single seek-bound serial reader. If the average
queue depth is 5, this single reader would probably see roughly 1/5 of its normal throughput
(I'm assuming otherwise identical I/O in terms of request size). A practical example is
something like a "tar -czvf" that is reading a lot of small files (fs metadata, etc.).

So in that sense, a constant pressure of 5 outstanding requests will cause a very significant
slow-down to the serial reader.

On the other hand, if instead of a single serial reader you have N concurrent readers, you
would expect a throughput more like N/(N + 5 - 1) of normal. As the concurrency of the
interesting I/O increases, the 5 extra requests make less of a difference.
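
Putting numbers on that estimate (again a hand-wavy model, not queueing theory):

    # expected throughput fraction for N concurrent "live" readers competing
    # with a constant background pressure of B outstanding requests
    def throughput_fraction(n_readers, background=5):
        return n_readers / float(n_readers + background - 1)

    for n in (1, 2, 5, 20, 100):
        print(n, round(throughput_fraction(n), 2))
    # 1 -> 0.2 (the serial reader at ~1/5), ..., 100 -> 0.96 (barely affected)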

You tend to reach an interesting equilibrium here. Suppose you normally serve some number of
requests per second, and that this gives rise to an average queue depth of 0.5. Now add the
constant background pressure of 5 requests. Assuming the reads (that normally gave rise to
the 0.5 queue depth) are *independent* (i.e., added latency to one does not prevent the next
one from coming in), what tends to happen is that outstanding requests accumulate until the
number of concurrent requests is high enough that you reach the throughput you had before.
Only instead of 0.5 average concurrency, you have something higher than 5: whatever is
required to "drown out" the extra 5.

Even if you are able to reach your desired throughput (requests per second) like this, it
significantly adds to the average latency of each read. Not only does each read have to contend
with the extra 5 background I/O operations always pending, it also has to compete with the
other concurrent "live" requests.
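
To make that concrete with a toy model (entirely a simplification of the hand-waving above:
live requests get an N/(N + B) share of the device, and per-read latency scales with the
total number of requests outstanding):

    # live concurrency must grow until its share of the device again covers
    # the baseline utilization rho: solve N / (N + B) = rho
    def equilibrium_concurrency(rho, background):
        return rho * background / (1.0 - rho)

    n = equilibrium_concurrency(0.5, 5)  # baseline queue ~0.5, so rho ~0.5
    print(n)                             # ~5 live requests now outstanding

    # versus a near-idle baseline, each read now waits behind roughly the
    # N + B requests outstanding, i.e. an order of magnitude more latency
    print(n + 5)                         # ~10 outstanding at equilibrium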

> Migrate cached pages during compaction 
> ---------------------------------------
>
>                 Key: CASSANDRA-1902
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1902
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 0.7.1
>            Reporter: T Jake Luciani
>            Assignee: Pavel Yaskevich
>             Fix For: 0.7.5, 0.8
>
>         Attachments: 0001-CASSANDRA-1902-cache-migration-impl-with-config-option.txt,
1902-BufferedSegmentedFile-logandsleep.txt, 1902-formatted.txt, 1902-per-column-migration-rebase2.txt,
1902-per-column-migration.txt, CASSANDRA-1902-v3.patch, CASSANDRA-1902-v4.patch, CASSANDRA-1902-v5.patch,
CASSANDRA-1902-v6.patch
>
>   Original Estimate: 32h
>          Time Spent: 56h
>  Remaining Estimate: 0h
>
> Post CASSANDRA-1470 there is an opportunity to migrate cached pages from a pre-compacted
CF during the compaction process.  This is now important since CASSANDRA-1470 caches effectively
nothing.  
> For example, an active CF being compacted hurts reads since nothing is cached in the new
SSTable.
> The purpose of this ticket then is to make sure SOME data is cached from active CFs.
This can be done by monitoring which Old SSTables are in the page cache and caching active
rows in the New SSTable.
> A simpler yet similar approach is described here: http://insights.oetiker.ch/linux/fadvise/

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
