cassandra-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Sylvain Lebresne (Commented) (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-3143) Global caches (key/row)
Date Tue, 13 Dec 2011 23:25:30 GMT


Sylvain Lebresne commented on CASSANDRA-3143:

bq. Preceding point apart, we would at least need a way to deactivate row caching on a per-cf
basis. We may also want that for key cache, though this seems less critical. My initial idea
would be to either have a boolean flag if we only want to allow disabling row cache, or some
multi-value caches option that could be "none", "key_only", "row_only" or "all".

This is going to be moved to the separate task.

I'm not a fan of that idea. We just cannot release this without a way to deactivate the row
cache as this would make the row cache unusable for most users. IMHO, that's a good definition
of something that should not be moved to a separate task.

bq. Why does the getRowCacheKeysToSave() option disappeared?

Because we don't control that anymore, rely on cache LRU policy instead.

I don't understand how "relying on cache LRU policy" has anything to do with that. The initial
motivation for that option is that people don't want to reload the full extend of the row
cache on restart because it takes forever, but they don't want to start with cold caches either.
I don't see how making the cache global changes anything on that. I agree that the number
of row cache key to save should now be a global option, but I disagree that it should be removed.

* The code around CFS.prepareRowForCaching is weird. First the comment seems to suggest that
prepareRowForCaching is used exclusively from CacheService while it's use below in cacheRow.
It also adds a copy of the columns which I don't think is necessary since we already copy
in MappedFileDataInput.  Overall I'm not sure prepareRowForCaching is useful and CacheService.readSavedRowCache
could use cacheRow directly
* I don't think CacheService.reloadKeyCache works correctly. It only populate the cache with
fake values that won't get updated unless you reload the sstables, which has no reason to
happen. I'm fine removing the key cache reloading altogether, but as an alternative, why not
save the value of the key cache too? The thing is, I'm not very comfortable with the current
'two phase' key cache loading: if we ever have a bug in the SSTReader.load method, the actual
pre-loading with -1 values will be harmful, which seems unnecessarily fragile. Saving the
values on disk would avoid that.
* Talking of the key cache save, the format used by the patch is really really not compact.
For each key we save the full path to the sstable, which can easily be > 50 bytes. Maybe
we could associate an int to each descriptor during the save and save the association of descriptor
-> id separately.  * Still worth allowing to chose how may keys to save
* The cache sizings don't take the keys into account. For the row cache, one could make the
argument that the overhead of the keys is negligible compared to the values. For the key cache
however, the key are bigger than the values.
* The patch mistakenly remove the help for 'nodetool upgradesstables' (in
* Would be worth adding a global cache log line in StatusLogger.
* Patch wrongly reintroduces memtable_operations and memtable_throughput to CliHelp.
* The default row cache provider since 1.0 is the serializing one, this patch sets the ConcurrentLinkedHashCacheProvider

And a number of nits:
* In CFS, it's probably faster/simpler to use metadata.cfId rather than Schema.instance.getId(,
* In CacheService, calling scheduleSaving with -1 as second argument would be slightly faster
than using Integer.MAX_VALUE.
* In SSTableReader.cacheKey, the assert {{key.key == null}} is useless in trunk (DK with key
== null can't be constructed).
* In AbstractCassandraDaemon, there's a unecessary import of
* There is some comments duplication in the yaml file
* I wonder if the reduce cache capacity thing still makes sense after this patch?
* In AutosavingCache, I think we could declare AutoSavingCache<K extends CacheKey, V>
and get rid of the translateKey() method

> Global caches (key/row)
> -----------------------
>                 Key: CASSANDRA-3143
>                 URL:
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Pavel Yaskevich
>            Assignee: Pavel Yaskevich
>            Priority: Minor
>              Labels: Core
>             Fix For: 1.1
>         Attachments: 0001-global-key-cache.patch, 0002-global-row-cache-and-ASC.readSaved-changed-to-abstra.patch,
0003-CacheServiceMBean-and-correct-key-cache-loading.patch, 0004-key-row-cache-tests-and-tweaks.patch,
0005-cleanup-of-the-CFMetaData-and-thrift-avro-CfDef-and-.patch, 0006-row-key-cache-improvements-according-to-Sylvain-s-co.patch
> Caches are difficult to configure well as ColumnFamilies are added, similar to how memtables
were difficult pre-CASSANDRA-2006.

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators:!default.jspa
For more information on JIRA, see:


View raw message