cassandra-commits mailing list archives

From "Tyler Hobbs (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-5519) Reduce index summary memory use for cold sstables
Date Thu, 14 Nov 2013 17:07:22 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-5519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13822623#comment-13822623 ]

Tyler Hobbs commented on CASSANDRA-5519:
----------------------------------------

bq. What is the relationship between BASE_SAMPLING_LEVEL and MIN_SAMPLING_LEVEL with indexInterval?

{{BASE/MIN_SAMPLING_LEVEL}} are orthogonal to {{indexInterval}}.  {{BASE_SAMPLING_LEVEL}}
essentially sets the granularity at which you can downsample or upsample.  {{MIN_SAMPLING_LEVEL}}
sets a limit on how far you can downsample.  (I'll note that we could potentially lower {{indexInterval}}
alongside these changes in order to have more summary entries for hot sstables.)
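A minimal sketch of how the two settings could combine, assuming a sampling level is read as "keep {{samplingLevel / BASE_SAMPLING_LEVEL}} of the full-sampling summary entries", so the effective interval becomes {{indexInterval * BASE_SAMPLING_LEVEL / samplingLevel}}. The class name and the constant values ({{BASE_SAMPLING_LEVEL = 128}}, {{MIN_SAMPLING_LEVEL = 32}}) are illustrative assumptions, not the patch's code; only the default {{indexInterval}} of 128 comes from Cassandra itself.

```java
// Minimal sketch: how a sampling level scales the effective index interval.
// All constant values below are illustrative assumptions, not the patch's code.
public class SamplingLevels {
    static final int INDEX_INTERVAL = 128;      // default index_interval
    static final int BASE_SAMPLING_LEVEL = 128; // assumed: level meaning "full sampling"
    static final int MIN_SAMPLING_LEVEL = 32;   // assumed floor (BASE / 4)

    /** Fraction of the full-sampling summary entries kept at this level. */
    static double retainedFraction(int samplingLevel) {
        return (double) samplingLevel / BASE_SAMPLING_LEVEL;
    }

    /** Average number of partitions between consecutive kept summary entries. */
    static double effectiveInterval(int samplingLevel) {
        if (samplingLevel < MIN_SAMPLING_LEVEL || samplingLevel > BASE_SAMPLING_LEVEL)
            throw new IllegalArgumentException("sampling level out of range: " + samplingLevel);
        return (double) INDEX_INTERVAL * BASE_SAMPLING_LEVEL / samplingLevel;
    }

    public static void main(String[] args) {
        // Any integer level in [MIN, BASE] is valid; only halving steps are printed here.
        for (int level = BASE_SAMPLING_LEVEL; level >= MIN_SAMPLING_LEVEL; level /= 2)
            System.out.printf("level %3d: keep %3.0f%% of entries, ~1 per %.0f partitions%n",
                    level, 100 * retainedFraction(level), effectiveInterval(level));
    }
}
```

Under these assumptions, one level step changes the kept fraction by 1/128, which is the "granularity" mentioned above, while {{indexInterval}} only scales the result.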

bq. How many rows do we get for 5% of an 8GB heap?

That gives us ~410 MiB to work with.  If we assume the average key length is 8 bytes, each
summary entry uses 20 bytes of space, giving us ~21 million summary entries.

At full sampling, that's 21 million * 128 (the default {{indexInterval}}) = ~2.7 billion rows,
assuming no overlap across sstables.  At minimum sampling, that's about four times as many, ~11 billion rows.

If the average key size is 16 bytes (28-byte entries), that drops to ~2 and ~8 billion rows, respectively.
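The arithmetic above can be checked with a short sketch. It assumes the 12 bytes of per-entry overhead implied by "20 bytes per entry for an 8-byte key" stay constant as key size changes, and a 4x coverage factor at minimum sampling inferred from the ~2.7 vs. ~11 billion figures; the class and method names are hypothetical.

```java
// Hypothetical back-of-the-envelope calculator for index summary capacity.
public class SummaryCapacity {
    static final long MIB = 1L << 20;
    static final long GIB = 1L << 30;
    // Assumed: 20-byte entries for 8-byte keys imply 12 bytes of fixed overhead.
    static final int ENTRY_OVERHEAD_BYTES = 12;
    static final int INDEX_INTERVAL = 128;    // default index_interval
    static final int MIN_SAMPLING_FACTOR = 4; // assumed: full / minimum sampling level

    static long summaryBudgetBytes(long heapBytes, double fraction) {
        return (long) (heapBytes * fraction);
    }

    static long summaryEntries(long budgetBytes, int avgKeyBytes) {
        return budgetBytes / (avgKeyBytes + ENTRY_OVERHEAD_BYTES);
    }

    public static void main(String[] args) {
        long budget = summaryBudgetBytes(8 * GIB, 0.05);
        System.out.printf("budget: %.1f MiB%n", (double) budget / MIB);
        for (int keyBytes : new int[] {8, 16}) {
            long entries = summaryEntries(budget, keyBytes);
            long fullRows = entries * INDEX_INTERVAL;     // one entry per interval of rows
            long minRows = fullRows * MIN_SAMPLING_FACTOR; // fewer entries, each covering more rows
            System.out.printf("%2d-byte keys: %,d entries, %.2f B rows (full), %.2f B rows (min)%n",
                    keyBytes, entries, fullRows / 1e9, minRows / 1e9);
        }
    }
}
```

This prints a 409.6 MiB budget, about 2.75 and 11.00 billion rows for 8-byte keys, and about 1.96 and 7.85 billion for 16-byte keys.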

bq. Isn't it a minor bug to just ignore compacting sstables? I suggest reducing the memory pool
allocated to the non-compacting ones by the amount allocated to the compacting ones.

Good point, I agree.
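A minimal sketch of the suggested accounting, assuming compacting sstables are simply skipped for resampling while their summaries still count against the pool. The {{SSTable}} holder and both method names are hypothetical and not part of the patch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: shrink the pool for resampling by the memory that the
// summaries of currently compacting sstables already hold.
public class SummaryBudget {
    // Hypothetical, minimal stand-in for an sstable in this sketch.
    static final class SSTable {
        final String name;
        final long summaryBytes;
        final boolean compacting;

        SSTable(String name, long summaryBytes, boolean compacting) {
            this.name = name;
            this.summaryBytes = summaryBytes;
            this.compacting = compacting;
        }
    }

    /** Pool left for non-compacting sstables after charging the compacting ones. */
    static long remainingBudget(long totalPoolBytes, List<SSTable> sstables) {
        long remaining = totalPoolBytes;
        for (SSTable s : sstables)
            if (s.compacting)
                remaining -= s.summaryBytes; // still resident, but not resized this round
        return Math.max(0L, remaining);
    }

    /** Only sstables that are not being compacted are candidates for resampling. */
    static List<SSTable> resamplable(List<SSTable> sstables) {
        List<SSTable> result = new ArrayList<>();
        for (SSTable s : sstables)
            if (!s.compacting)
                result.add(s);
        return result;
    }

    public static void main(String[] args) {
        List<SSTable> sstables = List.of(
                new SSTable("a", 100L << 20, true),
                new SSTable("b", 40L << 20, false),
                new SSTable("c", 60L << 20, false));
        long pool = 410L << 20;
        System.out.println("budget for resampling (MiB): " + (remainingBudget(pool, sstables) >> 20));
        System.out.println("candidates: " + resamplable(sstables).size());
    }
}
```

With a 410 MiB pool and one 100 MiB compacting summary, 310 MiB remains to distribute across the two non-compacting sstables.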

bq. Could we just resample at compaction time instead of dealing with refcounting or locking?
That probably gives up too much of the potential benefits.

Yeah, that would probably be okay for small sstables that are compacted frequently, but large
sstables, which are compacted rarely, would be tuned poorly, and those account for the majority of the memory use.

bq. I think we could make it almost as elegant by using the DataTracker replace mechanism, originally
built for compaction, to build a new SSTableReader (SSTR) and swap it in without extra concurrency controls.

That's a good idea; I think it would be fairly clean.  I'll give that a shot.
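A sketch of the general copy-on-write pattern such a replace relies on, assuming the live set of readers is an immutable list held in an {{AtomicReference}} and swapped with a compare-and-set retry loop. {{ReaderSwap}} and its nested {{Reader}} are hypothetical stand-ins, not Cassandra's {{DataTracker}} API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch of a copy-on-write "replace" over an immutable view;
// names are illustrative only.
public class ReaderSwap {
    // Stand-in for an SSTableReader: immutable, so a resampled summary means a new object.
    static final class Reader {
        final int generation;
        final int samplingLevel;

        Reader(int generation, int samplingLevel) {
            this.generation = generation;
            this.samplingLevel = samplingLevel;
        }
    }

    private final AtomicReference<List<Reader>> view =
            new AtomicReference<>(Collections.<Reader>emptyList());

    List<Reader> live() {
        return view.get();
    }

    /** Atomically swaps oldReaders for newReaders, retrying if the view changed concurrently. */
    void replace(List<Reader> oldReaders, List<Reader> newReaders) {
        while (true) {
            List<Reader> current = view.get();
            List<Reader> next = new ArrayList<>(current);
            next.removeAll(oldReaders); // identity equality: removes exactly these objects
            next.addAll(newReaders);
            if (view.compareAndSet(current, Collections.unmodifiableList(next)))
                return;
        }
    }

    public static void main(String[] args) {
        ReaderSwap tracker = new ReaderSwap();
        Reader original = new Reader(1, 128);
        tracker.replace(List.of(), List.of(original));
        // Resample: build a new reader for the same generation, then swap it in.
        Reader downsampled = new Reader(original.generation, 64);
        tracker.replace(List.of(original), List.of(downsampled));
        System.out.println("live readers: " + tracker.live().size()
                + ", sampling level: " + tracker.live().get(0).samplingLevel);
    }
}
```

Concurrent readers see either the old list or the new one, never a partial update, which is why no extra locking is needed around the swap itself.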

bq. Is the idea behind touching it in DD (DatabaseDescriptor) to force the MBean to be loaded, or is
there a circular dependency that breaks without that?

Neither the {{IndexSummaryManager}} singleton nor the MBean is loaded without that.  No other
classes use {{IndexSummaryManager}}, so its static fields are never initialized.  (Just importing
the class doesn't trigger class loading; imports only matter at compile time.)
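A minimal illustration of the JVM behavior described above, assuming a singleton whose static initializer is where the MBean would be registered: the initializer runs on the first active use of the class (here a static method call), not when the type is imported or merely named. {{LazyInitDemo}} and {{Manager}} are hypothetical names.

```java
// Minimal sketch: a class's static initializer runs on first active use
// (e.g. a static method call), not when it is imported or merely named.
public class LazyInitDemo {
    static boolean managerInitialized = false;

    // Hypothetical stand-in for a singleton like IndexSummaryManager whose
    // static initializer is where an MBean would be registered.
    static final class Manager {
        static final Manager instance = new Manager();

        static {
            managerInitialized = true; // real code would register the MBean here
        }

        static void touch() {
            // No-op: calling any static method forces class initialization.
        }
    }

    public static void main(String[] args) {
        Manager unused = null; // naming the type does not initialize it
        System.out.println("after declaration: " + managerInitialized);
        Manager.touch();
        System.out.println("after static call: " + managerInitialized);
    }
}
```

Running it prints {{false}} after the declaration and {{true}} after the static call, which is the effect the explicit touch in DatabaseDescriptor relies on.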

> Reduce index summary memory use for cold sstables
> -------------------------------------------------
>
>                 Key: CASSANDRA-5519
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-5519
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Tyler Hobbs
>            Priority: Minor
>             Fix For: 2.1
>
>         Attachments: downsample.py
>
>




--
This message was sent by Atlassian JIRA
(v6.1#6144)
