cassandra-commits mailing list archives

From "graham sanderson (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (CASSANDRA-7546) AtomicSortedColumns.addAllWithSizeDelta has a spin loop that allocates memory
Date Tue, 22 Jul 2014 17:28:39 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-7546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14070533#comment-14070533
] 

graham sanderson edited comment on CASSANDRA-7546 at 7/22/14 5:27 PM:
----------------------------------------------------------------------

Well that makes sense; I hadn't checked whether there was a limit on mutator threads - we didn't
change it... this probably explains the hard upper bound in my synthetic test (which, incidentally,
does not do the transformation)

I agree with you on SnapTreeMap: once I saw that the "essentially" free clone operation has to
acquire a lock (or at least wait for no mutations)... I surmised there were probably dragons
there that might cause all kinds of nastiness, whether it be pain on concurrent updates,
horribly unbalancing updates to the tree, or dragging huge amounts of stale data (garbage
to be) along with it due to overly lazy copy-on-write (again, I didn't look too closely). BTree looks
much better (and probably does less rebalancing, since it has wider nodes I think), though
as discussed it doesn't prevent the underlying race we also want to avoid
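
For context, here is roughly the shape of the loop in question - an illustrative sketch only
(hypothetical names, not the actual addAllWithSizeDelta code). The point is that every failed
CAS throws away a freshly built tree, which is where the contended-case garbage comes from:

{code:java}
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.BinaryOperator;
import java.util.function.Function;

// Illustrative stand-in for the optimistic update loop; T is a placeholder
// for the immutable sorted-column structure.
class SpinLoopSketch<T>
{
    private final AtomicReference<T> holder;

    SpinLoopSketch(T initial) { holder = new AtomicReference<>(initial); }

    T addAll(T update, Function<T, T> transformation, BinaryOperator<T> merge)
    {
        while (true)
        {
            T current = holder.get();
            // In the shape being fixed, the transformation runs inside the
            // loop, so it is repeated on every failed CAS (plan item (a)
            // below hoists it out).
            T modified = merge.apply(current, transformation.apply(update));
            if (holder.compareAndSet(current, modified))
                return modified;
            // CAS failed: 'modified' (a whole cloned/merged tree) is now
            // garbage; under heavy contention this allocation dominates.
        }
    }
}
{code}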

So, I'll see if I have time to work on this later today, but the plan for 2.0.x is (just
checking):

a) move the transformation.apply out of the loop or do it at most once (I prefer the former
since it makes the race window smaller)
b) add a one-way flip flag per AtomicSortedColumns instance, flipped when an accumulated cost
reaches a certain value. I was going to calculate the delta in each mutator thread (probably
adding a log-like measure, e.g. using Integer.numberOfLeadingZeros(tree.size()) per failing CAS),
though looking (ugh) at SnapTreeMap again, it seems tree.size() is not a safe method to
call in the presence of mutations, so I guess Holder can just track the tree size itself.
Note the new total cost is CASed in as part of the holder (see the sketch after this list)
c) given this is possibly a temporary solution, is it worth exposing the "cut-off" value, even
undocumented, such that it could be overridden in cassandra.yaml? Note the default should
be such that most AtomicSortedColumns instances never reach the cut-off, since they are not
both heavily contended and large (that combination indicating contended inserts rather than updates)
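
A rough sketch of how (a) and (b) might fit together - all names here (COST_CUTOFF, the Holder
fields, the synchronized fallback) are illustrative assumptions rather than the actual patch,
and I'm reading the log-like measure as floor(log2(size)) + 1 per failed CAS:

{code:java}
import java.util.concurrent.atomic.AtomicReference;

class FlipFlagSketch
{
    // Plan item (c): an assumed default; could be made overridable from cassandra.yaml.
    static final int COST_CUTOFF = 1024;

    static final class Holder
    {
        final Object tree;  // placeholder for the sorted-column tree
        final int size;     // tracked here since SnapTreeMap.size() is unsafe under mutation
        final int cost;     // accumulated contention cost, CASed in with the tree
        Holder(Object tree, int size, int cost) { this.tree = tree; this.size = size; this.cost = cost; }
    }

    private final AtomicReference<Holder> ref = new AtomicReference<>(new Holder(null, 0, 0));
    private volatile boolean flipped; // one-way: once set it never clears

    void addAll(Object update)
    {
        Object transformed = transform(update); // plan item (a): apply once, outside the loop
        int penalty = 0;
        while (!flipped)
        {
            Holder cur = ref.get();
            int cost = cur.cost + penalty;
            if (cost >= COST_CUTOFF) { flipped = true; break; }
            Holder next = new Holder(merge(cur.tree, transformed), cur.size + 1, cost);
            if (ref.compareAndSet(cur, next))
                return;
            // log-like penalty per failed CAS, e.g. a tree of size 1000 adds 10
            penalty += 32 - Integer.numberOfLeadingZeros(cur.size | 1);
        }
        lockedAddAll(transformed);
    }

    // Serialise the heavy mutators once flipped; still CAS so stragglers on
    // the optimistic path stay correct during the handoff.
    private synchronized void lockedAddAll(Object transformed)
    {
        while (true)
        {
            Holder cur = ref.get();
            Holder next = new Holder(merge(cur.tree, transformed), cur.size + 1, cur.cost);
            if (ref.compareAndSet(cur, next))
                return;
        }
    }

    private Object transform(Object u) { return u; }          // stand-in
    private Object merge(Object tree, Object u) { return u; } // stand-in
}
{code}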

We can re-circle the wagons for 2.1, which needs to be a separate patch anyway




> AtomicSortedColumns.addAllWithSizeDelta has a spin loop that allocates memory
> -----------------------------------------------------------------------------
>
>                 Key: CASSANDRA-7546
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7546
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: graham sanderson
>            Assignee: graham sanderson
>         Attachments: 7546.20.txt, 7546.20_2.txt, 7546.20_alt.txt, suggestion1.txt, suggestion1_21.txt
>
>
> In order to preserve atomicity, this code attempts to read, clone/update, then CAS the
> state of the partition.
> Under heavy contention for updating a single partition this can cause some fairly staggering
> memory growth (the more cores on your machine, the worse it gets).
> Whilst many usage patterns don't do highly concurrent updates to the same partition, hinting
> today does, and in this case wild (order(s) of magnitude more than expected) memory allocation
> rates can be seen (especially when the updates being hinted are small updates to different
> partitions, which can happen very fast on their own) - see CASSANDRA-7545
> It would be best to eliminate/reduce/limit the spinning memory allocation whilst not
> slowing down the very common un-contended case.



