cassandra-commits mailing list archives

From "Markus Dlugi (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-13754) FastThreadLocal leaks memory
Date Fri, 01 Sep 2017 07:42:02 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-13754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16150172#comment-16150172 ]

Markus Dlugi commented on CASSANDRA-13754:
------------------------------------------

[~snazy], I don't think the node is overloaded. I originally thought so as well, so I ran
a small experiment in which I capped the {{INSERT}}s per minute in our load test, reducing
them from ~25,000 to ~10,000. As a consequence, the node survived a little longer, but in
the end it still died with an {{OutOfMemoryError}} after more data had been inserted. So the
problem is not too many active writes; the node fails after a certain total number of
writes, which indicates to me that a memory leak is indeed happening.

I also had another look at the heap dump I sent you, and you are correct that the heap is
mostly filled with {{BTree$Builder}} instances that still have entries in their {{values}} array.
However, if you look closer, you will notice that for each of these instances, the {{values}}
array always contains {{null}} for the first couple of entries, and the actual content only
comes after those. For some reason, the actual content always starts at index 28, whereas
indices 0 - 27 are {{null}}. I am not sure whether this is a coincidence. But you can also see
that for all the {{BTree$Builder}} objects, the {{count}} attribute is 0, which also indicates
to me that {{BTree$Builder.cleanup()}} has already run and those are not active writes. This
theory is supported by the fact that my little workaround of manually calling {{FastThreadLocal.removeAll()}}
actually works: if clearing the {{FastThreadLocal}}s frees the memory, then nothing other than
the {{FastThreadLocal}}s still holds references to the builders.
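
To make the heap dump observation a bit more concrete, here is a minimal sketch of how a pooled builder can end up with {{count}} = 0 while its {{values}} array still pins objects. Everything in it is hypothetical: {{PooledBuilder}}, {{partialCleanup()}} and the slot numbers are made up for illustration, it uses a plain {{java.lang.ThreadLocal}} instead of Netty's {{FastThreadLocal}}, and it does not claim to be what {{BTree$Builder.cleanup()}} actually does.

```java
import java.util.Arrays;
import java.util.Objects;

final class RetainedBuilderSketch {
    /** Hypothetical reusable builder; NOT Cassandra's BTree.Builder. */
    static final class PooledBuilder {
        Object[] values = new Object[64];
        int count;

        void add(Object value) {
            if (count == values.length) values = Arrays.copyOf(values, count * 2);
            values[count++] = value;
        }

        /** Deliberately incomplete: nulls only the first `cleared` slots, then resets count. */
        void partialCleanup(int cleared) {
            Arrays.fill(values, 0, Math.min(cleared, count), null);
            count = 0;
        }

        /** Nulls every slot that was in use before resetting count. */
        void fullCleanup() {
            Arrays.fill(values, 0, count, null);
            count = 0;
        }

        /** Number of slots that still reference an object. */
        long retained() {
            return Arrays.stream(values).filter(Objects::nonNull).count();
        }
    }

    // The thread-local keeps the builder (and whatever it still references) reachable.
    static final ThreadLocal<PooledBuilder> POOL = ThreadLocal.withInitial(PooledBuilder::new);

    public static void main(String[] args) {
        PooledBuilder builder = POOL.get();
        for (int i = 0; i < 40; i++) builder.add("row-" + i);
        builder.partialCleanup(28); // assumed cut-off, chosen to mirror the heap dump
        System.out.println("count=" + builder.count + ", retained=" + builder.retained());
        // prints count=0, retained=12
    }
}
```

In this sketch the builder looks idle ({{count}} is 0), yet slots 28 and up are still populated, so the objects stay alive for as long as the thread-local entry exists.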

Therefore, I think we have two issues here:

# {{SEPWorker}} never cleans up its {{FastThreadLocal}}s and therefore accumulates references
to otherwise dead objects. Maybe we can include something to at least remove non-static entries
regularly?
# {{BTree$Builder}} seems to have an issue cleaning up properly after building, so the objects
referenced by the {{FastThreadLocal}}s of the {{SEPWorker}} threads are very large and
ultimately lead to the {{OutOfMemoryError}}s.
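
The first issue can be sketched with nothing but the JDK. The example below is an assumption-laden illustration, not Cassandra code: a single-threaded executor stands in for a long-lived {{SEPWorker}} thread, {{SCRATCH}} is a hypothetical per-thread list, and {{java.lang.ThreadLocal.remove()}} is used in place of the {{FastThreadLocal.removeAll()}} workaround to keep the sketch free of dependencies.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

final class WorkerThreadLocalSketch {
    // Hypothetical per-thread scratch list, standing in for a pooled builder.
    static final ThreadLocal<List<byte[]>> SCRATCH = ThreadLocal.withInitial(ArrayList::new);

    /** Simulates a write that leaves its data behind in the thread-local list. */
    static int writeWithoutCleanup() {
        List<byte[]> scratch = SCRATCH.get();
        scratch.add(new byte[1024]);
        return scratch.size();
    }

    /** Same write, but drops the thread's entry afterwards so the list becomes unreachable. */
    static int writeWithCleanup() {
        try {
            return writeWithoutCleanup();
        } finally {
            SCRATCH.remove();
        }
    }

    /** Runs `tasks` writes on one long-lived worker; returns the list size seen by the last one. */
    static int run(int tasks, boolean cleanup) {
        Callable<Integer> task = cleanup
                ? WorkerThreadLocalSketch::writeWithCleanup
                : WorkerThreadLocalSketch::writeWithoutCleanup;
        ExecutorService worker = Executors.newSingleThreadExecutor();
        try {
            int last = 0;
            for (int i = 0; i < tasks; i++) last = worker.submit(task).get();
            return last;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            worker.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("without cleanup: " + run(5, false)); // prints without cleanup: 5
        System.out.println("with cleanup: " + run(5, true));     // prints with cleanup: 1
    }
}
```

Without the {{remove()}} call, whatever a task leaves in the thread-local stays reachable for as long as the worker thread lives, which is the accumulation described in the first point. With it, each task starts from a fresh, empty entry.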

> FastThreadLocal leaks memory
> ----------------------------
>
>                 Key: CASSANDRA-13754
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-13754
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>         Environment: Cassandra 3.11.0, Netty 4.0.44.Final, OpenJDK 8u141-b15
>            Reporter: Eric Evans
>            Assignee: Robert Stupp
>             Fix For: 3.11.1
>
>
> After a chronic bout of {{OutOfMemoryError}} in our development environment, a heap analysis
> is showing that more than 10G of our 12G heaps are consumed by the {{threadLocals}} members
> (instances of {{java.lang.ThreadLocalMap}}) of various {{io.netty.util.concurrent.FastThreadLocalThread}}
> instances.  Reverting [cecbe17|https://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=commit;h=cecbe17e3eafc052acc13950494f7dddf026aa54]
> fixes the issue.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


