cassandra-commits mailing list archives

From "Simon Zhou (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-13063) Too many instances of BigVersion
Date Wed, 21 Dec 2016 18:39:58 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-13063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15767796#comment-15767796
] 

Simon Zhou commented on CASSANDRA-13063:
----------------------------------------

I noticed this issue while debugging CASSANDRA-13049. Having 70+ objects is not a problem in itself.
However, we had over 1 million open files during bootstrapping in CASSANDRA-13049, and almost
all of them were *-data.db and *-index.db files. Given that each sstable uses one Descriptor instance,
which holds a separate instance of BigVersion, the memory footprint is not trivial. As a
quick proof of concept, I tested with the code snippet below within BigFormat.java:

{code}
    public static void main(String[] args) throws InterruptedException
    {
        List<BigVersion> versions = new ArrayList<>();
        // Create half a million objects to simulate my issue in CASSANDRA-13049.
        for (int i = 0; i < 500000; i++) {
            versions.add(new BigVersion("3.0.10"));
        }
        Thread.sleep(100000000);
    }
{code}

Using JConsole, I can see that heap usage stays around 32MB AFTER PERFORMING GC. If
I remove the "for" loop, heap usage AFTER PERFORMING GC is ~9MB. That means the BigVersion
objects account for over 20MB of memory.
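
As a rough sanity check on those numbers, the difference can be turned into a per-instance estimate. This is only a sketch: the 32MB and 9MB figures are approximate JConsole readings, MB is assumed to mean 2^20 bytes, and the result also includes the list's reference to each object, so it is an upper bound rather than the exact size of a BigVersion. The class and method names are hypothetical.

```java
// Hypothetical helper, not part of Cassandra: estimates bytes retained per object
// from two post-GC heap readings.
class PerInstanceEstimate
{
    static long bytesPerInstance(long heapWithMb, long heapWithoutMb, long instances)
    {
        // Assumes 1 MB = 1024 * 1024 bytes; integer division rounds down.
        return (heapWithMb - heapWithoutMb) * 1024 * 1024 / instances;
    }

    public static void main(String[] args)
    {
        // Readings quoted above: ~32MB with the loop, ~9MB without, 500000 objects.
        System.out.println(bytesPerInstance(32, 9, 500_000) + " bytes per instance (approx.)");
    }
}
```

With the readings above this comes to roughly 48 bytes per instance, so a node holding on the order of a million descriptors would retain tens of megabytes in BigVersion objects alone.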

[~aleksey.kasyanov], as mentioned by [~slebresne], I'm not going to make it an enum; I'll just use
something like the code below. Does that make sense to you?

{code}
    private static final ConcurrentHashMap<String, Version> versions = new ConcurrentHashMap<>();
    @Override
    public Version getVersion(String version)
    {
        assert version != null : "Version cannot be null";

        Version bigVersion = versions.get(version);
        if (bigVersion == null) {
            bigVersion = new BigVersion(version);
            versions.putIfAbsent(version, bigVersion);
        }

        return versions.get(version);
    }
{code}
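
For comparison, the same caching idea can be written with a single map operation. The following is a minimal, self-contained sketch and not the committed patch: the nested Version and BigVersion classes are hypothetical stand-ins so the example compiles on its own, and it assumes Java 8, where ConcurrentHashMap.computeIfAbsent is available.

```java
import java.util.concurrent.ConcurrentHashMap;

class VersionCacheSketch
{
    // Hypothetical stand-ins for Cassandra's Version and BigVersion classes.
    static abstract class Version
    {
        final String version;
        Version(String version) { this.version = version; }
    }

    static final class BigVersion extends Version
    {
        BigVersion(String version) { super(version); }
    }

    // One shared instance per distinct version string.
    private static final ConcurrentHashMap<String, Version> versions = new ConcurrentHashMap<>();

    static Version getVersion(String version)
    {
        assert version != null : "Version cannot be null";
        // computeIfAbsent applies the constructor at most once per key,
        // so concurrent callers all receive the same instance.
        return versions.computeIfAbsent(version, BigVersion::new);
    }

    public static void main(String[] args)
    {
        Version a = getVersion("3.0.10");
        Version b = getVersion("3.0.10");
        System.out.println(a == b); // the cached instance is reused
    }
}
```

Compared with the get/putIfAbsent/get sequence, this avoids constructing a throwaway BigVersion when two threads race on the same version string, at the cost of requiring Java 8.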

> Too many instances of BigVersion
> --------------------------------
>
>                 Key: CASSANDRA-13063
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-13063
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Simon Zhou
>            Assignee: Simon Zhou
>            Priority: Minor
>
> When debugging with Cassandra 3.0.10, I found 70+ BigVersion objects on a new node after
> . This was from a cluster created by CMM that had very little data. Since we create a new
> instance of BigVersion for each SSTable, this would create too many objects, e.g. when
> bootstrapping a new node in a cluster with many sstables.
> Looks like sstables can actually share the same BigVersion instance as long as they have
> the same version. What we can do is create an object cache and only create a new object if
> one is not found.
> {code}
> ConcurrentHashMap<String, BigVersion> versions = new ConcurrentHashMap<>();
> {code}
> May not be a big deal but a minor improvement.  [~tjake] what do you think?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
