hadoop-common-issues mailing list archives

From "Xiao Chen (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-14688) Intern strings in KeyVersion and EncryptedKeyVersion
Date Thu, 27 Jul 2017 17:55:00 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-14688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16103609#comment-16103609 ]

Xiao Chen commented on HADOOP-14688:

The heap dumps are too big to attach here, so I uploaded a screenshot of the most relevant
analysis result from them.

The 2 most duplicated strings (mG... and 0O...) are the 2 key version names. I was running
re-encryption on a zone with 1M files, and 2 different key versions were in use among those
files in this run.

I verified that the duplication goes away after interning.

[~daryn], do you think this makes sense? Thanks!
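
For illustration, here is a minimal sketch of the idea, not the attached patch. The class
names InternSketch and KeyVersionSketch, their fields, and the sample key and version names
are all hypothetical, and the sketch uses the JDK's String.intern() as an assumption; the
actual change may use a different interning mechanism.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration only; these are not the real Hadoop classes. */
final class InternSketch {

  /** Simplified stand-in for an object holding a key name and a key version name. */
  static final class KeyVersionSketch {
    private final String name;
    private final String versionName;
    private final byte[] material;

    KeyVersionSketch(String name, String versionName, byte[] material) {
      // Interning makes equal names share one canonical String instance,
      // so a million objects do not each retain their own copy.
      this.name = name == null ? null : name.intern();
      this.versionName = versionName == null ? null : versionName.intern();
      this.material = material;
    }

    String getName() {
      return name;
    }

    String getVersionName() {
      return versionName;
    }
  }

  public static void main(String[] args) {
    List<KeyVersionSketch> edeks = new ArrayList<>();
    for (int i = 0; i < 1000; i++) {
      // new String(...) simulates a fresh copy, e.g. one produced by deserialization.
      String version = new String(i % 2 == 0 ? "versionA" : "versionB");
      edeks.add(new KeyVersionSketch(new String("myKey"), version, new byte[16]));
    }
    // Entries 0 and 2 both use "versionA"; after interning they share one instance.
    boolean shared = edeks.get(0).getVersionName() == edeks.get(2).getVersionName();
    System.out.println("same instance: " + shared);
  }
}
```

With the two intern() calls removed, each object would keep its own copy of the name strings,
which is the kind of duplication the heap dump analysis shows.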

> Intern strings in KeyVersion and EncryptedKeyVersion
> ----------------------------------------------------
>                 Key: HADOOP-14688
>                 URL: https://issues.apache.org/jira/browse/HADOOP-14688
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: kms
>            Reporter: Xiao Chen
>            Assignee: Xiao Chen
>         Attachments: HADOOP-14688.01.patch, heapdump analysis.png
> This is inspired by [~misha@cloudera.com]'s work on HDFS-11383.
> The key names and key version names are usually the same across many {{KeyVersion}}
> and {{EncryptedKeyVersion}} objects. We should not create duplicate string objects for them.
> This matters even more for HDFS-10899, where we try to re-encrypt all files' EDEKs in
> a given EZ. Those EDEKs all have the same key name, and mostly use no more than a couple
> of key version names.
