hadoop-hdfs-issues mailing list archives

From "Misha Dmitriev (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HDFS-12922) Arrays of length 1 cause 9.2% memory overhead
Date Wed, 13 Dec 2017 02:32:00 GMT

     [ https://issues.apache.org/jira/browse/HDFS-12922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel

Misha Dmitriev updated HDFS-12922:
    Attachment: screenshot-1.png

> Arrays of length 1 cause 9.2% memory overhead
> ---------------------------------------------
>                 Key: HDFS-12922
>                 URL: https://issues.apache.org/jira/browse/HDFS-12922
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Misha Dmitriev
>            Assignee: Misha Dmitriev
>         Attachments: screenshot-1.png
> I recently obtained a large (over 60GiB) heap dump from a customer and analyzed it using
jxray (www.jxray.com). One source of memory waste that the tool detected is arrays of length
1 that come from {{BlockInfo[] org.apache.hadoop.hdfs.server.namenode.INodeFile.blocks}} and
{{INode$Feature[] org.apache.hadoop.hdfs.server.namenode.INodeFile.features}}. Only a small
fraction of these arrays (less than 10%) have a length greater than 1. Collectively, these
arrays waste 5.5GiB, or 9.2% of the heap. See the attached screenshot for more details.
> An array of length 1 is problematic because every array in the JVM has a header that
takes between 16 and 20 bytes, depending on the JVM configuration. For a large enough array,
this 16-20 byte overhead is not a concern, but if the array has only one element (which takes
4-8 bytes, depending on the JVM configuration), the overhead becomes bigger than the array's
"workload", i.e. the data it actually stores.
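To make the header-versus-payload arithmetic concrete, here is a rough estimator for the shallow size of a reference array. It is a sketch under stated assumptions, not a measurement: the 8-byte object alignment, the header sizes (16 and 20 bytes), and the reference sizes (4 and 8 bytes) are illustrative parameters taken from the ranges quoted above, and real layouts depend on the JVM and its flags. The class and method names are hypothetical.

```java
public class ArrayOverheadEstimate {
    // Assumption: HotSpot-style 8-byte object alignment; the real value depends on JVM flags.
    static final int ALIGNMENT = 8;

    /** Rounds a byte count up to the next multiple of ALIGNMENT. */
    static long align(long bytes) {
        return (bytes + ALIGNMENT - 1) / ALIGNMENT * ALIGNMENT;
    }

    /** Estimated shallow size of a reference array: header plus elements, then aligned. */
    static long refArraySize(int length, int headerBytes, int refBytes) {
        return align(headerBytes + (long) length * refBytes);
    }

    public static void main(String[] args) {
        // Assumed configuration A: 16-byte array header, 4-byte (compressed) references.
        long small = refArraySize(1, 16, 4);
        // Assumed configuration B: 20-byte array header, 8-byte (uncompressed) references.
        long large = refArraySize(1, 20, 8);
        System.out.println("length-1 array, config A: " + small + " bytes to hold 4 bytes of payload");
        System.out.println("length-1 array, config B: " + large + " bytes to hold 8 bytes of payload");
    }
}
```

Under these assumptions a length-1 array occupies 24 or 32 bytes to hold a single 4- or 8-byte reference, which is the whole cost that storing the reference directly in the owning field would avoid.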
> In such a situation, it makes sense to replace the array data field {{Foo[] ar}} with
an {{Object obj}} field that contains either a direct reference to the array's single workload
element, or a reference to the array if there is more than one element. This change will require
further code changes and type casts. For example, code like {{return ar[i];}} becomes {{return
(obj instanceof Foo) ? (Foo) obj : ((Foo[]) obj)[i];}}, and so on. This doesn't look very pretty,
but as far as I can see, the code that deals with e.g. INodeFile.blocks already contains various
null checks, etc., so the change should not make the code much less readable.
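The single-element-or-array idea can be sketched as a small holder class. This is a hypothetical illustration, not the actual HDFS change: {{FooHolder}}, the nested {{Foo}} type (standing in for e.g. BlockInfo), and the {{size}}/{{get}}/{{add}} methods are all invented for the example. The field holds null when empty, a bare {{Foo}} for exactly one element, and a {{Foo[]}} only once a second element arrives.

```java
import java.util.Arrays;
import java.util.Objects;

/** Hypothetical holder: obj is null (empty), a Foo (one element), or a Foo[] (two or more). */
public class FooHolder {
    /** Hypothetical element type standing in for BlockInfo or INode.Feature. */
    static final class Foo {
        final String name;
        Foo(String name) { this.name = name; }
    }

    private Object obj;

    int size() {
        if (obj == null) return 0;
        return (obj instanceof Foo) ? 1 : ((Foo[]) obj).length;
    }

    Foo get(int i) {
        if (obj instanceof Foo) {
            if (i != 0) throw new IndexOutOfBoundsException("Index: " + i);
            return (Foo) obj;                  // single element stored directly
        }
        if (obj == null) throw new IndexOutOfBoundsException("Index: " + i);
        return ((Foo[]) obj)[i];               // two or more elements live in an array
    }

    void add(Foo f) {
        Objects.requireNonNull(f);             // null would be indistinguishable from "empty"
        if (obj == null) {
            obj = f;                           // first element: no array allocated
        } else if (obj instanceof Foo) {
            obj = new Foo[] {(Foo) obj, f};    // second element: promote to an array
        } else {
            Foo[] old = (Foo[]) obj;
            Foo[] grown = Arrays.copyOf(old, old.length + 1);
            grown[old.length] = f;
            obj = grown;
        }
    }

    public static void main(String[] args) {
        FooHolder h = new FooHolder();
        h.add(new Foo("blk_1"));
        System.out.println(h.size() + " " + h.get(0).name);   // prints "1 blk_1"
        h.add(new Foo("blk_2"));
        System.out.println(h.size() + " " + h.get(1).name);   // prints "2 blk_2"
    }
}
```

The design trades an {{instanceof}} check on each access for avoiding an array allocation in the common one-element case; forbidding null elements keeps the three states unambiguous.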

This message was sent by Atlassian JIRA
