hadoop-hdfs-issues mailing list archives

From "Misha Dmitriev (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HDFS-12922) Arrays of length 1 cause 9.2% memory overhead
Date Wed, 13 Dec 2017 02:32:00 GMT

     [ https://issues.apache.org/jira/browse/HDFS-12922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Misha Dmitriev updated HDFS-12922:
----------------------------------
    Attachment: screenshot-1.png

> Arrays of length 1 cause 9.2% memory overhead
> ---------------------------------------------
>
>                 Key: HDFS-12922
>                 URL: https://issues.apache.org/jira/browse/HDFS-12922
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Misha Dmitriev
>            Assignee: Misha Dmitriev
>         Attachments: screenshot-1.png
>
>
> I recently obtained a large (over 60GiB) heap dump from a customer and analyzed it using
> jxray (www.jxray.com). One source of memory waste that the tool detected is arrays of length
> 1 referenced by the fields {{BlockInfo[] org.apache.hadoop.hdfs.server.namenode.INodeFile.blocks}}
> and {{INode$Feature[] org.apache.hadoop.hdfs.server.namenode.INodeFile.features}}. Only a small
> fraction of these arrays (less than 10%) have a length greater than 1. Collectively, these
> arrays waste 5.5GiB, or 9.2% of the heap. See the attached screenshot for more details.
> An array of length 1 is problematic because every array in the JVM has a header that takes
> between 16 and 20 bytes, depending on the JVM configuration. For a big enough array this
> 16-20 byte overhead is not a concern, but if the array has only one element (which takes
> 4-8 bytes, depending on the JVM configuration), the overhead becomes bigger than the
> array's "workload".
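The arithmetic behind this can be sketched in a few lines of Java. This is only a rough estimate under stated assumptions: the class `ArrayOverhead` and its methods are hypothetical names, the 16-byte header and 4-byte reference size are one possible JVM configuration taken from the ranges quoted above, and the 8-byte object alignment is assumed as a typical default rather than measured.

```java
public final class ArrayOverhead {
    // Rounds a size up to the next multiple of the given alignment.
    static long align(long size, long alignment) {
        return (size + alignment - 1) / alignment * alignment;
    }

    // Estimated heap footprint of a reference array:
    // header plus one reference per element, padded to 8 bytes (assumed alignment).
    static long arrayBytes(long headerBytes, long refBytes, long length) {
        return align(headerBytes + refBytes * length, 8);
    }

    public static void main(String[] args) {
        long header = 16; // assumed header size (the issue quotes 16-20 bytes)
        long ref = 4;     // assumed reference size (the issue quotes 4-8 bytes)

        long single = arrayBytes(header, ref, 1);
        long large = arrayBytes(header, ref, 1000);

        // For one element, everything except the reference itself is overhead.
        System.out.println("length 1:    " + single + " bytes, payload " + ref + " bytes");
        System.out.println("length 1000: " + large + " bytes, payload " + (ref * 1000) + " bytes");
    }
}
```

Under these assumptions a one-element array occupies 24 bytes to hold a 4-byte reference, while for a 1000-element array the same header is a negligible fraction of the total.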
> In such a situation it makes sense to replace the array data field {{Foo[] ar}} with
> an {{Object obj}} that contains either a direct reference to the array's single workload
> element or, if there is more than one element, a reference to the array. This change will
> require further code changes and type casts. For example, code like {{return ar[i];}} becomes
> {{return (obj instanceof Foo) ? (Foo) obj : ((Foo[]) obj)[i];}} and so on. This doesn't look
> very pretty, but as far as I can see, the code that deals with e.g. INodeFile.blocks already
> contains various null checks, etc., so the change will not make the code much less readable.
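A minimal sketch of the proposed pattern, for illustration only: `CompactFooList` and its nested `Foo` are hypothetical stand-ins, not actual HDFS classes, and the real change to `INodeFile.blocks` or `INodeFile.features` may look quite different. The sketch assumes elements are never null and that the element type is not itself an array type, so the `instanceof` check is unambiguous.

```java
import java.util.Arrays;
import java.util.Objects;

public final class CompactFooList {
    // Hypothetical element type standing in for BlockInfo or INode.Feature.
    public static final class Foo {
        final int id;
        public Foo(int id) { this.id = id; }
    }

    // Holds null (empty), a single Foo, or a Foo[] with at least two elements.
    private Object obj;

    public int size() {
        if (obj == null) return 0;
        return (obj instanceof Foo) ? 1 : ((Foo[]) obj).length;
    }

    public Foo get(int i) {
        if (i < 0 || i >= size()) {
            throw new IndexOutOfBoundsException("index " + i + ", size " + size());
        }
        // Same shape as the expression in the issue description.
        return (obj instanceof Foo) ? (Foo) obj : ((Foo[]) obj)[i];
    }

    public void add(Foo f) {
        Objects.requireNonNull(f, "f");
        if (obj == null) {
            obj = f;                                  // first element: no array allocated
        } else if (obj instanceof Foo) {
            obj = new Foo[] { (Foo) obj, f };         // second element: switch to an array
        } else {
            Foo[] old = (Foo[]) obj;
            Foo[] grown = Arrays.copyOf(old, old.length + 1);
            grown[old.length] = f;
            obj = grown;
        }
    }

    public static void main(String[] args) {
        CompactFooList list = new CompactFooList();
        list.add(new Foo(1));
        System.out.println(list.size() + " element, first id " + list.get(0).id);
        list.add(new Foo(2));
        System.out.println(list.size() + " elements, second id " + list.get(1).id);
    }
}
```

The design trades a type check and cast on every access for the removal of one array object per single-element holder, which is the common case according to the heap dump described above.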



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
