hadoop-hdfs-issues mailing list archives

From "Yi Liu (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-8792) Use LightWeightHashSet for BlockManager#postponedMisreplicatedBlocks
Date Thu, 30 Jul 2015 08:43:05 GMT

    [ https://issues.apache.org/jira/browse/HDFS-8792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14647349#comment-14647349

Yi Liu commented on HDFS-8792:

[~arpitagarwal], sorry for the late response.

do you have any estimates of the memory saved by using LightWeightHashSet?
Yes, compared to Java's {{HashSet}}, there are two advantages from a memory point of view:
# A Java {{HashSet}} internally wraps a {{HashMap}}, so each entry carries one extra reference (4 bytes) compared to {{LightWeightHashSet}}; we therefore save {{4 * size}} bytes of memory.
# {{LightWeightHashSet}} shrinks its internal array considerably when the number of elements drops.
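As a back-of-the-envelope illustration of point 1 (the class name here is hypothetical, and the 4-byte figure assumes compressed-oops reference size; the exact layout depends on the JVM):

```java
// Rough estimate of the per-entry saving described above.
// java.util.HashSet wraps a HashMap, so every element drags along one
// extra reference to a shared dummy value object; with 4-byte
// (compressed-oops) references that is 4 * size bytes in total.
public class SetOverheadEstimate {
    static long extraBytesInHashSet(long size) {
        return 4L * size;  // one surplus reference per entry
    }

    public static void main(String[] args) {
        // e.g. a NameNode postponing 1 million mis-replicated blocks
        System.out.println(extraBytesInHashSet(1_000_000L)); // prints 4000000
    }
}
```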

So {{LightWeightHashSet}} is the better choice here. The main issue is that {{LightWeightHashSet#LinkedSetIterator}}
doesn't currently support {{remove}}, but it's easy to add (similar to Java's HashSet).
By the way, elsewhere in Hadoop we already use {{LightWeightHashSet}} for all big objects that require
a hash set; this is the only one that also needs {{remove}}.
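To make the last point concrete, here is a minimal sketch (class and field names are hypothetical, not the actual {{org.apache.hadoop.hdfs.util.LightWeightHashSet}} code) of how a linked-bucket set's iterator can support {{remove}} the way {{java.util.HashSet}}'s iterator does: remember the last entry handed out, delegate to the set's own remove, and resynchronize the modification count so our own removal is not flagged as a concurrent modification.

```java
import java.util.ConcurrentModificationException;
import java.util.Iterator;
import java.util.NoSuchElementException;

// Simplified linked-bucket hash set whose iterator supports remove().
public class TinyLinkedSet<T> implements Iterable<T> {
    private static final class Entry<T> {
        final T element;
        Entry<T> next;
        Entry(T element, Entry<T> next) { this.element = element; this.next = next; }
    }

    @SuppressWarnings("unchecked")
    private final Entry<T>[] buckets = new Entry[16];
    private int size;
    private int modCount;  // bumped on every structural change

    private int indexFor(Object o) {
        return (o.hashCode() & 0x7fffffff) % buckets.length;
    }

    public boolean add(T element) {
        int i = indexFor(element);
        for (Entry<T> e = buckets[i]; e != null; e = e.next) {
            if (e.element.equals(element)) return false;  // already present
        }
        buckets[i] = new Entry<>(element, buckets[i]);
        size++;
        modCount++;
        return true;
    }

    public boolean remove(Object o) {
        int i = indexFor(o);
        Entry<T> prev = null;
        for (Entry<T> e = buckets[i]; e != null; prev = e, e = e.next) {
            if (e.element.equals(o)) {
                if (prev == null) buckets[i] = e.next; else prev.next = e.next;
                size--;
                modCount++;
                return true;
            }
        }
        return false;
    }

    public int size() { return size; }

    @Override
    public Iterator<T> iterator() {
        return new Iterator<T>() {
            private int bucket = -1;
            private Entry<T> next = advance();
            private Entry<T> current;                    // last entry returned
            private int expectedModCount = modCount;

            private Entry<T> advance() {                 // find next non-empty bucket
                while (++bucket < buckets.length) {
                    if (buckets[bucket] != null) return buckets[bucket];
                }
                return null;
            }

            @Override public boolean hasNext() { return next != null; }

            @Override public T next() {
                if (modCount != expectedModCount) throw new ConcurrentModificationException();
                if (next == null) throw new NoSuchElementException();
                current = next;
                next = (current.next != null) ? current.next : advance();
                return current.element;
            }

            @Override public void remove() {
                if (current == null) throw new IllegalStateException();
                if (modCount != expectedModCount) throw new ConcurrentModificationException();
                // Delegate to the set's own remove; the pre-fetched `next`
                // entry is untouched, so iteration continues safely.
                TinyLinkedSet.this.remove(current.element);
                expectedModCount = modCount;             // our removal is expected
                current = null;
            }
        };
    }
}
```

Usage mirrors a Java {{HashSet}}: iterate and call {{it.remove()}} after {{it.next()}} to drop the current element without a {{ConcurrentModificationException}}.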

> Use LightWeightHashSet for BlockManager#postponedMisreplicatedBlocks
> --------------------------------------------------------------------
>                 Key: HDFS-8792
>                 URL: https://issues.apache.org/jira/browse/HDFS-8792
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Yi Liu
>            Assignee: Yi Liu
>         Attachments: HDFS-8792.001.patch
> {{LightWeightHashSet}} requires less memory than Java's {{HashSet}}.
> Furthermore, for {{excessReplicateMap}}, we can use a {{HashMap}} instead of a {{TreeMap}},
> since there is no need to sort.

This message was sent by Atlassian JIRA
