hadoop-hdfs-issues mailing list archives

From "Xiaobing Zhou (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HDFS-11047) Remove deep copies of FinalizedReplica to alleviate heap consumption on DataNode
Date Mon, 24 Oct 2016 18:27:58 GMT

     [ https://issues.apache.org/jira/browse/HDFS-11047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xiaobing Zhou updated HDFS-11047:
---------------------------------
    Description: 
DirectoryScanner performs its scan by deep copying each FinalizedReplica. In a deployment with 500,000+
blocks, we've seen DN heap usage accumulate to high peaks. Deep copies of FinalizedReplica
make DN heap usage even worse if directory scans are scheduled more frequently. This issue
proposes removing the unnecessary deep copies, since DirectoryScanner#scan already holds the
dataset lock.

DirectoryScanner#scan
{code}
    try (AutoCloseableLock lock = dataset.acquireDatasetLock()) {
      for (Entry<String, ScanInfo[]> entry : diskReport.entrySet()) {
        String bpid = entry.getKey();
        ScanInfo[] blockpoolReport = entry.getValue();
        
        Stats statsRecord = new Stats(bpid);
        stats.put(bpid, statsRecord);
        LinkedList<ScanInfo> diffRecord = new LinkedList<ScanInfo>();
        diffs.put(bpid, diffRecord);
        
        statsRecord.totalBlocks = blockpoolReport.length;
        List<ReplicaInfo> bl = dataset.getFinalizedBlocks(bpid); /* deep copies here */
{code}

FsDatasetImpl#getFinalizedBlocks
{code}
  public List<ReplicaInfo> getFinalizedBlocks(String bpid) {
    try (AutoCloseableLock lock = datasetLock.acquire()) {
      ArrayList<ReplicaInfo> finalized =
          new ArrayList<ReplicaInfo>(volumeMap.size(bpid));
      for (ReplicaInfo b : volumeMap.replicas(bpid)) {
        if (b.getState() == ReplicaState.FINALIZED) {
          finalized.add(new ReplicaBuilder(ReplicaState.FINALIZED)
              .from(b).build()); /* deep copies here */
        }
      }
      return finalized;
    }
  }
{code}
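
As a rough sketch (not the actual patch), getFinalizedBlocks could collect references to the
finalized replicas instead of rebuilding each one through ReplicaBuilder, keeping the locking
contract unchanged:
{code}
  /*
   * Hypothetical sketch of the proposal: return references to the
   * finalized replicas rather than deep copies. Callers such as
   * DirectoryScanner#scan are expected to hold the dataset lock while
   * the returned list is in use.
   */
  public List<ReplicaInfo> getFinalizedBlocks(String bpid) {
    try (AutoCloseableLock lock = datasetLock.acquire()) {
      List<ReplicaInfo> finalized =
          new ArrayList<ReplicaInfo>(volumeMap.size(bpid));
      for (ReplicaInfo b : volumeMap.replicas(bpid)) {
        if (b.getState() == ReplicaState.FINALIZED) {
          finalized.add(b); // reference only, no ReplicaBuilder copy
        }
      }
      return finalized;
    }
  }
{code}
This would avoid one FinalizedReplica allocation per block on every scan, which adds up
quickly at 500,000+ blocks.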


> Remove deep copies of FinalizedReplica to alleviate heap consumption on DataNode
> --------------------------------------------------------------------------------
>
>                 Key: HDFS-11047
>                 URL: https://issues.apache.org/jira/browse/HDFS-11047
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: datanode, fs
>            Reporter: Xiaobing Zhou
>            Assignee: Xiaobing Zhou
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-help@hadoop.apache.org

