hadoop-hdfs-issues mailing list archives

From "Todd Lipcon (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-2379) 0.20: Allow block reports to proceed without holding FSDataset lock
Date Wed, 28 Sep 2011 09:10:45 GMT

    [ https://issues.apache.org/jira/browse/HDFS-2379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13116304#comment-13116304 ]

Todd Lipcon commented on HDFS-2379:
-----------------------------------

As discussed in the above-referenced JIRA, I think we can do something like the following
pseudocode:

{code}
// ignore any file-not-founds we get due to concurrent FS modifications
Set<Block> blocksFoundByScan = inconsistentScanVolume();
synchronized (volume) {
  Set<Block> missingFromScan = Sets.difference(volumeMap.keySet(), blocksFoundByScan);
  Set<Block> missingFromMem = Sets.difference(blocksFoundByScan, volumeMap.keySet());
  for (Block b : missingFromScan) { // block is in memory but not in scan
    if (b exists on disk) {
      // it got added after we scanned that part of the tree!
      add it to block report
    }
  }
  for (Block b : missingFromMem) { // block was on disk but not in memory
    if (b no longer exists on disk) {
       // remove from block report - it was deleted after we scanned that part
    }
  }
}
{code}

Anyone see a reason why this wouldn't work? Basically the idea is to do a "rough sketch" scan
first, then anywhere we detect inconsistency, we touch it up, while holding the lock.
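
For anyone who wants to experiment with the idea outside the DataNode, here is a minimal, self-contained Java sketch of the touch-up step. It is not the actual FSDataset code: the class name BlockReportReconciler, the reconcile method, the use of plain Long block IDs instead of Block objects, and the existsOnDisk predicate are all hypothetical stand-ins, and plain java.util set operations replace the Sets.difference calls in the pseudocode.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.function.Predicate;

/** Hypothetical helper illustrating the "rough scan, then touch up under lock" idea. */
public class BlockReportReconciler {

    /**
     * @param scanned      block IDs found by the unlocked, possibly inconsistent scan
     * @param inMemory     block IDs currently tracked in memory (stand-in for volumeMap.keySet())
     * @param existsOnDisk assumed callback that checks whether a block file is on disk right now
     * @param volumeLock   lock guarding the volume; held only for the touch-up
     * @return the reconciled set of block IDs to report
     */
    public static Set<Long> reconcile(Set<Long> scanned,
                                      Set<Long> inMemory,
                                      Predicate<Long> existsOnDisk,
                                      Object volumeLock) {
        // Start from the rough scan result and correct it below.
        Set<Long> report = new HashSet<>(scanned);

        synchronized (volumeLock) {
            // In memory but not seen by the scan.
            Set<Long> missingFromScan = new HashSet<>(inMemory);
            missingFromScan.removeAll(scanned);

            // Seen by the scan but not in memory.
            Set<Long> missingFromMem = new HashSet<>(scanned);
            missingFromMem.removeAll(inMemory);

            for (Long b : missingFromScan) {
                if (existsOnDisk.test(b)) {
                    // Added after we scanned that part of the tree.
                    report.add(b);
                }
            }
            for (Long b : missingFromMem) {
                if (!existsOnDisk.test(b)) {
                    // Deleted after we scanned that part of the tree.
                    report.remove(b);
                }
            }
        }
        return report;
    }
}
```

The point of the design is that the per-block disk checks under the lock are limited to the two difference sets, which should be small when few blocks change during the scan; the set differences themselves are still computed over the full sets while the lock is held.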
                
> 0.20: Allow block reports to proceed without holding FSDataset lock
> -------------------------------------------------------------------
>
>                 Key: HDFS-2379
>                 URL: https://issues.apache.org/jira/browse/HDFS-2379
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: data-node
>    Affects Versions: 0.20.206.0
>            Reporter: Todd Lipcon
>            Priority: Critical
>
> As disks are getting larger and more plentiful, we're seeing DNs with multiple millions
> of blocks on a single machine. When page cache space is tight, block reports can take multiple
> minutes to generate. Currently, during the scanning of the data directories to generate a
> report, the FSVolumeSet lock is held. This causes writes and reads to block, time out, etc.,
> causing big problems, especially for clients like HBase.
> This JIRA is to explore some of the ideas originally discussed in HADOOP-4584 for the
> 0.20.20x series.

