hadoop-common-dev mailing list archives

From "Vladimir Krokhmalyov (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-652) Not all Datastructures are updated when a block is deleted
Date Fri, 10 Nov 2006 12:18:38 GMT
    [ http://issues.apache.org/jira/browse/HADOOP-652?page=comments#action_12448720 ] 
            
Vladimir Krokhmalyov commented on HADOOP-652:
---------------------------------------------

I create and delete a lot of files in DFS, and I can see the DataNode's speed go down dramatically!

> a) currently subdirectories are created only in the last subdirectory (e.g. subdir63).

Not only in subdir63. After the DataNode restarts, subdirectories will be created under a
different subdirXY, because "File[] files = dir.listFiles();" in the FSDir constructor returns
subdirectories and files in arbitrary order, so the last subdirectory will be a different one.
That gives two branches: subdir63 and another one. It is not a bug; the rest of the code
processes this type of tree properly.
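
For illustration, here is a small self-contained snippet (the path and class name are made up)
showing the root cause: File.listFiles() gives no ordering guarantee, so the subdirectory that
comes out last can differ from run to run.

    import java.io.File;

    // Hypothetical demo; the path is made up. listFiles() makes no
    // ordering guarantee, so the entry that ends up "last" is
    // filesystem-dependent and can change between DataNode restarts.
    public class ListOrderDemo {
        public static void main(String[] args) {
            File dir = new File("/tmp/dfs/data");
            File[] files = dir.listFiles();   // null if dir does not exist
            if (files == null) return;
            for (int i = 0; i < files.length; i++) {
                System.out.println(files[i].getName());
            }
        }
    }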

> b) remove siblings array. I think it only increases recursion in addBlock().

Recursion is not a good idea, because it is very slow when the DataNode stores a lot of blocks.
I think this algorithm should be changed in the future.
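
As a rough sketch only (DirNode is a hypothetical stand-in, not the real FSDir), one possible
direction is an explicit stack, which keeps the call depth constant instead of recursing from
one sibling to the next:

    import java.io.File;
    import java.util.Stack;

    // Hypothetical stand-in for FSDir; only illustrates replacing the
    // sibling-to-sibling recursion with an explicit stack.
    class DirNode {
        File dir;
        DirNode[] children;   // null for leaf directories
        int numBlocks;

        static void clearPathIterative(DirNode root, File f) {
            Stack<DirNode> stack = new Stack<DirNode>();
            stack.push(root);
            while (!stack.isEmpty()) {
                DirNode node = stack.pop();
                if (node.dir.compareTo(f) == 0) {
                    node.numBlocks--;     // found the block's directory
                    return;
                }
                if (node.children != null) {
                    for (int i = 0; i < node.children.length; i++) {
                        stack.push(node.children[i]);
                    }
                }
            }
        }
    }

This variant may visit the whole tree in the worst case; the point is only that it does not
grow the call stack with the number of directories.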



Here is my tested solution for this bug:

New method clearPath() in FSDir:

>     void clearPath(File f) {
>       if (dir.compareTo(f) == 0) {
>         // This FSDir is the deleted block's directory: fix its count.
>         numBlocks--;
>       } else if (siblings != null && myIdx != siblings.length - 1) {
>         // Otherwise follow the same path addBlock() takes: first
>         // through the remaining siblings...
>         siblings[myIdx + 1].clearPath(f);
>       } else if (children != null) {
>         // ...then down into the first child.
>         children[0].clearPath(f);
>       }
>     }

New method clearPath() in FSVolume:

>     void clearPath(File f) {
>       // Delegate to the root FSDir of this volume.
>       dataDir.clearPath(f);
>     }

Changes in invalidate() method in FSDataset:

<     blockMap.remove(invalidBlks[i]);

>     synchronized (ongoingCreates) {
>       // Remove the block from every map, not only blockMap, and tell
>       // the volume to decrement numBlocks along the block file's path.
>       blockMap.remove(invalidBlks[i]);
>       FSVolume v = volumeMap.get(invalidBlks[i]);
>       volumeMap.remove(invalidBlks[i]);
>       v.clearPath(f.getParentFile());
>     }

And changes in getFile() method in FSDataset:

<     return blockMap.get(b);

>     synchronized (ongoingCreates) {
>       // Take the same lock as invalidate(), so a lookup never races
>       // with a removal.
>       return blockMap.get(b);
>     }
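
Both hunks follow the same discipline: every read and write of blockMap (and volumeMap) happens
under the same monitor, so a lookup can never observe the maps half-updated. A minimal
self-contained sketch of that pattern, using stand-in names rather than the actual FSDataset
fields:

    import java.io.File;
    import java.util.HashMap;
    import java.util.Map;

    // Hypothetical illustration: one lock object guards both maps, so
    // removal from the two maps appears atomic to readers.
    class BlockRegistry {
        private final Object lock = new Object();  // plays the role of ongoingCreates
        private final Map<Long, File> blockMap = new HashMap<Long, File>();
        private final Map<Long, String> volumeMap = new HashMap<Long, String>();

        void invalidate(long blockId) {
            synchronized (lock) {
                blockMap.remove(blockId);
                volumeMap.remove(blockId);
            }
        }

        File getFile(long blockId) {
            synchronized (lock) {
                return blockMap.get(blockId);
            }
        }
    }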

Now I will try to create a proper patch file.

P.S. I also set dfs.blockreport.intervalMsec = 10000 (10 - 30 sec) in order to keep the
NameNode from slowing down, because the NameNode holds deleted blocks in its data structures
between block reports.
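
For illustration, a minimal sketch of setting this key programmatically via Hadoop's
Configuration class (an assumption on my part; normally the value would instead go in the site
configuration file, e.g. hadoop-site.xml):

    import org.apache.hadoop.conf.Configuration;

    public class BlockReportConf {
        public static void main(String[] args) {
            Configuration conf = new Configuration();
            // Report blocks every 10 seconds so the NameNode drops
            // deleted blocks from its data structures sooner.
            conf.set("dfs.blockreport.intervalMsec", "10000");
            System.out.println(conf.get("dfs.blockreport.intervalMsec"));
        }
    }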


> Not all Datastructures are updated when a block is deleted
> ----------------------------------------------------------
>
>                 Key: HADOOP-652
>                 URL: http://issues.apache.org/jira/browse/HADOOP-652
>             Project: Hadoop
>          Issue Type: Bug
>          Components: dfs
>            Reporter: Raghu Angadi
>
> Currently when a block is deleted, the DataNode just deletes the physical file and updates
> its map. We need to update more things. For example, numBlocks in FSDir is not decremented;
> the effect of this would be that we will create more subdirectories than necessary. It might
> not show up badly yet, since numBlocks gets the correct value when the DataNode restarts. I
> have to see what else needs to be updated.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira