hadoop-common-dev mailing list archives

From "Doug Cutting (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-146) potential conflict in block id's, leading to data corruption
Date Tue, 18 Apr 2006 23:39:19 GMT
    [ http://issues.apache.org/jira/browse/HADOOP-146?page=comments#action_12375011 ] 

Doug Cutting commented on HADOOP-146:

I'd vote for sequential allocation.  It will take a *really* long time to cycle through all
ids.  Migration should not be expensive, since it just requires renaming block files, not
copying them.  The high-watermark block id can be logged with the block->name table.
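
To make the sequential idea concrete, here is a minimal sketch, not the actual namenode code. The class name SequentialBlockIdAllocator, its methods, and the choice to use only positive ids are assumptions made for illustration. A real namenode would also have to log the watermark durably before handing an id out.

```java
/** Hypothetical sketch of sequential block id allocation; not Hadoop code. */
class SequentialBlockIdAllocator {
    // Highest id handed out so far; 0 means nothing has been allocated yet.
    private long highWatermark;

    /** Starts from a watermark recovered from the namenode's log (assumed). */
    SequentialBlockIdAllocator(long recoveredHighWatermark) {
        this.highWatermark = recoveredHighWatermark;
    }

    /** Returns the next unused id and advances the watermark. */
    synchronized long allocate() {
        if (highWatermark == Long.MAX_VALUE) {
            throw new IllegalStateException("block id space exhausted");
        }
        highWatermark++;
        // Assumption: the new watermark would be written to the log here,
        // alongside the block->name table, before the id is returned.
        return highWatermark;
    }

    synchronized long getHighWatermark() {
        return highWatermark;
    }

    /** Years needed to use up 2^idBits ids at a steady allocation rate. */
    static double yearsToExhaust(int idBits, double idsPerSecond) {
        double totalIds = Math.pow(2.0, idBits);
        double secondsPerYear = 365.25 * 24 * 3600;
        return totalIds / idsPerSecond / secondsPerYear;
    }
}
```

At an assumed rate of one million allocations per second, yearsToExhaust(64, 1e6) comes out on the order of 580,000 years, which is the sense in which cycling through all ids takes a really long time.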

Here's one way to migrate: initially the high-water-mark id is zero.  So all blocks in the
name table are out-of-range, and hence need renaming.  Renaming can be handled like other
blockwork: the namenode can give datanodes rename commands.  While a block is being renamed
it must be kept in side tables, so that, e.g., requests to read files whose blocks are partially
renamed can still be handled.
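
The migration steps above could be sketched roughly as follows. This is an illustration under assumptions, not a proposed patch: BlockRenameMigration, RenameCommand, and the method names are hypothetical, and the real namenode/datanode protocol is not modeled. The sketch also skips any candidate id that an old, not-yet-renamed block still holds, which is a detail added here for safety rather than something stated in the comment.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of migrating random block ids to sequential ones. */
class BlockRenameMigration {
    /** A rename instruction the namenode might hand to datanodes (assumed). */
    static final class RenameCommand {
        final long oldId;
        final long newId;
        RenameCommand(long oldId, long newId) {
            this.oldId = oldId;
            this.newId = newId;
        }
    }

    // Initially zero, so every existing block id is out of range.
    private long highWatermark = 0;
    // Side table: old id -> new id for renames that are still in flight.
    private final Map<Long, Long> pendingRenames = new HashMap<>();

    /** Plans a rename for every id outside the range (0, watermark]. */
    synchronized List<RenameCommand> planRenames(List<Long> existingIds) {
        Set<Long> inUse = new HashSet<>(existingIds);
        long rangeEnd = highWatermark; // range as it stood before planning
        List<RenameCommand> commands = new ArrayList<>();
        for (long oldId : existingIds) {
            if (oldId > 0 && oldId <= rangeEnd) {
                continue; // already a sequential id, nothing to do
            }
            long newId = highWatermark + 1;
            while (inUse.contains(newId)) {
                newId++; // skip ids still held by a block awaiting rename
            }
            highWatermark = newId;
            pendingRenames.put(oldId, newId);
            commands.add(new RenameCommand(oldId, newId));
        }
        return commands;
    }

    /** True while a block's rename has been issued but not confirmed. */
    synchronized boolean isPending(long oldId) {
        return pendingRenames.containsKey(oldId);
    }

    /** New id for a block being renamed, or null if none is in flight. */
    synchronized Long newIdFor(long oldId) {
        return pendingRenames.get(oldId);
    }

    /** Called once all replicas report the rename as done (assumed). */
    synchronized void completeRename(long oldId) {
        pendingRenames.remove(oldId);
    }
}
```

While an entry sits in the side table, a read request can be served under either id: replicas that have not renamed yet answer to the old id, and renamed ones to the new id.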

> potential conflict in block id's, leading to data corruption
> ------------------------------------------------------------
>          Key: HADOOP-146
>          URL: http://issues.apache.org/jira/browse/HADOOP-146
>      Project: Hadoop
>         Type: Bug

>   Components: dfs
>     Versions: 0.1.0, 0.1.1
>     Reporter: Yoram Arnon
>     Assignee: Konstantin Shvachko
>      Fix For: 0.3

> Currently, block ids are generated randomly and are not tested for collisions with
> existing ids.
> While ids are 64 bits, given enough time and a large enough FS, collisions are expected.
> When a collision occurs, a random subset of blocks with that id will be removed as extra
> replicas, and the contents of that portion of the containing file are one random version of
> the block.
> To solve this one could check for id collision when creating a new block, getting a new
> id in case of conflict. This approach requires the name node to keep track of all existing
> block ids (rather than just the ones that have reported in), and to identify old versions
> of a block id as invalid (in case a data node dies, a file is deleted, then a block id is
> reused for a new file).
> Alternatively, one could simply use sequential block ids. Here the downsides are:
> 1. migration from an existing file system is hard, requiring compaction of the entire
> 2. once you cycle through 64 bits of ids (quite a few years at full blast), you're in
> trouble again (or run occasional/background compaction)
> 3. you must never lose the high watermark block id.
> synchronized Block allocateBlock(UTF8 src) {
>     // The new block gets a random id; nothing checks it against existing ids.
>     Block b = new Block();
>     FileUnderConstruction v = (FileUnderConstruction) pendingCreates.get(src);
>     v.add(b);
>     pendingCreateBlocks.add(b);
>     return b;
> }
>
> static Random r = new Random();
>
> public Block() {
>     this.blkid = r.nextLong();  // random id, no collision check
>     this.len = 0;
> }
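
For comparison, the collision-check alternative from the description might look something like the sketch below. The class CheckedRandomBlockIdAllocator and its methods are hypothetical names, and the in-memory set stands in for whatever record of all existing block ids the name node would really need to keep. It does not address the stale-replica case (a data node dies, the file is deleted, the id is reused). One related point from the java.util.Random documentation: because Random uses a 48-bit seed, nextLong() does not return all possible long values, so collisions may be somewhat more likely than a full 64-bit space would suggest.

```java
import java.util.HashSet;
import java.util.Random;
import java.util.Set;

/** Hypothetical sketch: random block ids with a collision check. */
class CheckedRandomBlockIdAllocator {
    private final Random random;
    // Assumption: every id in existence is tracked here, not only the ids
    // of blocks whose data nodes have reported in.
    private final Set<Long> knownIds = new HashSet<>();

    CheckedRandomBlockIdAllocator(Random random) {
        this.random = random;
    }

    /** Records an id that already exists in the file system. */
    synchronized void register(long existingId) {
        knownIds.add(existingId);
    }

    /** Draws random ids until one is found that is not already known. */
    synchronized long allocate() {
        long id;
        do {
            id = random.nextLong();
        } while (!knownIds.add(id)); // add() returns false on a collision
        return id;
    }
}
```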

This message is automatically generated by JIRA.