hadoop-common-dev mailing list archives

From "Sameer Paranjpye (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-158) dfs should allocate a random blockid range to a file, then assign ids sequentially to blocks in the file
Date Tue, 30 May 2006 22:23:30 GMT
    [ http://issues.apache.org/jira/browse/HADOOP-158?page=comments#action_12413922 ] 

Sameer Paranjpye commented on HADOOP-158:

Yes, random assignment of file-ids makes collisions more likely. However, collisions are possible
even with sequential assignment, and if they are possible they need to be detected. Since
collision-detection code is needed with both random and sequential assignment, random assignment
makes the system simpler: the namenode doesn't have to track the 'high watermark' file-id.

I don't think recently assigned file-ids that belong to incomplete files are a concern, since
the namenode will be aware of all file-ids in use, whether they belong to incomplete files or not.

Wrap-around before a file completes is not the only collision scenario. In the sequential
assignment scheme, suppose the first million files in the system get the file-ids 0-999999.
These files are archival data of some kind, so they are never deleted. Life goes on, lots of
files are created and removed, and at any given time there are only a few million files total
(complete + incomplete) in the system. At some point the system will have gone through a trillion
file-creation events, the file-ids will wrap, and new ids will start to collide with those of the
first million files.
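The wrap-around scenario above can be sketched as a toy simulation. All numbers here are scaled down for illustration (a real id space would be 2^32 or 2^64 ids, not 16); the variable names are illustrative, not Hadoop code.

```python
# Toy model of the wrap-around collision scenario: a small id space,
# a few "archival" files whose ids stay live forever, and a sequential
# high-watermark counter that eventually wraps back into them.
ID_SPACE = 16        # stand-in for a real 2**32 or 2**64 id space
live_ids = {0, 1, 2} # archival files: never deleted, ids stay in use

next_id = 3          # sequential 'high watermark' counter
creations = 0
while True:
    new_id = next_id % ID_SPACE   # counter wraps modulo the id space
    next_id += 1
    creations += 1
    if new_id in live_ids:
        # As soon as the counter wraps back into the archival range,
        # sequential assignment collides with a live file.
        print(f"collision at id {new_id} after {creations} creations")
        break
    # Transient files are deleted immediately in this toy model,
    # so their ids do not stay live.
```

The point is that sequential assignment only postpones collisions; once the counter wraps, it walks straight into whatever old ids are still in use.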

> dfs should allocate a random blockid range to a file, then assign ids sequentially to blocks in the file
> --------------------------------------------------------------------------------------------------------
>          Key: HADOOP-158
>          URL: http://issues.apache.org/jira/browse/HADOOP-158
>      Project: Hadoop
>         Type: Bug

>   Components: dfs
>     Versions: 0.1.0
>     Reporter: Doug Cutting
>     Assignee: Konstantin Shvachko
>      Fix For: 0.4

> A random number generator is used to allocate block ids in dfs.  Sometimes a block id
> is allocated that is already used in the filesystem, which causes filesystem corruption.
> A short-term fix for this is to simply check when allocating block ids whether any file
> is already using the newly allocated id, and, if it is, to generate another one.  There can
> still be collisions in some rare conditions, but these are harder to fix and will wait, since
> this simple fix will handle the vast majority of collisions.
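The short-term fix quoted above amounts to a check-and-retry loop around the random generator. Here is a minimal sketch of that idea; the function name, the set modeling the namenode's ids in use, and the 63-bit id width are all assumptions for illustration, not the actual Hadoop DFS code.

```python
import random

def allocate_block_id(ids_in_use, rng=random):
    """Draw random ids until one is unused, then reserve it.

    ids_in_use models the namenode's knowledge of every block id
    currently in the filesystem (a plain set in this sketch).
    """
    while True:
        candidate = rng.getrandbits(63)   # random 63-bit block id
        if candidate not in ids_in_use:
            ids_in_use.add(candidate)     # reserve before returning
            return candidate

# Usage: each allocation returns a fresh id and records it.
in_use = set()
first = allocate_block_id(in_use)
second = allocate_block_id(in_use)
assert first != second and {first, second} == in_use
```

With a 63-bit id space and only millions of live blocks, the retry loop almost never iterates more than once, which is why this simple check handles the vast majority of collisions.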

This message is automatically generated by JIRA.
If you think it was sent incorrectly contact one of the administrators:
For more information on JIRA, see:
