hadoop-common-dev mailing list archives

From "Owen O'Malley" <...@yahoo-inc.com>
Subject Re: Copy on write for HDFS
Date Mon, 16 Jul 2007 15:00:26 GMT

On Jul 15, 2007, at 11:08 PM, Dhruba Borthakur wrote:

> I guess what you are saying is that a block can belong to multiple  
> files.

A better name for the feature would be "clone," I think. And yes, it  
would be a file copy that is cheap since it doesn't involve moving  
any data. It only updates structures on the NameNode.

> 1. File deletion: In the current code, when a file is deleted, all  
> blocks
> belonging to that file are scheduled for deletion. This code has to  
> change
> in such a way that a block gets deleted only if it does not belong  
> to *any*
> file.

There would either need to be a ref count on the blocks or a reverse  
mapping of blocks to sets of files. And yes, you can only delete the  
block if the set of files is empty or the ref count goes to 0. A more  
invasive change is that the desired replication of the block is the  
maximum of the replications of the containing files. I assume that  
means you would need to store the desired replication on each block  
rather than in the file information.
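The bookkeeping described above could be sketched roughly like this (a hypothetical illustration, not actual NameNode code; the class and method names are made up): each block tracks the files that contain it, deletion is allowed only when the last owner is gone, and the desired replication is the maximum over the owning files.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: reverse mapping from a block to its owning files,
// with per-file desired replication.
class BlockInfo {
    // Files that contain this block (the "set of files" reverse mapping;
    // its size serves as the ref count).
    private final Set<String> owners = new HashSet<>();
    // Desired replication requested by each owning file.
    private final Map<String, Short> replication = new HashMap<>();

    void addOwner(String file, short repl) {
        owners.add(file);
        replication.put(file, repl);
    }

    // Returns true if the block may now be scheduled for deletion,
    // i.e. the set of owning files has become empty.
    boolean removeOwner(String file) {
        owners.remove(file);
        replication.remove(file);
        return owners.isEmpty();
    }

    // Desired replication of the block is the maximum over the
    // replications of all files that contain it.
    short desiredReplication() {
        short max = 0;
        for (short r : replication.values()) {
            if (r > max) max = r;
        }
        return max;
    }
}
```

With this structure, deleting one of two files sharing a block lowers the block's desired replication (if that file requested the maximum) but does not delete the block; only removing the last owner does.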

> 2. race between cow() and delete():  The client invokes cow() with  
> a set of
> LocatedBlocks. Since there aren't any client side locks, by the  
> time the
> Namenode processes the cow() command, the original block(s) could  
> have been
> deleted.

The right interface, in my opinion, is not one that takes blocks at  
all, but one that does the clone at the file level.

void cloneFile(Path source, Path destination) throws IOException

or something. Then the namespace can be locked while the data  
structures are read and modified.

-- Owen
