Mailing-List: contact hadoop-dev-help@lucene.apache.org; run by ezmlm
Precedence: bulk
Reply-To: hadoop-dev@lucene.apache.org
From: "Dhruba Borthakur" <dhruba@yahoo-inc.com>
To: <hadoop-dev@lucene.apache.org>
Subject: RE: [jira] Commented: (HADOOP-334) Redesign the dfs namespace
 datastructures to be copy on write
Date: Mon, 6 Nov 2006 14:24:53 -0800
Message-ID: <016f01c701f2$61fc0c10$639115ac@ds.corp.yahoo.com>
MIME-Version: 1.0
Content-Type: text/plain;
	charset="us-ascii"
Content-Transfer-Encoding: 7bit
In-Reply-To: <28796411.1162849058917.JavaMail.jira@brutus>
Thread-Index: AccB69z/BlwjgWSlTquSuuR17QkPCAABPDBg

Regarding the copy-on-write approach: we do not need to traverse the entire
namespace to reset the clone pointers at the end of the checkpointing
process. We can keep a lookaside list of all the nodes that have a clone
pointer. We still have to acquire the global lock at the end of the
checkpointing process, but we then traverse only this lookaside list of
cloned nodes and null out their clone pointers.
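
A minimal Java sketch of the lookaside list, to make the idea concrete. The
class and method names (CowNode, CowNamespace, preserveBeforeWrite) are made
up for illustration and are not existing Hadoop code. It also assumes that
the clone pointer holds the frozen pre-checkpoint copy of a node, which is
one possible reading of the proposal.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical namespace node; field names are illustrative only.
class CowNode {
    final String name;
    CowNode clone; // assumed: frozen copy, non-null only during a checkpoint

    CowNode(String name) { this.name = name; }
}

class CowNamespace {
    private final Object globalLock = new Object();
    // Lookaside list: every node that currently has a clone pointer.
    private final List<CowNode> clonedNodes = new ArrayList<CowNode>();

    // Called before the first modification of a node during a checkpoint.
    CowNode preserveBeforeWrite(CowNode node) {
        synchronized (globalLock) {
            if (node.clone == null) {
                node.clone = new CowNode(node.name);
                clonedNodes.add(node); // remember it for the final reset
            }
            return node.clone;
        }
    }

    // End of checkpoint: visit only the cloned nodes, not the whole tree.
    // Returns the number of clone pointers that were reset.
    int resetClonePointers() {
        synchronized (globalLock) {
            int reset = clonedNodes.size();
            for (CowNode node : clonedNodes) {
                node.clone = null;
            }
            clonedNodes.clear();
            return reset;
        }
    }
}
```

The global lock is then held for time proportional to the number of nodes
modified during the checkpoint, not to the size of the namespace.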

I like the generalized scheme of fine-grained locks (instead of a global lock)
while traversing the namespace. It is more efficient once implemented
correctly. Renames need a few lock-hierarchy tricks, because a rename touches
two places in the tree at once, but it can be done.
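
One such trick, sketched in Java under assumptions of my own: each directory
carries a unique id (hypothetical, used only for ordering), and a rename
always write-locks its two directories in ascending id order so that two
concurrent renames in opposite directions cannot deadlock. The sketch leaves
out the read locks on ancestors, and Dir and RenameLocking are illustrative
names, not existing classes.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical directory node; 'id' exists only to give locks a fixed order.
class Dir {
    final long id;
    final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    final Map<String, Object> children = new HashMap<String, Object>();

    Dir(long id) { this.id = id; }
}

class RenameLocking {
    // Moves src/name to dst/newName. Returns false if the source entry is
    // missing or the destination name is already taken.
    static boolean rename(Dir src, String name, Dir dst, String newName) {
        // Always lock the directory with the smaller id first.
        Dir first = (src.id <= dst.id) ? src : dst;
        Dir second = (first == src) ? dst : src;
        first.lock.writeLock().lock();
        try {
            if (second != first) second.lock.writeLock().lock();
            try {
                if (!src.children.containsKey(name)
                        || dst.children.containsKey(newName)) {
                    return false;
                }
                dst.children.put(newName, src.children.remove(name));
                return true;
            } finally {
                if (second != first) second.lock.writeLock().unlock();
            }
        } finally {
            first.lock.writeLock().unlock();
        }
    }
}
```

A real implementation would also have to reconcile this ordering with the
root-to-leaf order in which the path read locks are taken.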

The one thing that I am not clear about is whether we get correct semantics
if the image file and the edits file overlap. If x, y and z are three
transactions, are you saying that
    x + y + z is equivalent to x + y + y + z
where y is a single transaction that resides in the image file as well as
the edits file? Are you proposing something like a global transaction number
to identify duplicate transactions?
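
To illustrate what I mean by a global transaction number, here is a small
Java sketch. Everything in it is hypothetical (the Edit record, the txId
field, the EditReplayer class). It assumes the image records the highest
transaction id it reflects and that the image reflects every transaction up
to that id, which a checkpoint taken without a global lock may not guarantee.

```java
import java.util.List;
import java.util.Set;

// Hypothetical edit record; txId is the assumed global transaction number.
class Edit {
    final long txId;
    final String path;
    final boolean create; // true = create the path, false = delete it

    Edit(long txId, String path, boolean create) {
        this.txId = txId;
        this.path = path;
        this.create = create;
    }
}

class EditReplayer {
    // Replays edits on top of a loaded image, skipping any transaction the
    // image already reflects. Returns the number of edits actually applied.
    static int replay(Set<String> namespace, long lastTxIdInImage,
                      List<Edit> edits) {
        long last = lastTxIdInImage;
        int applied = 0;
        for (Edit e : edits) {
            if (e.txId <= last) {
                continue; // duplicate: already in the image
            }
            if (e.create) {
                namespace.add(e.path);
            } else {
                namespace.remove(e.path);
            }
            last = e.txId;
            applied++;
        }
        return applied;
    }
}
```

With x, y, z numbered 1, 2, 3, an image that reflects x and y, and an edits
file that holds y and z, only z is applied on replay.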

-----Original Message-----
From: Sameer Paranjpye (JIRA) [mailto:jira@apache.org] 
Sent: Monday, November 06, 2006 1:38 PM
To: hadoop-dev@lucene.apache.org
Subject: [jira] Commented: (HADOOP-334) Redesign the dfs namespace
datastructures to be copy on write

    [ http://issues.apache.org/jira/browse/HADOOP-334?page=comments#action_12447542 ]
            
Sameer Paranjpye commented on HADOOP-334:
-----------------------------------------

Copy on write helps, but the global lock needs to be acquired at the end of
the checkpointing process nevertheless. This still has the effect of locking
clients out of the namespace while the entire namespace is traversed and the
clone pointers are reset.


Instead of copy on write, how about changing the locking model so that for
any change:
1. Acquire read locks on all structures between the root and the change,
and acquire a write lock on the changed node.
2. To checkpoint, traverse the namespace acquiring read locks on the path
between the root and the node being checkpointed. Serialize each node to a
new image file on disk.
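
The two steps above could be sketched in Java roughly as follows. This is
only an illustration under assumptions: PathNode and PathLocking are made-up
names, the "image" is a StringBuilder rather than a file, and adding a child
is treated as a change to the parent directory.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical namespace node with one read/write lock per node.
class PathNode {
    final String name;
    final PathNode parent; // null for the root
    final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    final Map<String, PathNode> children = new HashMap<String, PathNode>();

    PathNode(String name, PathNode parent) {
        this.name = name;
        this.parent = parent;
    }
}

class PathLocking {
    // Read-locks every ancestor from the root down, then takes a write or
    // read lock on the node itself, runs the action, and releases all locks
    // in reverse order.
    private static void withLocks(PathNode node, boolean write, Runnable action) {
        List<PathNode> ancestors = new ArrayList<PathNode>();
        for (PathNode n = node.parent; n != null; n = n.parent) {
            ancestors.add(0, n); // root ends up first
        }
        List<Lock> held = new ArrayList<Lock>();
        try {
            for (PathNode a : ancestors) {
                Lock l = a.lock.readLock();
                l.lock();
                held.add(l);
            }
            Lock last = write ? node.lock.writeLock() : node.lock.readLock();
            last.lock();
            held.add(last);
            action.run();
        } finally {
            for (int i = held.size() - 1; i >= 0; i--) {
                held.get(i).unlock();
            }
        }
    }

    // Step 1: apply a change to one node.
    static void change(PathNode node, Runnable mutation) {
        withLocks(node, true, mutation);
    }

    // Step 2: serialize the subtree one node at a time, never holding more
    // than the locks on a single root-to-node path.
    static void checkpoint(PathNode node, StringBuilder image) {
        List<PathNode> kids = new ArrayList<PathNode>();
        withLocks(node, false, () -> {
            image.append(node.name).append('\n');
            kids.addAll(node.children.values());
        });
        for (PathNode kid : kids) {
            checkpoint(kid, image);
        }
    }
}
```

Because the locks are dropped between nodes, the image is not a snapshot of
one instant, which is why the overlap with the edits file matters.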

This way we never lock down the whole tree, for any operation. At the start
of the checkpointing process, a new edits file is created. Edits that occur
while the checkpoint is in progress are sent to the new file. This implies
that there will be some overlap between the checkpointed image and the edits
file, but this is ok. We require that the union of the image and the edits
give us the current state of the namespace but the two do not have to be
disjoint. 

> Redesign the dfs namespace datastructures to be copy on write
> -------------------------------------------------------------
>
>                 Key: HADOOP-334
>                 URL: http://issues.apache.org/jira/browse/HADOOP-334
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: dfs
>    Affects Versions: 0.4.0
>            Reporter: Owen O'Malley
>         Assigned To: Konstantin Shvachko
>
> The namespace datastructures should be copy on write so that the namespace
> does not need to be completely locked down from user changes while the
> checkpoint is being made.
