hadoop-common-user mailing list archives

From Haohui Mai <h...@hortonworks.com>
Subject Re: Processing steps of NameNode & Secondary NameNode
Date Mon, 27 Jan 2014 19:20:57 GMT
Conceptually, you can think of the namenode as similar to a journaling file
system. For each write, it updates the in-memory data structure, persists
the operation to stable storage (i.e., calls sync to flush the buffer of
the edit log), then responds to the client.
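
To make the ordering concrete, here is a minimal sketch of that write path. It is a toy model, not HDFS code: the ToyNameNode class, the mkdir method, and the "MKDIR <path>" record format are all made up for illustration.

```python
import os
import tempfile


class ToyNameNode:
    """Hypothetical stand-in for the namenode; not actual HDFS code."""

    def __init__(self, edit_log_path):
        # In-memory metadata, reduced here to a set of paths.
        self.namespace = set()
        self.edit_log = open(edit_log_path, "a", encoding="utf-8")

    def mkdir(self, path):
        # 1. Update the in-memory data structure.
        self.namespace.add(path)
        # 2. Persist the operation: append it to the edit log, flush the
        #    buffer, and fsync so the record reaches stable storage.
        self.edit_log.write(f"MKDIR {path}\n")
        self.edit_log.flush()
        os.fsync(self.edit_log.fileno())
        # 3. Only now respond to the client.
        return "OK"

    def close(self):
        self.edit_log.close()


log_path = os.path.join(tempfile.mkdtemp(), "edits.log")
nn = ToyNameNode(log_path)
status = nn.mkdir("/user/amit")
nn.close()

# Read the log back to confirm the operation was recorded.
with open(log_path, encoding="utf-8") as f:
    logged = f.read().splitlines()
```

The point of the sketch is only the order of the three steps: the client never sees a success code for an operation that is not yet in the log.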

Note that all writes are serialized, which means the writes are given a
total order. There are no consistency issues between multiple clients.
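
As a rough illustration of what serialization means here, the sketch below has several client threads appending to one shared log under a lock. The SerializedLog class and the transaction-id counter are assumptions made for the example, not the namenode's real implementation.

```python
import threading


class SerializedLog:
    """Hypothetical single edit log shared by all clients."""

    def __init__(self):
        self._lock = threading.Lock()
        self._next_txid = 1
        self.entries = []

    def append(self, op):
        # The lock serializes writers, so every operation gets a unique
        # position in one total order, whichever client issued it.
        with self._lock:
            txid = self._next_txid
            self._next_txid += 1
            self.entries.append((txid, op))
            return txid


log = SerializedLog()


def client(name):
    # Each client writes to its own files, but all share the same log.
    for i in range(100):
        log.append(f"CREATE /{name}/file{i}")


threads = [threading.Thread(target=client, args=(f"client{n}",)) for n in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

txids = [txid for txid, _ in log.entries]
```

Writers working on different files still end up in a single ordered log, so there are no per-client copies that would need merging afterwards.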

For question 3, the NN can write to multiple edit logs with the same
content at the same time. This allows the operator to store a copy of the
edit logs on NFS. In this case the NN calls sync() for each edit log.
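
A small sketch of that idea, with two temporary directories standing in for a local disk and an NFS mount. The log_to_all helper and the record text are hypothetical; the only point is that success is reported after every copy has been flushed and synced.

```python
import os
import tempfile


def log_to_all(edit_logs, record):
    """Write the same record to every edit log, syncing each one."""
    for f in edit_logs:
        f.write(record + "\n")
        f.flush()
        os.fsync(f.fileno())  # one sync per edit log copy
    # Success is reported only after all copies are on stable storage.
    return "OK"


local_dir = tempfile.mkdtemp()  # stand-in for a local edits directory
nfs_dir = tempfile.mkdtemp()    # stand-in for an NFS-mounted directory
paths = [os.path.join(d, "edits.log") for d in (local_dir, nfs_dir)]

logs = [open(p, "a", encoding="utf-8") for p in paths]
status = log_to_all(logs, "RENAME /a /b")
for f in logs:
    f.close()

# Read both copies back; they should hold identical content.
copies = []
for p in paths:
    with open(p, encoding="utf-8") as f:
        copies.append(f.read())
```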


On Mon, Jan 27, 2014 at 4:12 AM, Amit Mittal <amitmittal5@gmail.com> wrote:

> Hi,
> I have a question about the processing steps of the NameNode:
> *Reference:* "Hadoop: The Definitive Guide:3rd Ed" book by "Tom White"
> On page# 340 (Ch 10: HDFS > The file system image & edit log)
> Text from book:
> ....
> When a filesystem client performs a write operation (such as creating or
> moving a file), it is first recorded in the edit log. The namenode also has
> an in-memory representation of the filesystem metadata, which it updates
> after the edit log has been modified. The in-memory metadata is used to
> serve read requests.
> The edit log is flushed and *synced* after every write before a success
> code is returned to the client. For namenodes that write to multiple
> directories, the write must be flushed and synced to every copy before
> returning successfully. This ensures that no operation is lost due to
> machine failure.
> ...
> *Question 1:* Is the in-memory representation updated before or after
> returning to the client, or is it done asynchronously while the status code
> is sent to the client? I believe it should be before the status is sent to
> the client.
> *Question 2:* What does "synced after every write" mean here? For one
> file, there is only one writer. So when there is a write operation to the
> file, it is recorded in the edit log and flushed; no other writer will be
> working on this file. However, there might be other writers working on
> other files, and for any operation on those, the edit log will be updated.
> Now there will be multiple copies of the edit log, which will be merged. Is
> this understanding correct?
> *Question 3:* Sorry, I did not understand "For namenodes that *write to
> multiple directories*, the write must be flushed and synced to *every copy*
> before returning successfully." Especially the text in bold.
> Thanks
> Amit Mittal
