hadoop-common-user mailing list archives

From Amit Mittal <amitmitt...@gmail.com>
Subject Processing steps of NameNode & Secondary NameNode
Date Mon, 27 Jan 2014 12:12:06 GMT

I have a question about the processing steps of the NameNode:

*Reference:* "Hadoop: The Definitive Guide", 3rd ed., by Tom White,
page 340 (Ch. 10: HDFS > The filesystem image and edit log)

Text from book:
When a filesystem client performs a write operation (such as creating or
moving a file), it is first recorded in the edit log. The namenode also has
an in-memory representation of the filesystem metadata, which it updates
after the edit log has been modified. The in-memory metadata is used to
serve read requests.
The edit log is flushed and *synced* after every write before a success
code is returned to the client. For namenodes that write to multiple
directories, the write must be flushed and synced to every copy before
returning successfully. This ensures that no operation is lost due to
machine failure.
*Question 1:* Is the in-memory representation updated before or after the
success code is returned to the client, or is it updated asynchronously
while the status code is being sent? I believe it should happen before the
status is sent to the client.
*Question 2:* What does "synced after every write" mean here? A file has
only one writer at a time, so when a write operation on that file is
recorded in the edit log and flushed, no other writer is working on the
same file. However, other writers may be working on other files, and each
of their operations also updates the edit log. Does this mean there will be
multiple copies of the edit log that are later merged? Is this
understanding correct?
*Question 3:* Sorry, I did not understand the sentence "For namenodes that
*write to multiple directories*, the write must be flushed and synced to
*every copy* before returning successfully", especially the text in bold.
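
To make the questions concrete, here is a minimal Python sketch of how I
read the passage. It is only a toy model and not NameNode code: the class
TinyNameNode, its method names, and the "edits" file name are made up for
illustration, and the ordering of steps 2 and 3 is my assumption (it is
exactly what Question 1 asks about).

```python
import os
import tempfile


class TinyNameNode:
    """Toy model of the book's description; hypothetical, not real HDFS code."""

    def __init__(self, edit_dirs):
        # One edit log file per configured directory
        # (my reading of "write to multiple directories").
        self.edit_logs = [open(os.path.join(d, "edits"), "ab") for d in edit_dirs]
        # In-memory metadata, used to serve read requests.
        self.namespace = set()

    def create_file(self, path):
        record = ("CREATE " + path + "\n").encode("utf-8")
        # Step 1: record the operation in every copy of the edit log.
        for log in self.edit_logs:
            log.write(record)
            log.flush()             # "flushed": program buffer -> operating system
            os.fsync(log.fileno())  # "synced": ask the OS to write it to disk
        # Step 2 (assumed order): update the in-memory metadata.
        self.namespace.add(path)
        # Step 3: only now return a success code to the client.
        return True

    def exists(self, path):
        # Reads are answered from memory, never from the edit log.
        return path in self.namespace

    def close(self):
        for log in self.edit_logs:
            log.close()


if __name__ == "__main__":
    dirs = [tempfile.mkdtemp(), tempfile.mkdtemp()]
    nn = TinyNameNode(dirs)
    print(nn.create_file("/user/amit/a.txt"), nn.exists("/user/amit/a.txt"))
    nn.close()
```

In this sketch there is a single logical edit log that is written
identically to each directory, so nothing needs to be merged afterwards.
Whether that matches the real NameNode is what I would like to confirm.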

Amit Mittal
