hadoop-common-dev mailing list archives

From "Konstantin Shvachko (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-1306) DFS Scalability: Reduce the number of getAdditionalBlock RPCs on the namenode
Date Thu, 17 May 2007 01:56:16 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-1306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12496451 ]

Konstantin Shvachko commented on HADOOP-1306:

# Indentation in BlocksMap is wrong in most cases.
# noClone is not used in the constructor, and the constructor itself is not used anywhere:
NodeIterator(DatanodeDescriptor[] nodes, boolean noClone) {
  arr = nodes;
}
# I do not understand how BlocksMap.nodeIterator() works. The entries are not protected by
the lock. Even though you clone them, what happens if the original array is modified by
another thread?
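The concern in the item above can be sketched as follows. This is a minimal, hypothetical illustration (NodeList and its methods are stand-ins, not Hadoop's actual BlocksMap/NodeIterator API): taking the defensive clone under the same lock that guards mutation yields a consistent snapshot, whereas cloning outside the lock can copy the array while another thread is replacing it.

```java
import java.util.Arrays;
import java.util.Iterator;

// Hypothetical stand-in for a datanode list guarded by a lock.
class NodeList {
    private String[] nodes = new String[0];

    // Mutation happens under the object lock.
    synchronized void add(String node) {
        String[] next = Arrays.copyOf(nodes, nodes.length + 1);
        next[nodes.length] = node;
        nodes = next;
    }

    // The clone is taken under the SAME lock, so the iterator sees a
    // consistent snapshot; cloning without the lock could observe a
    // partially published update.
    synchronized Iterator<String> snapshotIterator() {
        return Arrays.asList(nodes.clone()).iterator();
    }
}
```

The snapshot iterator never throws or sees torn state, but it may of course be stale by the time it is consumed; callers must tolerate that.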
# I think removal of version consistency on the DataNode was not intended for this patch.
# Three unused imports in DatanodeDescriptor (only one of which is new though).
# ConcurrentSkipListMap is a Java 6 class. Have we officially switched to Java 6?
# I praise introduction of PendingCreates that merges everything pending into one class.
# An example of something I do not understand from PendingCreates.java:
boolean removeBlock(UTF8 file, Block b) {
  FileUnderConstruction v = pendingCreates.get(file);
  if (v != null) {
    synchronized (v) {
      ......removing block here........
  So what happens if another thread completes or deletes the pending file v after obtaining
v but before entering the synchronized section on v?
  Even if it works fine in this case, I don't think this is the right direction, because supporting
tricks of that kind will be hard.
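The check-then-act window described above can be closed by re-validating the entry after the lock is acquired. The sketch below is illustrative only (the class and method names are hypothetical stand-ins, not the actual PendingCreates code from the patch): between pending.get(file) and synchronized (v), another thread may complete or delete the file, so the critical section re-checks that v is still the live entry before mutating it.

```java
import java.util.List;
import java.util.ArrayList;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for a file being written.
class Pending {
    final List<String> blocks = new ArrayList<>();
}

// Hypothetical stand-in for the PendingCreates structure.
class PendingCreates {
    private final Map<String, Pending> pending = new ConcurrentHashMap<>();

    void create(String file)   { pending.put(file, new Pending()); }
    void complete(String file) { pending.remove(file); }

    void addBlock(String file, String block) {
        Pending v = pending.get(file);
        if (v != null) {
            synchronized (v) { v.blocks.add(block); }
        }
    }

    boolean removeBlock(String file, String block) {
        Pending v = pending.get(file);
        if (v == null) return false;
        synchronized (v) {
            // Re-validate after acquiring the lock: another thread may have
            // completed or deleted the file between get() and synchronized.
            if (pending.get(file) != v) return false;
            return v.blocks.remove(block);
        }
    }
}
```

Even with the re-check, every code path that removes the map entry must do so while holding the same per-entry lock, which is exactly the kind of invariant Konstantin argues will be hard to maintain.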

> DFS Scalability: Reduce the number of getAdditionalBlock RPCs on the namenode
> -----------------------------------------------------------------------------
>                 Key: HADOOP-1306
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1306
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: dfs
>            Reporter: dhruba borthakur
>         Attachments: fineGrainLocks3.patch
> One of the most-frequently-invoked RPCs in the namenode is the addBlock() RPC. The DFSClient
uses this RPC to allocate one more block for a file that it is currently operating upon. The
scalability of the namenode will improve if we can decrease the number of addBlock() RPCs.
One idea that we want to discuss here is to make addBlock() return more than one block. This
proposal came out of a discussion I had with Ben Reed. 
> Let's say that addBlock() returns n blocks for the file. The namenode already tracks
these blocks using the pendingCreates data structure. The client guarantees that these n blocks
will be used in order. The client also guarantees that if it cannot use a block (due to whatever
reason), it will inform the namenode using the abandonBlock() RPC. These RPCs already exist.
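The batching idea in the paragraph above can be sketched as a client-side queue. This is an illustrative sketch only (BlockBatcher and its methods are hypothetical names, not the real DFSClient API): the client drains a local queue of pre-allocated block IDs and issues the addBlock() RPC only when the queue runs empty, so n blocks cost one RPC instead of n.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical client-side batching of block allocations.
class BlockBatcher {
    private final Deque<Long> prefetched = new ArrayDeque<>();
    private int rpcCount = 0;

    // Stand-in for the namenode RPC that would return n blocks at once.
    private List<Long> addBlockRpc(int n) {
        rpcCount++;
        List<Long> ids = new ArrayList<>();
        for (int i = 0; i < n; i++) ids.add((long) (rpcCount * 100 + i));
        return ids;
    }

    // Consume blocks strictly in order; refill only when the queue is empty.
    long nextBlock(int batchSize) {
        if (prefetched.isEmpty()) prefetched.addAll(addBlockRpc(batchSize));
        return prefetched.pollFirst();
    }

    int rpcsIssued() { return rpcCount; }
}
```

Any blocks left in the queue when the file is closed would have to be returned via abandonBlock(), per the guarantee described above.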
> Another possible optimization : since the namenode has to allocate n blocks for a file,
should it use the same set of datanodes for this set of blocks? My proposal is that if n is
a small number (e.g. 3), it is prudent to allocate the same set of datanodes to host all replicas
for this set of blocks. This will reduce the CPU spent in chooseTargets().
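The second optimization can be sketched similarly. This is a hypothetical illustration (BatchAllocator and chooseTargets here are stand-ins, not the namenode's actual placement code): the placement policy is invoked once per batch and its result is shared by all n blocks, trading some placement diversity for CPU saved in chooseTargets().

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical batch allocator that reuses one target set for n blocks.
class BatchAllocator {
    int chooseTargetsCalls = 0;

    // Stand-in for the real (expensive) placement policy.
    List<String> chooseTargets() {
        chooseTargetsCalls++;
        return Arrays.asList("dn1", "dn2", "dn3");
    }

    // One chooseTargets() call covers the whole batch of n blocks.
    Map<Long, List<String>> allocate(long firstBlockId, int n) {
        List<String> targets = chooseTargets();
        Map<Long, List<String>> placement = new HashMap<>();
        for (int i = 0; i < n; i++) placement.put(firstBlockId + i, targets);
        return placement;
    }
}
```

The trade-off is that a single datanode failure now affects every block in the batch, which is presumably why the proposal limits this to small n.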

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
