From "dhruba borthakur (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-1269) DFS Scalability: namenode throughput impacted because of global FSNamesystem lock
Date Thu, 19 Apr 2007 17:11:15 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-1269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12490113 ]

dhruba borthakur commented on HADOOP-1269:
------------------------------------------

I agree with Konstantin's suggestion that we should optimize the code in addStoredBlock and
getAdditionalBlock as much as possible before we try to optimize the locking behaviour. Avoiding
the conversions between UTF8 and String is a good thing to do, but judging from the profiled
output, it might not result in a big change to performance. Similarly, changing Vectors to
ArrayLists is a good thing, but those accesses already occur within the global FSNamesystem
lock, so there will never be a case where a thread blocks on the Vector's own lock. Am
I missing something?
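
To make the Vector point concrete, here is a minimal sketch of the pattern, assuming a field
guarded by the global FSNamesystem monitor (the class and field names are illustrative, not
actual FSNamesystem members). Every access already happens while the outer monitor is held,
so the Vector's own per-method lock is never contended; switching to ArrayList would only
save the cost of an uncontended monitor acquisition.

    import java.util.Vector;

    class FSNamesystemSketch {
        // Illustrative stand-in for a field such as pendingCreates.
        private final Vector<Object> pending = new Vector<Object>();

        public synchronized void add(Object item) {
            // The sketch's global monitor is already held here, so
            // Vector.add() acquires its own monitor without contention.
            pending.add(item);
        }
    }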

In response to Raghu's comments: the profiled load is DFSIO and it is writing to files. I
have profiled "sort" too, and that consumes lots of CPU in the FSNamesystem.open call. The
clusterMap is an ideal candidate for a read/write lock because updates to it are rare but it
is read very often.
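
A minimal sketch of what a read/write lock around a read-mostly structure like clusterMap
could look like, using java.util.concurrent.locks.ReentrantReadWriteLock (available since
Java 5). The class, field, and method names here are illustrative assumptions, not the
actual NetworkTopology API:

    import java.util.HashMap;
    import java.util.Map;
    import java.util.concurrent.locks.ReentrantReadWriteLock;

    class ClusterMapSketch {
        private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
        private final Map<String, String> nodeToRack = new HashMap<String, String>();

        // Frequent: consulted on every block allocation to pick targets.
        String rackOf(String node) {
            lock.readLock().lock();   // many readers proceed in parallel
            try {
                return nodeToRack.get(node);
            } finally {
                lock.readLock().unlock();
            }
        }

        // Rare: only when a datanode joins or leaves the cluster.
        void update(String node, String rack) {
            lock.writeLock().lock();  // writer gets exclusive access
            try {
                nodeToRack.put(node, rack);
            } finally {
                lock.writeLock().unlock();
            }
        }
    }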



> DFS Scalability: namenode throughput impacted because of global FSNamesystem lock
> ---------------------------------------------------------------------------------
>
>                 Key: HADOOP-1269
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1269
>             Project: Hadoop
>          Issue Type: Bug
>          Components: dfs
>            Reporter: dhruba borthakur
>         Assigned To: dhruba borthakur
>         Attachments: serverThreads1.html, serverThreads40.html
>
>
> I have been running a 2000 node cluster and measuring namenode performance. There are
quite a few "Calls dropped" messages in the namenode log. The namenode machine has 4 CPUs
and each CPU is about 30% busy. Profiling the namenode shows that the methods that consume
the most CPU are addStoredBlock() and getAdditionalBlock(). The first method is invoked when
a datanode confirms the presence of a newly created block. The second method is invoked when
a DFSClient requests a new block for a file.
> I am attaching two files that were generated by the profiler. serverThreads40.html captures
the scenario when the namenode had 40 server handler threads. serverThreads1.html is with
1 server handler thread (with a max_queue_size of 4000).
> In the case when there are 40 handler threads, the total elapsed time taken by FSNamesystem.getAdditionalBlock()
is 1957 seconds, whereas the method that it invokes (chooseTarget) takes only about 97
seconds. FSNamesystem.getAdditionalBlock is blocked on the global FSNamesystem lock for the
remaining 1860 seconds.
> My proposal is to implement a finer-grained locking model in the namenode. The FSNamesystem
has a few important data structures, e.g. blocksMap, datanodeMap, leases, neededReplication,
pendingCreates, heartbeats, etc. Some of these data structures already have their own lock;
my proposal is to give each one of them its own lock. The individual lock will protect the
integrity of the contents of the data structure it guards. The global FSNamesystem lock is
still needed to maintain consistency across different data structures.
> If we implement the above proposal, neither addStoredBlock() nor getAdditionalBlock()
needs to hold the global FSNamesystem lock. startFile() and closeFile() still need to
acquire the global FSNamesystem lock because they need to ensure consistency across multiple
data structures.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

