hadoop-common-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "eric baldeschwieler (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-210) Namenode not able to accept connections
Date Tue, 23 May 2006 20:46:30 GMT
    [ http://issues.apache.org/jira/browse/HADOOP-210?page=comments#action_12412989 ] 

eric baldeschwieler commented on HADOOP-210:
--------------------------------------------

I think we need to break the one thread per connection model.  otherwise our servers will
not scale, so "selectors" are needed.

Also we probably need to break the very long connection caching model and the invarient of
one connection per VM.  Connection setup is nearly free and serializing requests from different
threads creates race conditions and other failure cases.


> Namenode not able to accept connections
> ---------------------------------------
>
>          Key: HADOOP-210
>          URL: http://issues.apache.org/jira/browse/HADOOP-210
>      Project: Hadoop
>         Type: Bug

>   Components: dfs
>  Environment: linux
>     Reporter: Mahadev konar
>     Assignee: Mahadev konar

>
> I am running owen's random writer on a 627 node cluster (writing 10GB/node).  After running
for a while (map 12% reduce 1%) I get the following error on the Namenode:
> Exception in thread "Server listener on port 60000" java.lang.OutOfMemoryError: unable
to create new native thread
>         at java.lang.Thread.start0(Native Method)
>         at java.lang.Thread.start(Thread.java:574)
>         at org.apache.hadoop.ipc.Server$Listener.run(Server.java:105)
> After this, the namenode does not seem to be accepting connections from any of the clients.
All the DFSClient calls get timeout. Here is a trace for one of them:
> java.net.SocketTimeoutException: timed out waiting for rpc response
> 	at org.apache.hadoop.ipc.Client.call(Client.java:305)
> 	at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:149)
> 	at org.apache.hadoop.dfs.$Proxy1.open(Unknown Source)
> 	at org.apache.hadoop.dfs.DFSClient$DFSInputStream.openInfo(DFSClient.java:419)
> 	at org.apache.hadoop.dfs.DFSClient$DFSInputStream.(DFSClient.java:406)
> 	at org.apache.hadoop.dfs.DFSClient.open(DFSClient.java:171)
> 	at org.apache.hadoop.dfs.DistributedFileSystem.openRaw(DistributedFileSystem.java:78)
> 	at org.apache.hadoop.fs.FSDataInputStream$Checker.(FSDataInputStream.java:46)
> 	at org.apache.hadoop.fs.FSDataInputStream.(FSDataInputStream.java:228)
> 	at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:157)
> 	at org.apache.hadoop.mapred.TextInputFormat.getRecordReader(TextInputFormat.java:43)
> 	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:105)
> 	at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:785).
> The namenode then has around 1% CPU utilization at this time (after the outofmemory exception
has been thrown). I have profiled the NameNode and it seems to be using around a maixmum heap
size of 57MB (which is not much). So, heap size does not seem to be a problem. It might be
happening due to lack of Stack space? Any pointers?

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira


Mime
View raw message