hadoop-common-dev mailing list archives

From "Raghu Angadi (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-3051) DataXceiver: java.io.IOException: Too many open files
Date Wed, 19 Mar 2008 23:04:24 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-3051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12580603#action_12580603
] 

Raghu Angadi commented on HADOOP-3051:
--------------------------------------

> fd limit is 1024 

That seems quite low for a systems application as I/O intensive as this one. 1024 probably
made sense years ago. 

Even with 16, you are close to the limit. Assuming the replication is 3, at any time the datanodes
are together writing 3 * 2000 = 6000 blocks => each of the 8 datanodes is writing 750 blocks.
With uniform distribution, each block write takes 2.66 fds => each datanode needs about 2000 fds.
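
The estimate above can be reproduced with a few lines of Java. This is only a sketch of the arithmetic, not DataNode code: the class and method names are made up for illustration, the 2000 concurrent streams are assumed to be the reporter's 50 clients x 40 streams, the 8 datanodes come from the reporter's cluster, and 2.66 fds per block write is taken from the comment as given.

```java
public class FdEstimate {
    // Concurrent block writes handled by each datanode, assuming writes
    // are spread uniformly across the cluster.
    public static double blockWritesPerDatanode(int clientStreams, int replication, int datanodes) {
        return (double) clientStreams * replication / datanodes;
    }

    // File descriptors needed on one datanode for the given number of block writes.
    public static double fdsPerDatanode(double blockWrites, double fdsPerBlockWrite) {
        return blockWrites * fdsPerBlockWrite;
    }

    public static void main(String[] args) {
        int clientStreams = 50 * 40;    // assumption: 50 clients x 40 streams each
        int replication = 3;            // assumed in the comment
        int datanodes = 8;              // cluster size from the report
        double fdsPerBlockWrite = 2.66; // figure quoted in the comment
        int fdLimit = 1024;             // fd limit quoted in the comment

        double writes = blockWritesPerDatanode(clientStreams, replication, datanodes);
        double fds = fdsPerDatanode(writes, fdsPerBlockWrite);

        System.out.printf("block writes per datanode: %.0f%n", writes);
        System.out.printf("estimated fds per datanode: %.0f (limit %d)%n", fds, fdLimit);
    }
}
```

With these inputs the estimate comes out just under 2000 fds per datanode, roughly twice the 1024 limit.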

Irrespective of the limit, it looks like most users may not want 'write timeout'. Maybe by default
HDFS should not make the DataNode take more fds than before (maybe DFSClient too).

> DataXceiver: java.io.IOException: Too many open files
> -----------------------------------------------------
>
>                 Key: HADOOP-3051
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3051
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.17.0
>            Reporter: André Martin
>
> I just ran an experiment with the latest available nightly build, hadoop-2008-03-15, and
> after 2 minutes I'm getting tons of "java.io.IOException: Too many open files" exceptions
> as shown here:
> {noformat} 2008-03-19 20:08:09,303 ERROR org.apache.hadoop.dfs.DataNode: 
> 141.30.xxx.xxx:50010:DataXceiver: java.io.IOException: Too many open files
>      at sun.nio.ch.IOUtil.initPipe(Native Method)
>      at sun.nio.ch.EPollSelectorImpl.<init>(Unknown Source)
>      at sun.nio.ch.EPollSelectorProvider.openSelector(Unknown Source)
>      at sun.nio.ch.Util.getTemporarySelector(Unknown Source)
>      at sun.nio.ch.SocketAdaptor.connect(Unknown Source)
>      at org.apache.hadoop.dfs.DataNode$DataXceiver.writeBlock(DataNode.java:1114)
>      at org.apache.hadoop.dfs.DataNode$DataXceiver.run(DataNode.java:956)
>      at java.lang.Thread.run(Unknown Source){noformat}
> I ran the same experiment with the same high workload (50 dfs clients with 40 streams each,
> concurrently writing files on an 8-node DFS cluster) with the 0.16.1 release and no exception
> is thrown. So it looks like a bug to me...

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

