hadoop-common-dev mailing list archives

From "Konstantin Shvachko (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-3633) Uncaught exception in DataXceiveServer
Date Thu, 03 Jul 2008 01:12:45 GMT

[ https://issues.apache.org/jira/browse/HADOOP-3633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12610100#action_12610100 ]

Konstantin Shvachko commented on HADOOP-3633:
---------------------------------------------

> The question is not if we should handle these cases, but rather how.

The first thing is that we need a limit at all: the current code does not enforce one, the patch does.
I am glad we agree on that.
On "how": as I said before, 256 comes from practical observation. I have seen cases where nodes
were struggling to handle more than that, and I'd rather be conservative here than leave the
problem unsolved by setting the limit too high.
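
To make the mechanics concrete, here is a minimal sketch of such a bounded accept loop. The
names (BoundedXceiveServer, maxXceiverCount, activeXceivers) are placeholders for illustration,
not the actual patch code:

{noformat}
import java.io.IOException;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.concurrent.atomic.AtomicInteger;

class BoundedXceiveServer implements Runnable {
  private final ServerSocket ss;
  private final int maxXceiverCount;                  // e.g. 256
  private final AtomicInteger activeXceivers = new AtomicInteger(0);

  BoundedXceiveServer(ServerSocket ss, int maxXceiverCount) {
    this.ss = ss;
    this.maxXceiverCount = maxXceiverCount;
  }

  public void run() {
    while (!ss.isClosed()) {
      try {
        final Socket s = ss.accept();
        if (activeXceivers.get() >= maxXceiverCount) {
          s.close();                  // shed load instead of spawning a thread
          continue;
        }
        activeXceivers.incrementAndGet();
        new Thread(new Runnable() {
          public void run() {
            try {
              // ... serve the block transfer on s ...
            } finally {
              activeXceivers.decrementAndGet();
            }
          }
        }).start();
      } catch (IOException e) {
        // log and keep accepting; never let the server thread die
      }
    }
  }
}
{noformat}

The point of closing the socket rather than queueing is that the client fails fast and can
retry elsewhere, instead of piling more threads onto a node that is already overloaded.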

> this will lead to problems and will limit Hadoop functionality.

On the contrary, the functionality of Hadoop is currently bounded by the lack of a thread
limit, because overloaded nodes become dysfunctional. Introducing the limit will make them
functional again.
The 256 limit does not look low if you look at it from the point of view of how many clients
can do transfers simultaneously: on a 2000-node cluster that is 2000 x 256 = 512,000, about
500,000 of them. That is pretty big even if you divide it by the replication factor of 3 for
writes.

That said, I agree it would be better to have a method of calculating the limit from some
natural criteria, such as hardware configuration or heap size. I would be glad to hear ideas
in this direction.
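
As one possible starting point (an illustration only, not something in the patch), the limit
could be derived from the native memory left over for thread stacks:

{noformat}
// Illustrative heuristic: bound the xceiver count by the native memory
// available for thread stacks.
static int suggestXceiverLimit(long nativeMemBudgetBytes, long threadStackBytes) {
  // nativeMemBudgetBytes: physical RAM minus the configured JVM heap (-Xmx)
  // threadStackBytes:     per-thread stack size (-Xss), often 512K-1M
  long byStacks = nativeMemBudgetBytes / threadStackBytes;
  return (int) Math.max(1, Math.min(byStacks, 4096)); // clamp to a sane range
}
{noformat}

For example, 512 MB of non-heap memory and a 1 MB stack per thread yields 512, the same order
of magnitude as the proposed 256.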

> Uncaught exception in DataXceiveServer
> --------------------------------------
>
>                 Key: HADOOP-3633
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3633
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.17.0
>         Environment: 17.0 + H1979-H2159-H3442
>            Reporter: Koji Noguchi
>            Assignee: Konstantin Shvachko
>            Priority: Blocker
>             Fix For: 0.18.0
>
>         Attachments: DataXceivr.patch, jstack-H3633.txt
>
>
> Observed dfsclients timing out to some datanodes.
> Datanode's '.out' file had 
> {noformat}
> Exception in thread "org.apache.hadoop.dfs.DataNode$DataXceiveServer@82d37" java.lang.OutOfMemoryError: unable to create new native thread
>   at java.lang.Thread.start0(Native Method)
>   at java.lang.Thread.start(Thread.java:597)
>   at org.apache.hadoop.dfs.DataNode$DataXceiveServer.run(DataNode.java:906)
>   at java.lang.Thread.run(Thread.java:619)
> {noformat}
> Datanode was still running but not much activity besides verification.
> Jstack showed no DataXceiveServer running.
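
The quoted trace explains the symptom: Thread.start() threw OutOfMemoryError inside
DataXceiveServer.run(), nothing catches the Error, so the accept thread exits while the rest
of the datanode keeps running. A sketch of the failure point, with a hypothetical catch-all
guard that would keep the loop alive (field names follow the 0.17 source loosely; this is not
necessarily what the attached patch does):

{noformat}
public void run() {
  while (shouldRun) {
    try {
      Socket s = ss.accept();
      new Daemon(new DataXceiver(s)).start(); // Thread.start0() threw OOM here
    } catch (IOException ie) {
      LOG.warn("accept failed", ie);
    } catch (OutOfMemoryError oom) {
      // Without a clause like this the Error escapes run() and the accept
      // thread dies, matching the jstack observation above.
      LOG.error("Cannot create xceiver thread, backing off", oom);
      try { Thread.sleep(30000); } catch (InterruptedException ie2) {}
    }
  }
}
{noformat}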

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

