hadoop-common-dev mailing list archives

From "Joydeep Sen Sarma (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-4780) Task Tracker burns a lot of cpu in calling getLocalCache
Date Mon, 08 Dec 2008 07:48:44 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-4780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12654342#action_12654342 ]

Joydeep Sen Sarma commented on HADOOP-4780:
-------------------------------------------

OK, I think I was right: the problem with the current implementation is that it follows symlinks.
Here's the proof. On a machine exhibiting this problem, I ran two versions of getDU, one that
ignores symlinks and one that doesn't:

// ignoring symlinks:
[root@hadoop5283.snc1 /mnt/d0/mapred/local/taskTracker]# time /mnt/vol/hive/stable/cluster/bin/hadoop jar <my.jar> <myClass> ./jobcache
0

real    0m2.756s
user    0m0.890s
sys     0m1.615s

// not ignoring symlinks - using FileUtil.getDU()
[root@hadoop5283.snc1 /mnt/d0/mapred/local/taskTracker]# time /mnt/vol/hive/stable/cluster/bin/hadoop jar <my.jar> <FileUtilClass> ./jobcache

real    0m34.760s
user    0m1.895s
sys     0m20.671s

Note that I hit ^C during the second call, so its timings only cover the period up to the
interrupt. The call had simply hung (just like our tasks do).

So all we have to do is detect symlinks and not follow them. I used the standard technique
described here: http://www.idiom.com/~zilla/Xfiles/javasymlinks.html
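
For readers who want to see the idea in code, here is a minimal sketch of a symlink-aware
disk usage walk. It is not the attached 4780.patch. The class name SymlinkAwareDu and the
exact shape of both methods are assumptions made for illustration. The detection compares
the canonical path with the absolute path after canonicalizing the parent directory, which
is the usual pure java.io approach on Unix-like systems. The check is only applied to
directory entries, so the starting path itself is always walked.

```java
import java.io.File;
import java.io.IOException;

// Hypothetical helper class, for illustration only (not the code in 4780.patch).
final class SymlinkAwareDu {

    // Returns true if the last component of the path is a symbolic link.
    // The parent directory is canonicalized first so that symlinks higher
    // up in the path do not cause false positives.
    static boolean isSymlink(File file) throws IOException {
        File abs = file.getAbsoluteFile();
        File parent = abs.getParentFile();
        File inCanonicalDir = (parent == null)
                ? abs
                : new File(parent.getCanonicalFile(), abs.getName());
        // For a symlink, the canonical form resolves to the link target,
        // so it differs from the absolute form.
        return !inCanonicalDir.getCanonicalFile()
                .equals(inCanonicalDir.getAbsoluteFile());
    }

    // Recursively sums file sizes under path, skipping symlinked entries
    // instead of following them.
    static long getDU(File path) throws IOException {
        if (!path.exists()) {
            return 0;
        }
        if (!path.isDirectory()) {
            return path.length();
        }
        long size = path.length();
        File[] children = path.listFiles();
        if (children != null) { // listFiles() returns null on I/O error
            for (File child : children) {
                if (!isSymlink(child)) {
                    size += getDU(child);
                }
            }
        }
        return size;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(getDU(new File(args[0])));
    }
}
```

A symlink that points back to one of its own ancestor directories is the case where an
unguarded recursive walk can descend for a very long time. On Java 7 and later,
java.nio.file.Files.isSymbolicLink(Path) answers the same question directly.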



> Task Tracker  burns a lot of cpu in calling getLocalCache
> ---------------------------------------------------------
>
>                 Key: HADOOP-4780
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4780
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.19.0
>            Reporter: Runping Qi
>         Attachments: 4780.patch
>
>
> I noticed that a task tracker often maxes out up to 6 CPUs.
> During that time, iostat shows that the majority of it is system CPU.
> That situation can last for quite a long time.
> During that time, I saw a number of threads in the following state:
>   java.lang.Thread.State: RUNNABLE
>         at java.io.UnixFileSystem.getBooleanAttributes0(Native Method)
>         at java.io.UnixFileSystem.getBooleanAttributes(UnixFileSystem.java:228)
>         at java.io.File.exists(File.java:733)
>         at org.apache.hadoop.fs.FileUtil.getDU(FileUtil.java:399)
>         at org.apache.hadoop.fs.FileUtil.getDU(FileUtil.java:407)
>         at org.apache.hadoop.fs.FileUtil.getDU(FileUtil.java:407)
>         at org.apache.hadoop.fs.FileUtil.getDU(FileUtil.java:407)
>         at org.apache.hadoop.fs.FileUtil.getDU(FileUtil.java:407)
>         at org.apache.hadoop.fs.FileUtil.getDU(FileUtil.java:407)
>         at org.apache.hadoop.fs.FileUtil.getDU(FileUtil.java:407)
>         at org.apache.hadoop.fs.FileUtil.getDU(FileUtil.java:407)
>         at org.apache.hadoop.fs.FileUtil.getDU(FileUtil.java:407)
>         at org.apache.hadoop.fs.FileUtil.getDU(FileUtil.java:407)
>         at org.apache.hadoop.fs.FileUtil.getDU(FileUtil.java:407)
>         at org.apache.hadoop.fs.FileUtil.getDU(FileUtil.java:407)
>         at org.apache.hadoop.fs.FileUtil.getDU(FileUtil.java:407)
>         at org.apache.hadoop.fs.FileUtil.getDU(FileUtil.java:407)
>         at org.apache.hadoop.filecache.DistributedCache.getLocalCache(DistributedCache.java:176)
>         at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:140)
> I suspect that getLocalCache is too expensive,
> and calling it for every task initialization seems too wasteful.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

