hadoop-common-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "he yongqiang (JIRA)" <j...@apache.org>
Subject [jira] Updated: (HADOOP-4780) Task Tracker burns a lot of cpu in calling getLocalCache
Date Wed, 10 Dec 2008 14:40:44 GMT

     [ https://issues.apache.org/jira/browse/HADOOP-4780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

he yongqiang updated HADOOP-4780:
---------------------------------

    Attachment: 4780-2.patch

@Zheng, I checked the code again, and tested it on a cluster.Yes, you are right.
I think we have two options here, 
1)we can record each cache's size in its cacheStatus object, and when deleting, iterate over
all cacheStatus objects and find the total size of a baseDir.  this is easier, but much time-consuming
since we have to iterate over all cacheStatus objects whenever we do the localizeCache.  
2)we recode the size for each baseDir, size of each cacheStatus Object, and also which baseDir
this cache lies in. When a cache archive/cache file is stored in or removed from this baseDir,
update the corresponding baseDir record. 

The first one is simple but time-comsuming, and it only needs to introduce one field in CacheStatus
class. The second one needs to introduce a map for recording size for each baseDir, an int
field in CacheStatus for recording its size, and a string field for recording which baseDir
this cache lies in.

Submitted another patch based on option 2

> Task Tracker  burns a lot of cpu in calling getLocalCache
> ---------------------------------------------------------
>
>                 Key: HADOOP-4780
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4780
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.19.0
>            Reporter: Runping Qi
>         Attachments: 4780-2.patch, 4780.patch
>
>
> I noticed that many times, a task tracker max up to 6 cpus.
> During that time, iostat shows majority of that was  system cpu.
> That situation can last for quite long.
> During that time, I saw a number of threads were in the following state:
>   java.lang.Thread.State: RUNNABLE
>         at java.io.UnixFileSystem.getBooleanAttributes0(Native Method)
>         at java.io.UnixFileSystem.getBooleanAttributes(UnixFileSystem.java:228)
>         at java.io.File.exists(File.java:733)
>         at org.apache.hadoop.fs.FileUtil.getDU(FileUtil.java:399)
>         at org.apache.hadoop.fs.FileUtil.getDU(FileUtil.java:407)
>         at org.apache.hadoop.fs.FileUtil.getDU(FileUtil.java:407)
>         at org.apache.hadoop.fs.FileUtil.getDU(FileUtil.java:407)
>         at org.apache.hadoop.fs.FileUtil.getDU(FileUtil.java:407)
>         at org.apache.hadoop.fs.FileUtil.getDU(FileUtil.java:407)
>         at org.apache.hadoop.fs.FileUtil.getDU(FileUtil.java:407)
>         at org.apache.hadoop.fs.FileUtil.getDU(FileUtil.java:407)
>         at org.apache.hadoop.fs.FileUtil.getDU(FileUtil.java:407)
>         at org.apache.hadoop.fs.FileUtil.getDU(FileUtil.java:407)
>         at org.apache.hadoop.fs.FileUtil.getDU(FileUtil.java:407)
>         at org.apache.hadoop.fs.FileUtil.getDU(FileUtil.java:407)
>         at org.apache.hadoop.fs.FileUtil.getDU(FileUtil.java:407)
>         at org.apache.hadoop.fs.FileUtil.getDU(FileUtil.java:407)
>         at org.apache.hadoop.filecache.DistributedCache.getLocalCache(DistributedCache.java:176)
>         at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:140)
> I suspect that getLocalCache is too expensive.
> And calling it for every task initialization seems too much waste.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


Mime
View raw message