hadoop-mapreduce-issues mailing list archives

From "Azuryy(Chijiong) (Updated) (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (MAPREDUCE-3323) Distributed Cache for Map or Reduce or Both
Date Tue, 01 Nov 2011 12:56:32 GMT

     [ https://issues.apache.org/jira/browse/MAPREDUCE-3323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Azuryy(Chijiong) updated MAPREDUCE-3323:
----------------------------------------

    Release Note: 
Tested as follows:

1: Add a cache file for map, for reduce, or for both.
2: Read the cache files in the configure() method of the mapper and the reducer,
    then print a message saying whether the task could get the cache file.

3: Run three test cases: cache for both map and reduce ("mapred"), cache for map only, cache for reduce only.
    In the first case, both map and reduce tasks can get local files from the distributed cache.
    In the second case, the map task can get local files from the distributed cache, but the reduce task cannot.
    In the third case, the reverse holds: the reduce task can get the local files, but the map task cannot.

Conclusion: the patch works as intended.
 

  was:
Tested as follows:

1: Add a cache file for map.
2: Read the cache files in the configure() method of the mapper and the reducer,
    then print a message saying whether the task could get the cache file.

3: Run three test cases: cache for both map and reduce ("mapred"), cache for map only, cache for reduce only.
    In the first case, both map and reduce tasks can get local files from the distributed cache.
    In the second case, the map task can get local files from the distributed cache, but the reduce task cannot.
    In the third case, the reverse holds: the reduce task can get the local files, but the map task cannot.

Conclusion: the patch works as intended.
 

    
> Distributed Cache for Map or Reduce or Both
> -------------------------------------------
>
>                 Key: MAPREDUCE-3323
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3323
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: distributed-cache, tasktracker
>    Affects Versions: 0.20.203.0
>            Reporter: Azuryy(Chijiong)
>         Attachments: DistributedCache.patch, TaskTracker.patch
>
>
> We put files into the distributed cache, but sometimes only the map tasks or only the reduce tasks use these cached files, not both. The TaskTracker nevertheless always downloads the cached files from HDFS, which is expensive when the cache contains large files.
> This patch therefore adds the following new APIs to DistributedCache.java:
> addArchiveToClassPathForMap
> addArchiveToClassPathForReduce
> addFileToClassPathForMap
> addFileToClassPathForReduce
> addCacheFileForMap
> addCacheFileForReduce
> addCacheArchiveForMap
> addCacheArchiveForReduce
> The new APIs do not affect the original interface. Each one registers a cache entry for map tasks only or for reduce tasks only, not for both.
> If you need a cache file in both map and reduce, use the original interface.
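
The scoping idea described above can be illustrated with a small stand-alone model. This is only a sketch under stated assumptions, not the code in the attached patches and not Hadoop's DistributedCache class: the class name ScopedCacheSketch, the Scope and TaskType enums, the filesToLocalize method, the method signatures, and the example HDFS paths are all hypothetical. Only the names addCacheFileForMap and addCacheFileForReduce come from the issue description. The sketch tags each cache entry with a scope, so that a task would localize only the entries that apply to its type.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-alone model of per-task-type cache scoping.
// This is NOT Hadoop's DistributedCache and NOT the code from the patch.
class ScopedCacheSketch {

    enum TaskType { MAP, REDUCE }

    // Which kind of task an entry applies to (assumed design).
    enum Scope {
        MAP, REDUCE, BOTH;

        boolean covers(TaskType type) {
            return this == BOTH
                || (this == MAP && type == TaskType.MAP)
                || (this == REDUCE && type == TaskType.REDUCE);
        }
    }

    // One registered cache file together with its scope.
    private static final class Entry {
        final URI uri;
        final Scope scope;

        Entry(URI uri, Scope scope) {
            this.uri = uri;
            this.scope = scope;
        }
    }

    private final List<Entry> entries = new ArrayList<>();

    // Models the original interface: the file is needed by map and reduce.
    void addCacheFile(URI uri) {
        entries.add(new Entry(uri, Scope.BOTH));
    }

    // Models the proposed map-only variant (signature is an assumption).
    void addCacheFileForMap(URI uri) {
        entries.add(new Entry(uri, Scope.MAP));
    }

    // Models the proposed reduce-only variant (signature is an assumption).
    void addCacheFileForReduce(URI uri) {
        entries.add(new Entry(uri, Scope.REDUCE));
    }

    // Files a task of the given type would need to download, in insertion order.
    List<URI> filesToLocalize(TaskType type) {
        List<URI> result = new ArrayList<>();
        for (Entry entry : entries) {
            if (entry.scope.covers(type)) {
                result.add(entry.uri);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        ScopedCacheSketch cache = new ScopedCacheSketch();
        // Example paths are made up for illustration.
        cache.addCacheFile(URI.create("hdfs:///cache/shared.txt"));
        cache.addCacheFileForMap(URI.create("hdfs:///cache/map-only.txt"));
        cache.addCacheFileForReduce(URI.create("hdfs:///cache/reduce-only.txt"));

        System.out.println("map tasks localize:    " + cache.filesToLocalize(TaskType.MAP));
        System.out.println("reduce tasks localize: " + cache.filesToLocalize(TaskType.REDUCE));
    }
}
```

In this model a map task skips the reduce-only file and a reduce task skips the map-only file, which is the download saving the issue is after. How the real patch stores the scope in the job configuration and how the TaskTracker consults it is not shown in this message.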

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
