hadoop-mapreduce-issues mailing list archives

From "Hadoop QA (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-3323) Add new interface for Distributed Cache, which special for Map or Reduce,but not Both.
Date Sat, 05 Nov 2011 02:45:51 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-3323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13144527#comment-13144527 ]

Hadoop QA commented on MAPREDUCE-3323:

-1 overall.  Here are the results of testing the latest attachment 
  against trunk revision .

    +1 @author.  The patch does not contain any @author tags.

    -1 tests included.  The patch doesn't appear to include any new or modified tests.
                        Please justify why no new tests are needed for this patch.
                        Also please list what manual steps were performed to verify this patch.

    -1 patch.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/1254//console

This message is automatically generated.
> Add new interface for Distributed Cache, which special  for Map or Reduce,but not Both.
> ---------------------------------------------------------------------------------------
>                 Key: MAPREDUCE-3323
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3323
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: distributed-cache, tasktracker
>    Affects Versions:
>            Reporter: Azuryy(Chijiong)
>             Fix For:
>         Attachments: DistributedCache.patch, GenericOptionsParser.patch, JobClient.patch, TaskDistributedCacheManager.patch, TaskTracker.patch
> We put some files into the Distributed Cache, but sometimes only the map tasks or only the reduce tasks use these cached files, not both. However, the TaskTracker always downloads the cached files from HDFS, so if the cache contains fairly large files, this costs time.
> This patch therefore adds the following new APIs to DistributedCache.java:
> addArchiveToClassPathForMap
> addArchiveToClassPathForReduce
> addFileToClassPathForMap
> addFileToClassPathForReduce
> addCacheFileForMap
> addCacheFileForReduce
> addCacheArchiveForMap
> addCacheArchiveForReduce
> The new API does not affect the original interface. Users can use these features in either of the following two ways:
> 1) 
> hadoop job **** -files file1 -mapfiles file2 -reducefiles file3 -archives arc1 -maparchives arc2 -reducearchives arc3
> 2)
> DistributedCache.addCacheFile(conf, file1);
> DistributedCache.addCacheFileForMap(conf, file2);
> DistributedCache.addCacheFileForReduce(conf, file3);
> DistributedCache.addCacheArchive(conf, arc1);
> DistributedCache.addCacheArchiveForMap(conf, arc2);
> DistributedCache.addCacheArchiveForReduce(conf, arc3);
> Both approaches have the same result:
> You put six files into the distributed cache: file1 ~ file3 and arc1 ~ arc3.
> file1 and arc1 are cached for both map and reduce;
> file2 and arc2 are cached only for map;
> file3 and arc3 are cached only for reduce.
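
The scoping described in the issue can be illustrated with a small stand-alone model. This is only a sketch under assumptions: it does not use Hadoop's DistributedCache and is not the code from the attached patches. The class ScopedCacheSketch, the Scope and TaskType enums, and the entriesFor method are hypothetical names introduced here to show which entries a map task or a reduce task would need to download.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative model only: NOT Hadoop's DistributedCache and NOT the patch.
// Every name in this class is hypothetical.
public class ScopedCacheSketch {
    enum Scope { BOTH, MAP_ONLY, REDUCE_ONLY }
    enum TaskType { MAP, REDUCE }

    // Insertion-ordered, so results come back in the order entries were added.
    private final Map<String, Scope> entries = new LinkedHashMap<>();

    void add(String name, Scope scope) {
        entries.put(name, scope);
    }

    // Returns the cache entries a task of the given type would need locally.
    List<String> entriesFor(TaskType type) {
        List<String> needed = new ArrayList<>();
        for (Map.Entry<String, Scope> e : entries.entrySet()) {
            Scope s = e.getValue();
            boolean wanted = s == Scope.BOTH
                    || (s == Scope.MAP_ONLY && type == TaskType.MAP)
                    || (s == Scope.REDUCE_ONLY && type == TaskType.REDUCE);
            if (wanted) {
                needed.add(e.getKey());
            }
        }
        return needed;
    }

    public static void main(String[] args) {
        // Mirrors the six entries from the issue description.
        ScopedCacheSketch cache = new ScopedCacheSketch();
        cache.add("file1", Scope.BOTH);
        cache.add("file2", Scope.MAP_ONLY);
        cache.add("file3", Scope.REDUCE_ONLY);
        cache.add("arc1", Scope.BOTH);
        cache.add("arc2", Scope.MAP_ONLY);
        cache.add("arc3", Scope.REDUCE_ONLY);

        System.out.println("map:    " + cache.entriesFor(TaskType.MAP));
        System.out.println("reduce: " + cache.entriesFor(TaskType.REDUCE));
    }
}
```

In this model a map task needs file1, file2, arc1 and arc2, while a reduce task needs file1, file3, arc1 and arc3, so each task type skips the two entries scoped to the other.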


