hadoop-mapreduce-issues mailing list archives

From "Amareshwari Sriramadasu (JIRA)" <j...@apache.org>
Subject [jira] Commented: (MAPREDUCE-1505) Cluster class should create the rpc client only when needed
Date Tue, 25 May 2010 05:22:26 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-1505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12870997#action_12870997 ]

Amareshwari Sriramadasu commented on MAPREDUCE-1505:

bq. All of the o.a.h.mapreduce.Job constructors that don't require the caller to have already
created and supplied a Cluster are deprecated. 
Dick, I did not understand your comment above. The Job constructors are deprecated in favor of
the static getInstance methods, as discussed in [comment1 |https://issues.apache.org/jira/browse/MAPREDUCE-777?focusedCommentId=12746014&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12746014]
and [comment2 |https://issues.apache.org/jira/browse/MAPREDUCE-777?focusedCommentId=12755973&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12755973].

If the user passes a Cluster handle, it is fine to initialize it in the constructor, so the
current constructors and getInstance methods look fine. We need to create the Cluster lazily
only when the user does not pass a Cluster handle.

We can add the following method to Job.java, which creates the Cluster lazily:
public static Job getInstance(Configuration conf)

We will also have to change the deprecated constructors to create the Cluster handle lazily.
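
A minimal sketch of the lazy-creation idea follows. It is not the attached patch and does not use the real Hadoop classes: Conf, StubCluster and LazyJob are hypothetical stand-ins for Configuration, Cluster and Job, and the status() method and connectionsOpened counter exist only for illustration. It shows a getInstance(conf) overload that defers creating the cluster handle until a call actually needs it, while the overload that takes a handle uses it as supplied.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for Configuration: a plain key/value store. */
class Conf {
    private final Map<String, String> props = new HashMap<>();
    void set(String key, String value) { props.put(key, value); }
    String get(String key) { return props.get(key); }
}

/** Hypothetical stand-in for Cluster: counts how many "connections" were opened. */
class StubCluster {
    static int connectionsOpened = 0;

    StubCluster(Conf conf) {
        // In the real class this is where the rpc client would be created.
        connectionsOpened++;
    }

    String getJobStatus(String jobId) { return "RUNNING"; }
}

/** Hypothetical stand-in for Job that creates its cluster handle on demand. */
class LazyJob {
    private final Conf conf;
    private StubCluster cluster; // stays null until a call needs it

    private LazyJob(Conf conf, StubCluster cluster) {
        this.conf = conf;
        this.cluster = cluster;
    }

    /** No handle supplied: defer cluster creation. */
    static LazyJob getInstance(Conf conf) {
        return new LazyJob(conf, null);
    }

    /** Handle supplied by the caller: use it as is. */
    static LazyJob getInstance(StubCluster cluster, Conf conf) {
        return new LazyJob(conf, cluster);
    }

    /** Configuration-only access never touches the cluster. */
    Conf getConfiguration() { return conf; }

    /** Creates the cluster handle on first use and reuses it afterwards. */
    private synchronized StubCluster cluster() {
        if (cluster == null) {
            cluster = new StubCluster(conf);
        }
        return cluster;
    }

    /** Illustrative call that really needs the cluster. */
    String status() {
        return cluster().getJobStatus("job_0001");
    }
}
```

With this shape, code that only reads the configuration (as in the task-side case from the issue description) never opens a connection; the first call to status() opens exactly one, and later calls reuse it.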


> Cluster class should create the rpc client only when needed
> -----------------------------------------------------------
>                 Key: MAPREDUCE-1505
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1505
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 0.20.2
>            Reporter: Devaraj Das
>            Assignee: Dick King
>             Fix For: 0.22.0
>         Attachments: mapreduce-1505--2010-05-19.patch, MAPREDUCE-1505_yhadoop20.patch,
> It will be good to have the org.apache.hadoop.mapreduce.Cluster create the rpc client
> object only when needed (when a call to the jobtracker is actually required). org.apache.hadoop.mapreduce.Job
> constructs the Cluster object internally, and in many cases the application that created the
> Job object really wants to look at the configuration only. It would help to avoid these connections
> to the jobtracker, especially when Job is used in the tasks (e.g., Pig calls mapreduce.FileInputFormat.setInputPath
> in the tasks, and that requires a Job object to be passed).
> In Hadoop 20, the Job object internally creates the JobClient object, and the same argument
> applies there too.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
