hadoop-common-dev mailing list archives

From Benjamin Reed <br...@yahoo-inc.com>
Subject Re: Creating splits/tasks at the client
Date Fri, 29 Sep 2006 07:20:55 GMT
Please correct me if I'm reading the code incorrectly, but it seems
like submitJob puts the submitted job on the jobInitQueue, which is
immediately dequeued by the JobInitThread; initTasks() then gets the
file splits and creates Tasks. Thus, it doesn't seem like there is
any difference in memory footprint.
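The flow described above could be sketched roughly as follows. This is an illustrative simplification, not the actual Hadoop classes: the names Job, jobInitQueue, submitJob, and runInitOnce stand in for the real JobTracker machinery, and the split computation is faked.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Hypothetical, simplified sketch of the submit/init flow: submitJob()
// only enqueues the job; the init thread dequeues it and initTasks()
// computes splits and creates tasks at that point, not at submit time.
class JobInitSketch {
    static class Job {
        final String jobFile;
        List<String> tasks;  // created lazily by initTasks()

        Job(String jobFile) {
            this.jobFile = jobFile;
        }

        void initTasks() {
            // Stand-in for reading the job file and computing file splits.
            tasks = new ArrayList<>();
            for (int i = 0; i < 3; i++) {
                tasks.add(jobFile + "-split-" + i);
            }
        }
    }

    final Queue<Job> jobInitQueue = new ArrayDeque<>();

    // Analogue of submitJob(): no splits are materialized here.
    void submitJob(String jobFile) {
        jobInitQueue.add(new Job(jobFile));
    }

    // Analogue of one iteration of the JobInitThread loop: the job is
    // dequeued immediately and its tasks are created right away.
    Job runInitOnce() {
        Job job = jobInitQueue.poll();
        if (job != null) {
            job.initTasks();
        }
        return job;
    }
}
```

Because the init thread dequeues the job as soon as it is submitted, the splits come into existence almost immediately either way, which is the point about the footprint being the same.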


Doug Cutting wrote:
> Right, so JobSubmissionProtocol.submitJob(String jobFile) could be
> altered to be submitJob(String jobFile, Split[] splits).  The RPC system can
> handle reasonably large values like this, so I don't think that would
> be a problem.  But the memory impact on the JobTracker could become
> significant, since the splits for queued jobs would now be around. 
> This could be mitigated by writing the splits to a temporary file.
> The semantics would be subtly different: if you queue a job now, the
> file listing is done just before the job is executed, not when it's
> submitted.  But programs shouldn't rely on that, so I don't think this
> is a big worry.
> Overall, I don't see any major problems with this.  It won't simplify
> things much.  We can remove the code which computes splits in a
> separate thread, but we'd have to add code to store splits to
> temporary files, so code size is a wash.  And it would remove a
> potential reliability problem.
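The altered protocol Doug proposes might look something like the sketch below. The interface and the in-memory tracker are hypothetical (the real JobSubmissionProtocol and Split differ); the point is only that the client ships precomputed splits with the job, and the tracker could spill them rather than hold them on-heap for queued jobs.

```java
// Hypothetical sketch of the proposed RPC change: the client computes
// the splits and passes them along with the job file.
interface ClientSplitSubmission {
    void submitJob(String jobFile, String[] splits);
}

// Minimal in-memory stand-in for the tracker side.  To bound JobTracker
// memory for queued jobs, the splits could instead be written to a
// temporary file here, as suggested above.
class InMemoryTracker implements ClientSplitSubmission {
    String lastJobFile;
    String[] lastSplits;

    public void submitJob(String jobFile, String[] splits) {
        lastJobFile = jobFile;
        lastSplits = splits;
    }
}
```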
