From: Benjamin Reed
Date: Fri, 29 Sep 2006 00:20:55 -0700
To: hadoop-dev@lucene.apache.org
Subject: Re: Creating splits/tasks at the client

Please correct me if I'm reading the code incorrectly, but it seems that submitJob puts the submitted job on the jobInitQueue, where it is immediately dequeued by the JobInitThread; initTasks() then gets the file splits and creates the Tasks. Thus, there doesn't seem to be any difference in the JobTracker's memory footprint whether the splits are computed at the client or by the JobTracker.

ben

Doug Cutting wrote:
>
> Right, so JobSubmissionProtocol.submitJob(String jobFile) could be
> altered to be submitJob(String jobFile, Split[]). The RPC system can
> handle reasonably large values like this, so I don't think that would
> be a problem. But the memory impact on the JobTracker could become
> significant, since the splits for queued jobs would now be kept around.
> This could be mitigated by writing the splits to a temporary file.
>
> The semantics would be subtly different: if you queue a job now, the
> file listing is done just before the job is executed, not when it's
> submitted. But programs shouldn't rely on that, so I don't think this
> is a big worry.
>
> Overall, I don't see any major problems with this. It won't simplify
> things much. We can remove the code which computes splits in a
> separate thread, but we'd have to add code to store splits to
> temporary files, so code size is a wash. And it would remove a
> potential reliability problem.
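The submit/initialize flow described at the top of this message can be modeled with a small sketch. It is a toy, single-process stand-in, not Hadoop's code. `ToyJobTracker`, `submit_job`, `_job_init_loop`, `init_tasks`, and the one-split-per-input-path policy are all hypothetical. They only mirror the roles that `submitJob`, `jobInitQueue`, `JobInitThread`, and `initTasks()` play in the reading above. The sketch shows why jobs that are merely queued would already hold their splits in memory:

```python
import queue
import threading
from dataclasses import dataclass, field


@dataclass
class Job:
    job_file: str
    input_paths: list                            # stand-in for the job's input listing
    splits: list = field(default_factory=list)   # filled in by init_tasks()
    tasks: list = field(default_factory=list)


class ToyJobTracker:
    """Toy model of the submit -> init flow; hypothetical, not Hadoop's JobTracker."""

    def __init__(self):
        self.job_init_queue = queue.Queue()
        self.jobs = {}  # submitted jobs stay referenced here until they finish
        threading.Thread(target=self._job_init_loop, daemon=True).start()

    def submit_job(self, job):
        # Only enqueue; the caller does not wait for initialization.
        self.jobs[job.job_file] = job
        self.job_init_queue.put(job)

    def _job_init_loop(self):
        # Plays the role of JobInitThread: take each job as soon as it arrives.
        while True:
            job = self.job_init_queue.get()
            try:
                self.init_tasks(job)
            finally:
                self.job_init_queue.task_done()

    def init_tasks(self, job):
        # Hypothetical split policy: one split per input path.
        job.splits = [(path, 0) for path in job.input_paths]
        job.tasks = [f"map_{i}" for i in range(len(job.splits))]

    def wait_until_initialized(self):
        # Blocks until every queued job has gone through init_tasks().
        self.job_init_queue.join()


tracker = ToyJobTracker()
tracker.submit_job(Job("job_0001.xml", ["a.txt", "b.txt"]))
tracker.submit_job(Job("job_0002.xml", ["c.txt"]))
tracker.wait_until_initialized()
# Neither job has run, yet both already hold their splits in memory.
print({name: len(job.splits) for name, job in tracker.jobs.items()})
# prints {'job_0001.xml': 2, 'job_0002.xml': 1}
```

Under this model, splits computed at the client and passed in through submitJob would occupy roughly the same JobTracker memory as splits computed by initTasks(), which is the point being made about memory footprint.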