Message-ID: <30538881.1172601485770.JavaMail.jira@brutus>
Date: Tue, 27 Feb 2007 10:38:05 -0800 (PST)
From: "Gautam Kowshik (JIRA)"
To: hadoop-dev@lucene.apache.org
Subject: [jira] Updated: (HADOOP-1032) Support for caching Job JARs
In-Reply-To: <552500.1172217185547.JavaMail.jira@brutus>

     [ https://issues.apache.org/jira/browse/HADOOP-1032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gautam Kowshik updated HADOOP-1032:
-----------------------------------

    Fix Version/s: 0.12.0
           Status: Patch Available  (was: Open)

Putting up the first patch for review. I have added an API to add jars to the classpath. The user is expected to do the following:
- upload the jars, once, to a predefined location in DFS
- for every job submission, register those jars with the DFS cache using DistributedCache.addCacheArchive()
- use conf.addClassPath() or setClassPath() to mark them to be included in the job's classpath

Comments?

> Support for caching Job JARs
> -----------------------------
>
>                 Key: HADOOP-1032
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1032
>             Project: Hadoop
>          Issue Type: New Feature
>          Components: mapred
>    Affects Versions: 0.11.2
>            Reporter: Gautam Kowshik
>            Priority: Minor
>             Fix For: 0.12.0
>
>         Attachments: HADOOP-1032.patch
>
>
> Jobs often need to be rerun a number of times, like a job that reads from crawled data time and again, so having to upload the job jars to every node on each run is cumbersome. We need a caching mechanism to boost performance. Here are the features for job-specific caching of jars/conf files:
> - Ability to resubmit jobs with jars without having to propagate the same jar to all nodes.
>     The idea is to keep a store (path mentioned by the user in job.xml?) local to the task node so as to speed up task initiation on tasktrackers. This assumes that the jar does not change during an MR task.
> - An independent DFS store to upload jars to (Distributed File Cache?) that is not cleaned up between jobs.
>     This might need user-level configuration to tell the jobclient to upload files to the DFSCache instead of the DFS. https://issues.apache.org/jira/browse/HADOOP-288 facilitates this. Our local cache can be a client of the DFS Cache.
> - A standard cache mechanism that checks for changes in the local store and picks from DFS if the local copy is found dirty.
>     This does away with versioning. The DFSCache supports an MD5 checksum check; we can use that.
> Anything else? Suggestions? Thoughts?

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
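
The third feature in the issue description (check the local store and pick from DFS only when the local copy is dirty) could look roughly like the sketch below. It is not taken from HADOOP-1032.patch. The class `JarCacheSketch` and its methods `md5Hex` and `localize` are hypothetical names, and plain local directories stand in for both DFS and the tasktracker's local store. The sketch also hashes the master copy directly; a real implementation would more likely compare against a checksum stored alongside the file, so that an unchanged jar is never read over the network.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.security.DigestInputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical sketch: local paths stand in for DFS and the task node's store.
class JarCacheSketch {

    /** Returns the MD5 digest of a file as a lowercase hex string. */
    static String md5Hex(Path file) throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("MD5");
        try (InputStream in = new DigestInputStream(Files.newInputStream(file), md)) {
            byte[] buf = new byte[8192];
            while (in.read(buf) != -1) {
                // Reading through the DigestInputStream updates the digest.
            }
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : md.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    /**
     * Makes sure localCopy matches master.
     * Returns true if the file had to be copied (missing or dirty),
     * false if the existing local copy was reused.
     */
    static boolean localize(Path master, Path localCopy)
            throws IOException, NoSuchAlgorithmException {
        if (Files.exists(localCopy) && md5Hex(master).equals(md5Hex(localCopy))) {
            return false; // checksums agree: reuse the cached jar
        }
        Path parent = localCopy.getParent();
        if (parent != null) {
            Files.createDirectories(parent);
        }
        Files.copy(master, localCopy, StandardCopyOption.REPLACE_EXISTING);
        return true;
    }

    public static void main(String[] args) throws Exception {
        Path dfs = Files.createTempDirectory("dfs-stand-in");
        Path local = Files.createTempDirectory("local-store");
        Path master = dfs.resolve("job.jar");
        Path cached = local.resolve("job.jar");

        Files.write(master, "version 1".getBytes(StandardCharsets.UTF_8));
        System.out.println("first submit copied:  " + localize(master, cached));
        System.out.println("resubmit copied:      " + localize(master, cached));

        Files.write(master, "version 2".getBytes(StandardCharsets.UTF_8));
        System.out.println("after change copied:  " + localize(master, cached));
    }
}
```

Comparing checksums rather than timestamps or version numbers is what lets the scheme "do away with versioning": a re-uploaded jar with identical bytes is still a cache hit, and any change in content forces a refresh.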