Subject: Re: Large startup time in remote MapReduce job
From: Allen Wittenauer
Reply-To: general@hadoop.apache.org
Date: Tue, 21 Jun 2011 13:58:38 -0700

On Jun 21, 2011, at 1:31 PM, Harsh J wrote:

> Gabor,
>
> If your jar does not contain code changes that need to get transmitted
> every time, you can consider placing them on the JT/TT classpaths

	... which means you get to bounce your system every time you change code.

> and
> not do any jar registration in your job submission code. You'll see a
> related WARN but it should be OK to ignore that.
>
> If not, work on other ways to get your jar size reduced. Does it
> really contain 20 MB worth of user code or is that with libraries?

	Harsh is on the right track. Break your jar up into multiple chunks, putting the fairly static pieces into a distributed cache. See http://wiki.apache.org/hadoop/FAQ#How_do_I_submit_extra_content_.28jars.2C_static_files.2C_etc.29_for_my_job_to_use_during_runtime.3F for more info.
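
	A minimal sketch of that split, not taken from the thread. It assumes the 0.20-era org.apache.hadoop.mapred API and that the large, rarely-changing library jar has already been copied to HDFS once (for example with "hadoop fs -put"). The class name, the jar name myjob-core.jar, and the HDFS path /apps/myjob/lib/big-dependency.jar are all hypothetical placeholders. Only the small jar with your own code would then be shipped at submission time.

```java
import java.io.IOException;

import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

// Hypothetical driver: needs the Hadoop jars on the classpath to compile and run.
public class SubmitWithCachedLibs {
    public static void main(String[] args) throws IOException {
        JobConf conf = new JobConf(SubmitWithCachedLibs.class);

        // Small jar holding only the frequently changing user code (assumed name).
        conf.setJar("myjob-core.jar");

        // Static library jar that already lives on HDFS (assumed path).
        // This registers it in the distributed cache and on the task classpath.
        DistributedCache.addFileToClassPath(
                new Path("/apps/myjob/lib/big-dependency.jar"), conf);

        // Input and output locations come from the command line.
        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));

        // Set your own mapper, reducer, and key/value classes here;
        // with nothing set, the defaults are used.
        JobClient.runJob(conf);
    }
}
```

	Whether this helps depends on how much of the 20 MB is libraries rather than user code, which is the question Harsh raised above.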