Mailing-List: contact general-help@hadoop.apache.org; run by ezmlm
Precedence: bulk
Reply-To: general@hadoop.apache.org
Content-Type: text/plain; charset=us-ascii
Mime-Version: 1.0 (Apple Message framework v1082)
Subject: Re: Large startup time in remote MapReduce job
From: Allen Wittenauer <aw@apache.org>
In-Reply-To: <BANLkTimSm==X+=9VnHyX+oU-zknzjKRzDA@mail.gmail.com>
Date: Tue, 21 Jun 2011 13:58:38 -0700
Content-Transfer-Encoding: quoted-printable
Message-Id: <12015E99-0D45-4837-9C68-69148A8C32E4@apache.org>
References: <BANLkTik91dkWnrWyVGp3HnpnZLFHTQjhKQ@mail.gmail.com>
 <BANLkTimSm==X+=9VnHyX+oU-zknzjKRzDA@mail.gmail.com>
To: <general@hadoop.apache.org>


On Jun 21, 2011, at 1:31 PM, Harsh J wrote:

> Gabor,
>
> If your jar does not contain code changes that need to get transmitted
> every time, you can consider placing them on the JT/TT classpaths

	... which means you get to bounce your system every time you change code.


> and
> not do any jar registration in your job submission code. You'll see a
> related WARN but it should be OK to ignore that.
>
> If not, work on other ways to get your jar size reduced. Does it
> really contain 20 MB worth of user code or is that with libraries?

	Harsh is on the right track.

	Break your jar up into multiple chunks, putting the fairly static
pieces into a distributed cache.  See
http://wiki.apache.org/hadoop/FAQ#How_do_I_submit_extra_content_.28jars.2C_static_files.2C_etc.29_for_my_job_to_use_during_runtime.3F
for more info.
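
	As a rough illustration of the "break your jar up" step, here is a minimal Python sketch. It assumes the 20 MB jar bundles its dependencies as jars under a lib/ directory inside the job jar; that layout, the function name split_job_jar, and all file names are assumptions for illustration, not something stated in this thread. It copies the user code into a thin jar and writes each bundled library out as a standalone jar.

```python
import os
import zipfile


def split_job_jar(fat_jar, thin_jar, lib_dir, lib_prefix="lib/"):
    """Split a job jar into a thin jar (user code) and standalone library jars.

    Entries under `lib_prefix` that end in .jar are written to `lib_dir`;
    every other entry is copied into `thin_jar`. Returns the sorted list of
    extracted library paths. The lib/ layout is an assumption about how the
    original jar was built.
    """
    os.makedirs(lib_dir, exist_ok=True)
    extracted = []
    with zipfile.ZipFile(fat_jar) as src, \
            zipfile.ZipFile(thin_jar, "w", zipfile.ZIP_DEFLATED) as dst:
        for info in src.infolist():
            name = info.filename
            if name == lib_prefix:
                # Skip the now-empty lib/ directory entry itself.
                continue
            if name.startswith(lib_prefix) and name.endswith(".jar"):
                # Fairly static dependency: write it out on its own.
                target = os.path.join(lib_dir, os.path.basename(name))
                with open(target, "wb") as out:
                    out.write(src.read(info))
                extracted.append(target)
            else:
                # Frequently changing user code stays in the thin jar.
                dst.writestr(info, src.read(info))
    return sorted(extracted)
```

	Calling split_job_jar("job.jar", "job-thin.jar", "libs") would leave the library jars in libs/. Those can be copied to HDFS once and added to the job's classpath through the distributed cache (for example with DistributedCache.addFileToClassPath), so only the thin jar has to travel with each submission.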