From: Harsh J <harsh@cloudera.com>
Date: Wed, 22 Jun 2011 02:01:23 +0530
Message-ID: <BANLkTimSm==X+=9VnHyX+oU-zknzjKRzDA@mail.gmail.com>
Subject: Re: Large startup time in remote MapReduce job
To: general@hadoop.apache.org
Content-Type: text/plain; charset=ISO-8859-1

Gabor,

If your jar does not contain code changes that need to be transmitted
every time, you can consider placing it on the JobTracker/TaskTracker
(JT/TT) classpaths and skipping the jar registration in your job
submission code. You'll see a related WARN, but it should be OK to
ignore it.

If not, look for other ways to reduce your jar size. Does it really
contain 20 MB of user code, or does that figure include libraries?
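
To answer that question quickly, something like the rough sketch below
can total the compressed bytes in a jar per top-level directory. It uses
only java.util.zip; the class name JarSizeReport and the "(root)" label
are made up for illustration, and it assumes bundled libraries sit under
a common prefix such as lib/, which may not match how your jar is built.

```java
import java.io.IOException;
import java.util.Enumeration;
import java.util.Map;
import java.util.TreeMap;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical helper, not part of Hadoop: reports which top-level
// directories of a jar account for most of its compressed size.
public class JarSizeReport {

    // Sums compressed entry sizes, keyed by the first path segment.
    static Map<String, Long> compressedBytesByTopLevel(String jarPath)
            throws IOException {
        Map<String, Long> totals = new TreeMap<>();
        try (ZipFile zip = new ZipFile(jarPath)) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                ZipEntry entry = entries.nextElement();
                if (entry.isDirectory()) {
                    continue;
                }
                // getCompressedSize() returns -1 when the size is unknown.
                long size = Math.max(0L, entry.getCompressedSize());
                totals.merge(topLevel(entry.getName()), size, Long::sum);
            }
        }
        return totals;
    }

    // "lib/foo.jar" -> "lib/"; an entry with no directory -> "(root)".
    static String topLevel(String entryName) {
        int slash = entryName.indexOf('/');
        return slash < 0 ? "(root)" : entryName.substring(0, slash + 1);
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java JarSizeReport <path-to-jar>");
            return;
        }
        Map<String, Long> totals = compressedBytesByTopLevel(args[0]);
        for (Map.Entry<String, Long> e : totals.entrySet()) {
            System.out.printf("%-30s %,12d bytes%n", e.getKey(), e.getValue());
        }
    }
}
```

If most of the bytes turn out to be under a libraries prefix, those are
the natural candidates to move out of the job jar.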

On Wed, Jun 22, 2011 at 1:57 AM, Gabor Makrai <makrai.list@gmail.com> wrote:
> Hi everyone,
>
> I have a small problem with running MapReduce jobs.
> I have a fairly large Java program (my jar is more than 20 MB) in which I
> implemented a MapReduce job. I tested it on my local cluster, and it worked
> fine. But when I tried it over a low-bandwidth Internet connection, job
> startup was very slow :( I guess my whole JAR file was uploaded, because
> I saw unusually high outgoing network traffic.
> Could anyone tell me how I can solve this problem?
>
> Thanks,
> Gabor
>


-- 
Harsh J