From: Harsh J
Date: Wed, 22 Jun 2011 02:01:23 +0530
Subject: Re: Large startup time in remote MapReduce job
To: general@hadoop.apache.org

Gabor,

If your jar does not contain code changes that need to be transmitted on
every submission, consider placing it on the JobTracker/TaskTracker (JT/TT)
classpaths and skipping jar registration in your job submission code.
You'll see a related WARN message, but it should be OK to ignore it.

Otherwise, look for other ways to reduce the size of your jar. Does it
really contain 20 MB of user code, or does that figure include libraries?

On Wed, Jun 22, 2011 at 1:57 AM, Gabor Makrai wrote:
> Hi everyone,
>
> I have a little problem with running MapReduce jobs.
> I have a pretty large Java program (my jar size is more than 20 MB), in
> which I implemented a MapReduce job. I tested it on my local cluster, and
> it worked fine. But when I tried it over low-bandwidth Internet access, I
> experienced a very, very slow job startup time :( I guess my whole JAR
> file was uploaded, because I saw unusually high outgoing network traffic.
> Could anyone tell me how I can solve this problem?
>
> Thanks,
> Gabor
>

--
Harsh J