From: Ted Dunning
Date: Fri, 26 Apr 2013 11:00:26 -0700
Subject: Re: M/R job optimization
To: common-user@hadoop.apache.org

Have you checked the logs?

Is there a task that is taking a long time? What is that task doing?

There are two basic possibilities:

a) you have a skewed join like the other Ted mentioned. In this case, the straggler will be seen to be working on data.

b) you have a hung process. This can be more difficult to diagnose, but it indicates that there is a problem with your cluster.
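
To make the two cases concrete, here is a minimal sketch in Python. It assumes you have noted, by hand from the task pages or logs, each task's elapsed time and two readings of a records-processed counter taken a little apart. The names TaskSample, find_stragglers, classify, and slow_factor are hypothetical and not part of Hadoop; the 3x-median threshold is an arbitrary illustration, not a recommended setting.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class TaskSample:
    """Hypothetical per-task numbers copied by hand from the task pages or logs."""
    task_id: str
    elapsed_s: float      # how long the task has been running, in seconds
    records_before: int   # records-processed counter at the first look
    records_after: int    # the same counter a little later


def find_stragglers(tasks, slow_factor=3.0):
    """Return tasks whose elapsed time exceeds slow_factor times the median."""
    typical = median([t.elapsed_s for t in tasks])
    return [t for t in tasks if t.elapsed_s > slow_factor * typical]


def classify(task):
    """Counter still advancing suggests case (a); a frozen counter suggests case (b)."""
    if task.records_after > task.records_before:
        return "working on data (possible skew)"
    return "no progress (possibly hung)"


# Made-up sample numbers, for illustration only.
samples = [
    TaskSample("task_0", 80, 1000, 1000),
    TaskSample("task_1", 90, 1200, 1200),
    TaskSample("task_2", 85, 1100, 1100),
    TaskSample("task_3", 400, 2_000_000, 2_500_000),
    TaskSample("task_4", 390, 10, 10),
]

for straggler in find_stragglers(samples):
    print(straggler.task_id, "->", classify(straggler))
```

classify is only meaningful for tasks that are still running; a finished task also has a frozen counter, which is why it is applied to the stragglers alone.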



On Fri, Apr 26, 2013 at 2:21 AM, Han JU <ju.han.felix@gmail.com> wrote:
> Hi,
>
> I've implemented an algorithm with Hadoop; it's a series of 4 jobs. My question is that in one of the jobs, map and reduce tasks show 100% finished in about 1m 30s, but I have to wait another 5m for this job to finish.
> This job writes about 720 MB of compressed data to HDFS with replication factor 1, in sequence file format. I've tried copying this data to HDFS, and it takes only < 20 seconds. What happens during these 5 extra minutes?
>
> Any idea on how to optimize this part?
>
> Thanks.
>
> --
> JU Han
>
> UTC - Université de Technologie de Compiègne
> GI06 - Fouille de Données et Décisionnel