Date: Wed, 23 Jul 2014 20:34:38 +0530
Subject: Hadoop streaming - Class not found
From: rab ra
To: "user@hadoop.apache.org"

Hello,

I am trying to run an executable using Hadoop Streaming 2.4.

My executable is my mapper, which is a Groovy script. The script uses a class from a jar file that I am sending via the -libjars argument.

Hadoop Streaming is set up to spawn maps from an input file, with each line feeding one map.
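
For context, the setup described above corresponds roughly to an invocation like the sketch below. All jar names, paths, and the script name are placeholders rather than the real ones, and using NLineInputFormat to get one input line per map is an assumption about how such a job is commonly configured, not something confirmed here. The script only prints the command so it can be inspected before being run against a cluster.

```shell
#!/bin/sh
# Sketch only: every value below is a hypothetical placeholder.
STREAMING_JAR="hadoop-streaming-2.4.0.jar"   # path to the streaming jar (placeholder)
LIB_JAR="mylib.jar"                          # jar holding the class the script imports (placeholder)
MAPPER="mapper.groovy"                       # the Groovy mapper script (placeholder)
INPUT="/user/rab/input/lines.txt"            # input file, one line per map (placeholder)
OUTPUT="/user/rab/output"                    # output directory (placeholder)

# Generic options (-libjars, -files) must come before the streaming options.
# NLineInputFormat is assumed here as the way to feed each line to its own map.
CMD="hadoop jar $STREAMING_JAR \
-libjars $LIB_JAR \
-files $MAPPER \
-inputformat org.apache.hadoop.mapred.lib.NLineInputFormat \
-input $INPUT \
-output $OUTPUT \
-mapper $MAPPER"

# Print the command instead of submitting it; run it manually once the values are real.
echo "$CMD"
```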

The problem is that although Hadoop executes the use case successfully, I see that some maps failed and were restarted later. The failures were due to the class not being found: the script has some imports that could not be resolved, even though they are all in the jar file.

I am tempted to think that when Hadoop executes the first few map tasks, the jar file is not "prepared yet" to be made available to the maps, so the initial maps fail to locate the class; later, when they are restarted, they are able to locate the class and execute smoothly.

Is this correct? If not, can someone tell me why this happens and how I can get around it? Because of this, the use case takes a little more time to execute. I fear that when I expand the use case, this will surely cause a performance delay.


with regards
rab