hadoop-hdfs-user mailing list archives

From rab ra <rab...@gmail.com>
Subject Hadoop streaming - Class not found
Date Wed, 23 Jul 2014 15:04:38 GMT

I am trying to run an executable using Hadoop Streaming 2.4.

My executable is my mapper, which is a Groovy script. This script uses a
class from a JAR file that I ship with the -libjars argument.
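For reference, a streaming invocation of the kind described above might be assembled as below. This is a minimal Python sketch that assumes hypothetical names throughout (the helper `build_streaming_cmd`, `hadoop-streaming-2.4.0.jar`, `mylib.jar`, `mapper.groovy`, `input/tasks.txt`). The one rule it encodes is that generic options such as `-libjars` and `-files` must come before streaming options such as `-input` and `-mapper`. It only illustrates argument order and does not by itself explain the failures.

```python
import shlex
from typing import List, Sequence


def build_streaming_cmd(
    streaming_jar: str,
    libjars: Sequence[str],
    files: Sequence[str],
    input_path: str,
    output_path: str,
    mapper: str,
) -> List[str]:
    """Assemble a `hadoop jar` argv for a streaming job (illustrative only)."""
    cmd = ["hadoop", "jar", streaming_jar]
    # Generic options (-libjars, -files, ...) must precede streaming options.
    if libjars:
        cmd += ["-libjars", ",".join(libjars)]
    if files:
        cmd += ["-files", ",".join(files)]
    # Streaming-specific options come after the generic ones.
    cmd += ["-input", input_path, "-output", output_path, "-mapper", mapper]
    return cmd


if __name__ == "__main__":
    argv = build_streaming_cmd(
        streaming_jar="hadoop-streaming-2.4.0.jar",  # hypothetical path
        libjars=["mylib.jar"],                       # hypothetical JAR name
        files=["mapper.groovy"],                     # ships the script itself
        input_path="input/tasks.txt",                # hypothetical input file
        output_path="output",
        mapper="groovy mapper.groovy",
    )
    print(shlex.join(argv))  # shell-quoted command line (Python 3.8+)
```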

The streaming job spawns its map tasks from an input file; each line
feeds one map task.
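The message does not say how one map per line is achieved. One option in Hadoop Streaming, assumed here, is `-inputformat org.apache.hadoop.mapred.lib.NLineInputFormat`, which by default gives each map task one line of input. The hypothetical `line_splits` function below only models that grouping in Python; it is not Hadoop code.

```python
from typing import Iterable, List


def line_splits(lines: Iterable[str], lines_per_map: int = 1) -> List[List[str]]:
    """Group input lines into splits of `lines_per_map` lines (a model only)."""
    if lines_per_map < 1:
        raise ValueError("lines_per_map must be >= 1")
    splits: List[List[str]] = []
    current: List[str] = []
    for line in lines:
        current.append(line)
        if len(current) == lines_per_map:
            splits.append(current)
            current = []
    if current:  # a final, shorter split holds any leftover lines
        splits.append(current)
    return splits


if __name__ == "__main__":
    tasks = ["job-a", "job-b", "job-c"]  # hypothetical input lines
    for i, split in enumerate(line_splits(tasks)):
        print(f"map task {i}: {split}")
```

With the default of one line per map, the number of map tasks equals the number of input lines.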

Although Hadoop completes the job successfully, I see that some map tasks
fail and are restarted later. The failures are caused by a class that
cannot be located: the script's imports are not found, even though all of
those classes are in the JAR file.
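The job can still succeed because Hadoop reschedules a failed map attempt, up to `mapreduce.map.maxattempts` times (4 by default), and fails the task only if every attempt fails. The hypothetical `run_with_retries` below is a simplified Python model of that rule, not the scheduler's actual logic; for example, it ignores which node each retry lands on.

```python
from typing import Callable, Tuple


def run_with_retries(attempt: Callable[[int], bool], max_attempts: int = 4) -> Tuple[bool, int]:
    """Model per-task retries: the task succeeds if any attempt succeeds.

    `attempt(n)` returns True when attempt number n (1-based) succeeds.
    Returns (task_succeeded, attempts_used).
    """
    for n in range(1, max_attempts + 1):
        if attempt(n):
            return True, n
    return False, max_attempts
```

With an attempt that fails once (as with the missing class) and then succeeds, `run_with_retries(lambda n: n > 1)` returns `(True, 2)`: the task succeeds, but one failed attempt is recorded and the time spent on it is lost.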

I am tempted to think that when Hadoop executes the first few map tasks,
the JAR file is not yet "prepared" to be made available to them, so the
initial map attempts fail to locate the class; later, when they are
restarted, they are able to locate the class and execute smoothly.
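One way to test the "not yet prepared" hypothesis is to log, at the start of each map attempt, whether the JAR is actually present in the task's working directory and which host the attempt runs on. If the failed attempts cluster on particular nodes, that would point away from a pure timing issue. The sketch below assumes the JAR is localized into the working directory under its own file name (`mylib.jar` is hypothetical) and that a small wrapper calls the hypothetical `jar_report` before invoking the Groovy script. Anything the streaming process writes to stderr ends up in the task attempt's logs.

```python
import os
import socket
import sys
from typing import Optional


def jar_report(jar_name: str, workdir: Optional[str] = None) -> str:
    """Describe whether `jar_name` is visible in the given (or current) directory."""
    workdir = workdir or os.getcwd()
    path = os.path.join(workdir, jar_name)
    present = os.path.exists(path)
    size = os.path.getsize(path) if present else -1
    return (f"host={socket.gethostname()} cwd={workdir} "
            f"jar={jar_name} present={present} size={size}")


if __name__ == "__main__":
    # Hypothetical JAR name; this line lands in the attempt's stderr log.
    print(jar_report("mylib.jar"), file=sys.stderr)
```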

Is this correct? If not, can someone explain this behavior? How can I
work around this issue? Because of the retries, the use case takes a little
longer to execute, and I fear that when I scale up the use case, this will
cause noticeable performance delays.

With regards,
