hadoop-common-user mailing list archives

From Pierre Antoine DuBoDeNa <pad...@gmail.com>
Subject Re: help in distribution of a task with hadoop
Date Mon, 13 Aug 2012 18:27:01 GMT
We have all documents moved to HDFS. I understand that with our 1st option we
need more I/O than with what you describe, but let's say that's not a problem for now.

Could you please point me to option 2)? How could we do that? Any tutorial
or example?
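
For example, would something like the maven-shade-plugin be the idea here, i.e.
building one jar that already contains the dependency classes? Just a minimal
sketch, assuming the build is moved to Maven (only the plugin section of the
pom.xml is shown; everything else about the build is an assumption):

  <build>
    <plugins>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-shade-plugin</artifactId>
        <version>1.6</version>
        <executions>
          <execution>
            <!-- bundle all runtime dependencies into the job jar at package time -->
            <phase>package</phase>
            <goals>
              <goal>shade</goal>
            </goals>
          </execution>
        </executions>
      </plugin>
    </plugins>
  </build>

With that in place, mvn package would produce a single jar that already
contains the dependency classes, so nothing extra would need to be shipped to
the tasks.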

Thanks

2012/8/13 Bertrand Dechoux <dechouxb@gmail.com>

> 1) A standard way of doing it would be to have all your files' content
> inside HDFS. You could then process <key,value> where the key would be the
> name of the file and the value its contents. It would improve performance:
> data locality, less network traffic... But you may have constraints...
>
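
As an aside, a sketch of what the map() side of 1) could look like, assuming an
input format that hands each document over as a single record, file name as key
and full contents as value (the class name and method body below are made up
for illustration):

  import java.io.IOException;

  import org.apache.hadoop.io.IntWritable;
  import org.apache.hadoop.io.Text;
  import org.apache.hadoop.mapred.MapReduceBase;
  import org.apache.hadoop.mapred.Mapper;
  import org.apache.hadoop.mapred.OutputCollector;
  import org.apache.hadoop.mapred.Reporter;

  // Hypothetical mapper for option 1): the framework already delivers the
  // document body, so the mapper never has to open anything from HDFS itself.
  public class WholeDocMapper extends MapReduceBase
      implements Mapper<Text, Text, Text, IntWritable> {

    public void map(Text fileName, Text fileContents,
                    OutputCollector<Text, IntWritable> output, Reporter reporter)
        throws IOException {
      // ... text processing on fileContents.toString() goes here ...
      output.collect(fileName, new IntWritable(fileContents.getLength()));
    }
  }
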
> 2) Maven is a simple way of doing it.
>
> Regards
>
> Bertrand
>
> On Mon, Aug 13, 2012 at 7:59 PM, Pierre Antoine DuBoDeNa <padbdn@gmail.com> wrote:
>
> > Hello,
> >
> > We use hadoop to distribute a task over our machines.
> >
> > This task requires only the mapper class to be defined. We want to do some
> > text processing on thousands of documents. So we create key-value pairs,
> > where the key is just an increasing number and the value is the path of the
> > file to be processed.
> >
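
A sketch of what such a mapper might look like with the setup described above,
assuming the value really is an HDFS path (names and body below are
illustrative only):

  import java.io.IOException;

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FSDataInputStream;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;
  import org.apache.hadoop.io.LongWritable;
  import org.apache.hadoop.io.Text;
  import org.apache.hadoop.mapred.MapReduceBase;
  import org.apache.hadoop.mapred.Mapper;
  import org.apache.hadoop.mapred.OutputCollector;
  import org.apache.hadoop.mapred.Reporter;

  // Hypothetical mapper for this design: each record only carries a path, so
  // the mapper has to fetch the document from HDFS itself. That fetch is the
  // extra I/O mentioned above, since the data is usually not local to the task.
  public class PathMapper extends MapReduceBase
      implements Mapper<LongWritable, Text, Text, Text> {

    public void map(LongWritable id, Text filePath,
                    OutputCollector<Text, Text> output, Reporter reporter)
        throws IOException {
      FileSystem fs = FileSystem.get(new Configuration());
      FSDataInputStream in = fs.open(new Path(filePath.toString()));
      try {
        // ... read and process the document here ...
      } finally {
        in.close();
      }
    }
  }
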
> > We face a problem with including an external jar file/class when running the
> > job jar.
> >
> > $ mkdir Rdg_classes
> > $ javac -classpath ${HADOOP_HOME}/hadoop-${HADOOP_VERSION}-core.jar \
> >     -d Rdg_classes Rdg.java
> > $ jar -cvf Rdg.jar -C Rdg_classes/ .
> >
> > We have tried the following options:
> >
> > *1. Set HADOOP_CLASSPATH to the location of the external jar files or
> > external classes.*
> > It doesn't help. Instead, the job stops recognizing the Reducer and fails
> > with the error below:
> >
> > java.lang.RuntimeException: java.lang.RuntimeException: java.lang.ClassNotFoundException: hadoop.Rdg$Reduce
> >     at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:899)
> >     at org.apache.hadoop.mapred.JobConf.getCombinerClass(JobConf.java:1028)
> >     at org.apache.hadoop.mapred.Task$CombinerRunner.create(Task.java:1380)
> >     at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.<init>(MapTask.java:981)
> >     at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:428)
> >     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
> >     at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
> >     at java.security.AccessController.doPrivileged(Native Method)
> >     at javax.security.auth.Subject.doAs(Subject.java:396)
> >     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
> >     at org.apache.hadoop.mapred.Child.main(Child.java:249)
> > Caused by: java.lang.RuntimeException: java.lang.ClassNotFoundException: hadoop.Rdg$Reduce
> >     at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:867)
> >     at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:891)
> >     ... 10 more
> > Caused by: java.lang.ClassNotFoundException: hadoop.Rdg$Reduce
> >     at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
> >     at java.security.AccessController.doPrivileged(Native Method)
> >     at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
> >     at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
> >     at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
> >     at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
> >     at java.lang.Class.forName0(Native Method)
> >     at java.lang.Class.forName(Class.java:247)
> >     at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:820)
> >     at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:865)
> >     ... 11 more
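
A note on why this is probably a dead end (a general observation about this
MRv1 setup, not something stated in the thread): HADOOP_CLASSPATH is read by
the hadoop launcher script on the submitting machine, so it only extends the
classpath of the client JVM; the map/reduce task JVMs on the cluster never see
it. Something like the following therefore helps the driver but not the tasks
(the jar path is hypothetical):

  $ export HADOOP_CLASSPATH=/local/path/to/Rdg_lib/dependency.jar
  $ hadoop jar Rdg.jar my.hadoop.Rdg tester rdg_output
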
> >
> > *2. Use the -libjars option as below:*
> > hadoop jar Rdg.jar my.hadoop.Rdg -libjars Rdg_lib/* tester rdg_output
> >
> > where Rdg_lib is a folder containing all the required classes/jars, stored
> > on HDFS.
> > But the job starts reading -libjars as an input path and gives this error:
> >
> > 12/08/10 08:16:24 ERROR security.UserGroupInformation: PriviledgedActionException
> > as:hduser cause:org.apache.hadoop.mapred.InvalidInputException: Input path does
> > not exist: hdfs://nameofserver:54310/user/hduser/-libjars
> > Exception in thread "main" org.apache.hadoop.mapred.InvalidInputException:
> > Input path does not exist: hdfs://nameofserver:54310/user/hduser/-libjars
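
For what it's worth, this matches how -libjars behaves when the driver does not
go through ToolRunner: -libjars is a generic option parsed only by
GenericOptionsParser, so a plain main() that builds its own JobConf sees it as
just another argument, exactly as in the error above. It also expects a
comma-separated list of local jar paths rather than an HDFS glob. A sketch of a
ToolRunner-style driver, with placeholder mapper/reducer classes and all names
assumed:

  import org.apache.hadoop.conf.Configured;
  import org.apache.hadoop.fs.Path;
  import org.apache.hadoop.mapred.FileInputFormat;
  import org.apache.hadoop.mapred.FileOutputFormat;
  import org.apache.hadoop.mapred.JobClient;
  import org.apache.hadoop.mapred.JobConf;
  import org.apache.hadoop.mapred.lib.IdentityMapper;
  import org.apache.hadoop.mapred.lib.IdentityReducer;
  import org.apache.hadoop.util.Tool;
  import org.apache.hadoop.util.ToolRunner;

  // Hypothetical driver: ToolRunner feeds the command line through
  // GenericOptionsParser, which consumes -libjars (and -D, -files, ...) and
  // merges the result into getConf(); run() then only sees the remaining args.
  public class Rdg extends Configured implements Tool {

    public int run(String[] args) throws Exception {
      JobConf conf = new JobConf(getConf(), Rdg.class);
      conf.setJobName("rdg");
      // IdentityMapper/IdentityReducer are placeholders; plug in the real
      // Rdg mapper (and reducer, if any) here.
      conf.setMapperClass(IdentityMapper.class);
      conf.setReducerClass(IdentityReducer.class);
      FileInputFormat.setInputPaths(conf, new Path(args[0]));
      FileOutputFormat.setOutputPath(conf, new Path(args[1]));
      JobClient.runJob(conf);
      return 0;
    }

    public static void main(String[] args) throws Exception {
      System.exit(ToolRunner.run(new Rdg(), args));
    }
  }

With such a driver, the invocation would look like the following (the local
dependency jar paths are hypothetical):

  $ hadoop jar Rdg.jar my.hadoop.Rdg -libjars /local/path/dep1.jar,/local/path/dep2.jar tester rdg_output
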
> >
> > Is there any other way to do it? Or are we doing something wrong?
> >
> > Best,
> >
>
>
>
> --
> Bertrand Dechoux
>
