hadoop-mapreduce-user mailing list archives

From Joey Echeverria <j...@cloudera.com>
Subject Re: -libjars?
Date Thu, 15 Sep 2011 11:44:42 GMT
OK, but does the job even start the maps, or does it fail during initial setup?

The reason I ask is that -libjars only adds the jar to the classpath of
the mappers and reducers. If you need the class before the job is
submitted to the cluster (that is, in the driver), you should also put
the jar on the client classpath, like this:

HADOOP_CLASSPATH=../umd-hadoop-core/cloud9.jar hadoop jar myjob.jar
myjob.driver.PreprocessANC -libjars ../umd-hadoop-core/cloud9.jar
home/my/pyworkspace/openAnc.xml index/ 10 1

-Joey
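
To tell whether a missing class is a client-side (driver) problem rather than a task-side one, a check along these lines could be called at the top of the driver. This is only a minimal sketch: the class name DriverClasspathCheck, the method isLoadable, and the placeholder com.example.SomeLibraryClass are hypothetical and not part of Hadoop or cloud9.

```java
final class DriverClasspathCheck {

    /** Returns true if the named class can be loaded by the JVM running this code. */
    static boolean isLoadable(String className) {
        try {
            // initialize=false: only locate the class, do not run its static initializers.
            Class.forName(className, false, DriverClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Hypothetical placeholder: pass the fully qualified name of a class from the library jar.
        String className = args.length > 0 ? args[0] : "com.example.SomeLibraryClass";
        if (isLoadable(className)) {
            System.out.println(className + " is visible to the client JVM");
        } else {
            System.out.println(className + " is NOT visible to the client JVM; "
                    + "-libjars alone does not put it there, so add the jar to HADOOP_CLASSPATH");
        }
    }
}
```

If the check fails in the JVM started by "hadoop jar", the exception is coming from the driver, and the HADOOP_CLASSPATH form of the command above is the relevant fix.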

On Thu, Sep 15, 2011 at 4:24 AM, Marco Didonna <m.didonna86@gmail.com> wrote:
> Right now I am still in standalone mode ... I'd like to fix this issue
> before starting a cluster on EC2. :)
>
> Thanks for your time
>
> Marco
>
> On 14 September 2011 14:04, Joey Echeverria <joey@cloudera.com> wrote:
>> When are you getting the exception? Is it during the setup of your
>> job, or after it's running on the cluster?
>>
>> -Joey
>>
>> On Wed, Sep 14, 2011 at 4:50 AM, Marco Didonna <m.didonna86@gmail.com> wrote:
>>> Hello everyone,
>>> sorry to bring this up again but I need some clarification. I wrote a
>>> map-reduce application that needs the cloud9 library
>>> (https://github.com/lintool/Cloud9). This library is packaged in a jar
>>> file and I want to make it available to the whole cluster. So far I
>>> have been working in standalone mode and I have unsuccessfully tried
>>> to use the -libjars option. I always get ClassNotDefException: the
>>> only way I made everything work fine is by copying the cloud9.jar into
>>> the hadoop/lib folder.
>>> I suppose I cannot do that when using a cluster of N machines, since I
>>> would have to copy it onto all N machines and this approach isn't
>>> feasible.
>>>
>>> Here's how I perform the job "hadoop jar myjob.jar
>>> myjob.driver.PreprocessANC -libjars ../umd-hadoop-core/cloud9.jar
>>> home/my/pyworkspace/openAnc.xml index/ 10 1"
>>>
>>> Is there some code that needs to be written in the driver in order to
>>> have the darn library added to the "global" classpath? This -libjars
>>> option is really poorly documented IMHO.
>>>
>>> Any help would be very much appreciated ;)
>>>
>>> Marco Didonna
>>>
>>> On 17 August 2011 03:57, Anty <anty.rao@gmail.com> wrote:
>>>> Thanks very much, Todd. I get it.
>>>>
>>>>
>>>> On Wed, Aug 17, 2011 at 6:23 AM, Todd Lipcon <todd@cloudera.com> wrote:
>>>>> Putting files on the classpath doesn't make them accessible to JVM's
>>>>> resource loader. If you have dir/foo.properties, then "dir" needs to
>>>>> be on the classpath, not "dir/foo.properties". Since the working dir
>>>>> of the task is on the classpath, then -files works since it gets the
>>>>> properties file into a directory on the classpath.
>>>>>
>>>>> -Todd
>>>>>
>>>>> On Mon, Aug 15, 2011 at 8:09 PM, Anty <anty.rao@gmail.com> wrote:
>>>>>> Thanks very much for your reply, Todd.
>>>>>> I am at a complete loss. I want to ship a configuration file to the
>>>>>> cluster to run my MapReduce job.
>>>>>>
>>>>>> If I use the -libjars option to ship the configuration file, the child
>>>>>> JVM launched by the task tracker can't find it. Curiously, the
>>>>>> configuration file is already on the classpath of the child JVM.
>>>>>>
>>>>>> If I use the -files option to ship the configuration file, the child
>>>>>> JVM can find the file.
>>>>>> IMO, the difference between -libjars and -files is that -files
>>>>>> creates a symbolic link to the configuration file in the current
>>>>>> working directory of the child JVM.
>>>>>>
>>>>>> I dug into the source code, but it's so complicated that I can't
>>>>>> figure out the root cause.
>>>>>> So my question is:
>>>>>> with the -libjars option, the configuration file is already on the
>>>>>> classpath, so why can't the class loader find it, when the class
>>>>>> loader CAN find a jar shipped with -libjars?
>>>>>>
>>>>>> any help will be appreciated.
>>>>>>
>>>>>> On Tue, Aug 16, 2011 at 1:06 AM, Todd Lipcon <todd@cloudera.com> wrote:
>>>>>>> Your "driver" is the program that submits the job. The task is the
>>>>>>> thing that runs on the cluster. They have separate classpaths.
>>>>>>>
>>>>>>> Better to ask on the public lists if you want a more in-depth
>>>>>>> explanation.
>>>>>>>
>>>>>>> -Todd
>>>>>>>
>>>>>>> On Mon, Aug 15, 2011 at 9:02 AM, Anty <anty.rao@gmail.com> wrote:
>>>>>>>> Hi Todd,
>>>>>>>> Would you please explain a little more?
>>>>>>>>
>>>>>>>> On Sat, Dec 11, 2010 at 2:08 AM, Todd Lipcon <todd@cloudera.com> wrote:
>>>>>>>>>
>>>>>>>>> You need to put the library jar on your classpath (e.g. using
>>>>>>>>> HADOOP_CLASSPATH) as well. The -libjars will ship it to the cluster
>>>>>>>>> and put it on the classpath of your task, but not the classpath of
>>>>>>>>> your "driver" code.
>>>>>>>>>
>>>>>>>> I still can't understand what you mean by "but not the classpath of
>>>>>>>> your "driver" code."
>>>>>>>>
>>>>>>>> Thanks in advance.
>>>>>>>>
>>>>>>>>
>>>>>>>>> -Todd
>>>>>>>>>
>>>>>>>>> On Thu, Dec 9, 2010 at 10:29 PM, Vipul Pandey <vipandey@gmail.com> wrote:
>>>>>>>>> > disclaimer : a newbie!!!
>>>>>>>>> > Howdy?
>>>>>>>>> > Got a quick question. The -libjars option doesn't seem to work for me in
>>>>>>>>> > pretty much my first (or maybe second) MapReduce job.
>>>>>>>>> > Here's what I'm doing:
>>>>>>>>> > $bin/hadoop jar sherlock.jar somepkg.FindSchoolsJob -libjars
>>>>>>>>> > HStats-1A18.jar input output
>>>>>>>>> >
>>>>>>>>> > sherlock.jar has my main class (of course) FindSchoolsJob, which runs
>>>>>>>>> > just fine by itself until I add a dependency on a class in HStats-1A18.jar.
>>>>>>>>> > When I run the above command with -libjars specified, it fails to find
>>>>>>>>> > my classes that 'are' inside the HStats jar file.
>>>>>>>>> > Exception in thread "main" java.lang.NoClassDefFoundError: com/*****/HAgent
>>>>>>>>> > at com.*****.FindSchoolsJob.run(FindSchoolsJob.java:46)
>>>>>>>>> > at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>>>>>>>>> > at com.******.FindSchoolsJob.main(FindSchoolsJob.java:101)
>>>>>>>>> > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>>>>>>> > at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>>>>>>>> > at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>>>>>>>> > at java.lang.reflect.Method.invoke(Method.java:597)
>>>>>>>>> > at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
>>>>>>>>> > Caused by: java.lang.ClassNotFoundException: com/*****/HAgent
>>>>>>>>> > at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
>>>>>>>>> > at java.security.AccessController.doPrivileged(Native Method)
>>>>>>>>> > at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
>>>>>>>>> > at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
>>>>>>>>> > at java.lang.ClassLoader.loadClass(ClassLoader.java:248)
>>>>>>>>> > ... 8 more
>>>>>>>>> >
>>>>>>>>> > My main class is defined as below :
>>>>>>>>> > public class FindSchoolsJob extends Configured implements Tool {
>>>>>>>>> > :
>>>>>>>>> >   public int run(String[] args) throws Exception {
>>>>>>>>> > :
>>>>>>>>> > :
>>>>>>>>> >   }
>>>>>>>>> > :
>>>>>>>>> >   public static void main(String[] args) throws Exception {
>>>>>>>>> >     int res = ToolRunner.run(new Configuration(), new FindSchoolsJob(), args);
>>>>>>>>> >     System.exit(res);
>>>>>>>>> >   }
>>>>>>>>> > }
>>>>>>>>> > Any hint would be highly appreciated.
>>>>>>>>> > Thank You!
>>>>>>>>> > ~V
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Todd Lipcon
>>>>>>>>> Software Engineer, Cloudera
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> Best Regards
>>>>>>>> Anty Rao
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Todd Lipcon
>>>>>>> Software Engineer, Cloudera
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Best Regards
>>>>>> Anty Rao
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Todd Lipcon
>>>>> Software Engineer, Cloudera
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Best Regards
>>>> Anty Rao
>>>>
>>>
>>
>>
>>
>> --
>> Joseph Echeverria
>> Cloudera, Inc.
>> 443.305.9434
>>
>



-- 
Joseph Echeverria
Cloudera, Inc.
443.305.9434
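
Todd's point in the quoted thread (the directory containing foo.properties must be on the classpath, not the file itself) can be tried out with plain JDK classes. The following is a rough sketch only, not Hadoop code: ResourceLookupDemo, canFind and runDemo are hypothetical names, and a standalone URLClassLoader stands in for the task JVM's classpath.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Files;
import java.nio.file.Path;

final class ResourceLookupDemo {

    /** Looks up resourceName using a class loader whose only classpath entry is the given URL. */
    static boolean canFind(URL classpathEntry, String resourceName) throws IOException {
        // Parent is null so that only this single entry (plus the JDK itself) is searched.
        try (URLClassLoader loader = new URLClassLoader(new URL[] {classpathEntry}, null)) {
            return loader.getResource(resourceName) != null;
        }
    }

    /** Returns {found via directory entry, found via file entry}. */
    static boolean[] runDemo() {
        try {
            Path dir = Files.createTempDirectory("conf");
            Path file = Files.write(dir.resolve("foo.properties"), "key=value\n".getBytes());
            // Entry 1: the directory that contains the file (the URL ends with '/').
            boolean viaDirectory = canFind(dir.toUri().toURL(), "foo.properties");
            // Entry 2: the properties file itself, which the loader tries to treat as a jar.
            boolean viaFile = canFind(file.toUri().toURL(), "foo.properties");
            return new boolean[] {viaDirectory, viaFile};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        boolean[] result = runDemo();
        System.out.println("found with directory on classpath: " + result[0]);
        System.out.println("found with file on classpath: " + result[1]);
    }
}
```

Only the directory entry should resolve the resource, which matches the explanation of why -files works: it places the file in the task's working directory, and that directory is on the classpath.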
