nutch-dev mailing list archives

From "Ferdy (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (NUTCH-937) When nutch is run on hadoop > 0.20.2 (or cdh) it will not find plugins because MapReduce will not unpack plugin/ directory from the job's pack (due to MAPREDUCE-967)
Date Tue, 30 Aug 2011 13:11:37 GMT

    [ https://issues.apache.org/jira/browse/NUTCH-937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13093705#comment-13093705 ]

Ferdy commented on NUTCH-937:
-----------------------------

I finally found out what the problem is with the above suggestion. It was a terrible problem
to debug because of the random elements involved.

Setting "plugin.folders" to "${job.local.dir}/../jars/plugins" works only in certain
cases. If you have a single folder specified in "mapred.local.dir" there will be no trouble
at all. However, when you have multiple folders specified (which is a legitimate thing to do
in Hadoop in order to spread task working folders over multiple disks), loading the plugins
sometimes fails with an NPE because the plugins folder does not exist.
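
For reference, the combination that triggers the NPE looks roughly like this (an illustrative
sketch; the disk layout is made up, and I am assuming the standard Nutch "plugin.folders" key):

    <!-- nutch-site.xml: plugins resolved relative to the task working dir -->
    <property>
      <name>plugin.folders</name>
      <value>${job.local.dir}/../jars/plugins</value>
    </property>

    <!-- mapred-site.xml: task folders spread over two disks -->
    <property>
      <name>mapred.local.dir</name>
      <value>/mnt/disk1/mapred,/mnt/disk2/mapred</value>
    </property>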

This is caused by the fact that the jars directory (as unpacked by the TaskTracker) IS NOT
ALWAYS ON THE SAME DISK AS THE WORKING FOLDER. For example, if you have 2 folders in
"mapred.local.dir" (say "/mnt/disk1/mapred,/mnt/disk2/mapred") the jars may be unpacked in
"/mnt/disk1/mapred/taskTracker/ferdy/jobcache/job_201108301201_0001/work/../jars/plugins"
but the working directory (which the "job.local.dir" property is set to) could be
"/mnt/disk2/mapred/taskTracker/ferdy/jobcache/job_201108301201_0001/work/".

Now I'm not sure whether this is a good thing; perhaps it is, because most of the time you
will want to unpack a job's jar only once while still running its task attempts on multiple
disks of a TaskTracker. It is however very troublesome in cases such as this issue, and
therefore I strongly recommend against setting "plugin.folders" to
"${job.local.dir}/../jars/plugins", unless of course you only have one folder specified in
"mapred.local.dir".

The workaround I am currently using is to put the plugins folder not in the root of the job
jar, but in classes/plugins, so that Hadoop unjars it and puts it on the classpath automatically.
This way there is no need to change the "mapreduce.job.jar.unpack.pattern" property, and
"plugin.folders" can be left at its default of "plugins". This approach requires a slight
modification of Nutch's build.xml file.
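
Concretely, the build change amounts to something like the following (a minimal sketch against
a typical Nutch 1.x build.xml; the exact target and property names in your checkout may differ):

    <!-- job target: copy plugins under classes/ before building the job jar,
         so Hadoop's default unpack pattern (classes/ and lib/) extracts them
         onto the task classpath without any extra configuration. -->
    <target name="job" depends="compile">
      <copy todir="${build.dir}/classes/plugins">
        <fileset dir="${build.dir}/plugins"/>
      </copy>
      <jar jarfile="${build.dir}/${final.name}.job">
        <zipfileset dir="${build.dir}/classes"/>
        <zipfileset dir="${build.dir}" includes="lib/*.jar"/>
      </jar>
    </target>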

> When nutch is run on hadoop > 0.20.2 (or cdh) it will not find plugins because MapReduce will not unpack plugin/ directory from the job's pack (due to MAPREDUCE-967)
> ---------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: NUTCH-937
>                 URL: https://issues.apache.org/jira/browse/NUTCH-937
>             Project: Nutch
>          Issue Type: Bug
>          Components: build
>    Affects Versions: 1.2
>         Environment: hadoop 0.21 or cloudera hadoop 0.20.2+737
>            Reporter: Claudio Martella
>            Assignee: Markus Jelsma
>             Fix For: 1.4, 2.0
>
>
> Jobs running on hadoop 0.21 or cloudera cdh 0.20.2+737 will fail because of missing plugins (i.e.):
> 10/10/28 12:22:21 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
> 10/10/28 12:22:22 INFO mapred.FileInputFormat: Total input paths to process : 1
> 10/10/28 12:22:23 INFO mapred.JobClient: Running job: job_201010271826_0002
> 10/10/28 12:22:24 INFO mapred.JobClient:  map 0% reduce 0%
> 10/10/28 12:22:39 INFO mapred.JobClient: Task Id : attempt_201010271826_0002_m_000000_0, Status : FAILED
> java.lang.RuntimeException: Error in configuring object
>     at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
>     at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
>     at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
>     at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:379)
>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:317)
>     at org.apache.hadoop.mapred.Child$4.run(Child.java:217)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:396)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1063)
>     at org.apache.hadoop.mapred.Child.main(Child.java:211)
> Caused by: java.lang.reflect.InvocationTargetException
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>     at java.lang.reflect.Method.invoke(Method.java:597)
>     at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88)
>     ... 9 more
> Caused by: java.lang.RuntimeException: Error in configuring object
>     at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
>     at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
>     at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
>     at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:34)
>     ... 14 more
> Caused by: java.lang.reflect.InvocationTargetException
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>     at java.lang.reflect.Method.invoke(Method.java:597)
>     at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88)
>     ... 17 more
> Caused by: java.lang.RuntimeException: x point org.apache.nutch.net.URLNormalizer not found.
>     at org.apache.nutch.net.URLNormalizers.<init>(URLNormalizers.java:122)
>     at org.apache.nutch.crawl.Injector$InjectMapper.configure(Injector.java:70)
>     ... 22 more
> 10/10/28 12:22:40 INFO mapred.JobClient: Task Id : attempt_201010271826_0002_m_000001_0, Status : FAILED
> java.lang.RuntimeException: Error in configuring object
>     at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
>     at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
>     at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
>     at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:379)
>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:317)
>     at org.apache.hadoop.mapred.Child$4.run(Child.java:217)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:396)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1063)
>     at org.apache.hadoop.mapred.Child.main(Child.java:211)
> Caused by: java.lang.reflect.InvocationTargetException
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>     at java.lang.reflect.Method.invoke(Method.java:597)
>     at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88)
>     ... 9 more
> Caused by: java.lang.RuntimeException: Error in configuring object
>     at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
>     at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
>     at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
>     at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:34)
>     ... 14 more
> Caused by: java.lang.reflect.InvocationTargetException
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>     at java.lang.reflect.Method.invoke(Method.java:597)
>     at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88)
>     ... 17 more
> Caused by: java.lang.RuntimeException: x point org.apache.nutch.net.URLNormalizer not found.
>     at org.apache.nutch.net.URLNormalizers.<init>(URLNormalizers.java:122)
>     at org.apache.nutch.crawl.Injector$InjectMapper.configure(Injector.java:70)
>     ... 22 more
> The bug is due to MAPREDUCE-967 (part of hadoop 0.21 and cdh 0.20.2+737), which modifies the way MapReduce unpacks the job's jar. The old behaviour was to unpack all of it; now only classes/ and lib/ are unpacked, so nutch is missing its plugins/ directory.
> A workaround is to force unpacking of the plugins/ directory by setting the 'mapreduce.job.jar.unpack.pattern' configuration property to "(?:classes/|lib/|plugins/).*"
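
In configuration-file form, that workaround would look something like this (a sketch; per the
description above, the stock pattern only matches classes/ and lib/, so this merely adds plugins/):

    <!-- mapred-site.xml, or set in the job configuration -->
    <property>
      <name>mapreduce.job.jar.unpack.pattern</name>
      <value>(?:classes/|lib/|plugins/).*</value>
    </property>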

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
