hadoop-common-dev mailing list archives

From "Dennis Kubes (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-964) Hadoop Shell Script causes ClassNotFoundException for Nutch processes
Date Thu, 01 Feb 2007 03:59:06 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12469327 ]

Dennis Kubes commented on HADOOP-964:
-------------------------------------

The second patch, classpath2.path (sorry, that should be .patch), attacks the problem from the
ReduceTaskRunner instead of the hadoop shell script.  The problem is that Writable classes are
not being found by the ReduceTaskRunner upon initialization.  It needs these Writable classes
to perform sorting, etc. in the prepare stage.  The first solution was to change the hadoop
script to load any jars in HADOOP_HOME.  The hadoop script sets the classpath for the
TaskTracker, which is then passed to the ReduceTaskRunner, so by loading any jars in the home
directory the necessary jars would be on the classpath and accessible.  There are a few issues
with that fix.  First, it reverses HADOOP-700, which we don't want to do.  Second, if we went
down this path of setting the classpath through the script for Writable classes, then any time
new classes were added we would have to restart the TaskTracker nodes.  Again, not a good
solution.
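
To make the failure concrete, here is a rough reproduction of what the lookup amounts to
(CrawlDatum is just an example of a user Writable that is not on the TaskTracker's classpath):

    // Sketch only: ReduceTaskRunner effectively resolves the map output
    // key/value class names recorded in the job configuration.  Resolution
    // happens against the classpath the TaskTracker JVM was started with,
    // so a Nutch Writable missing from that classpath cannot be found.
    public class MissingWritableDemo {
      public static void main(String[] args) throws Exception {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        // Throws ClassNotFoundException unless the Nutch jar was on the
        // classpath when the TaskTracker was started.
        Class<?> keyClass =
            Class.forName("org.apache.nutch.crawl.CrawlDatum", true, loader);
        System.out.println("resolved " + keyClass.getName());
      }
    }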

So instead, what I did with this patch is change the ReduceTaskRunner to dynamically configure
its classpath from the local unjarred work directory.  It does this by creating a new
URLClassLoader and adding the same elements that are added to the classpath of the
TaskTracker$Child processes it spawns, while keeping the old context class loader as its
parent.  The new URLClassLoader is then set into the current JobConf as its class loader and
is used for the sorting, etc.  This allows us, one, to avoid changing the hadoop script and,
two, to dynamically add new Writable classes to the system without restarting TaskTracker
nodes.  A rough sketch of the idea follows.
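
This is a minimal sketch of the approach, not the literal patch; the helper name and the
classes/lib layout of the unjarred work directory are assumptions for illustration:

    import java.io.File;
    import java.net.URL;
    import java.net.URLClassLoader;
    import java.util.ArrayList;
    import java.util.List;

    public class WorkDirClassLoader {
      // workDir is the local directory where the job jar was unpacked for
      // this task; the exact path is TaskTracker-internal.
      public static ClassLoader forWorkDir(File workDir) throws Exception {
        List<URL> urls = new ArrayList<URL>();
        urls.add(workDir.toURI().toURL());                      // unpacked resources
        urls.add(new File(workDir, "classes").toURI().toURL()); // job classes
        File[] jars = new File(workDir, "lib").listFiles();     // bundled jars
        if (jars != null) {
          for (File jar : jars) {
            urls.add(jar.toURI().toURL());
          }
        }
        // Parent on the old context class loader so Hadoop's own classes
        // still resolve from the TaskTracker's original classpath.
        ClassLoader parent = Thread.currentThread().getContextClassLoader();
        return new URLClassLoader(urls.toArray(new URL[urls.size()]), parent);
      }
    }

The runner would then do something along the lines of
conf.setClassLoader(WorkDirClassLoader.forWorkDir(workDir)), so that later lookups such as
conf.getMapOutputKeyClass() resolve Nutch's Writables from the job jar instead of from the
TaskTracker's startup classpath.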

I have run this patch on a development system using the Nutch injector, and I have also run
the TestMapRed unit tests.  Both completed successfully.

> Hadoop Shell Script causes ClassNotFoundException for Nutch processes
> ---------------------------------------------------------------------
>
>                 Key: HADOOP-964
>                 URL: https://issues.apache.org/jira/browse/HADOOP-964
>             Project: Hadoop
>          Issue Type: Bug
>          Components: scripts
>         Environment: windows xp and fedora core 6 linux, java 1.5.10...should affect all systems
>            Reporter: Dennis Kubes
>            Priority: Critical
>             Fix For: 0.11.0
>
>         Attachments: classpath.patch, classpath2.path
>
>
> In the ReduceTaskRunner constructor, line 339, a sorter is created that attempts to get the
> map output key and value classes from the configuration object.  This is before the
> TaskTracker$Child process is spawned off into its own separate JVM, so here the classpath for
> the configuration is the classpath that started the TaskTracker.  The current hadoop script
> includes the hadoop jars, meaning that any hadoop Writable type will be found, but it doesn't
> include Nutch jars, so any Nutch Writable type, or any other Writable type, will not be found
> and a ClassNotFoundException will be thrown.
> I don't think it is a good idea to have a dependency on specific Nutch jars in the hadoop
> script, but it is a good idea to allow jars to be included if they are in specific locations,
> such as HADOOP_HOME, where the Nutch jar resides.  I have attached a patch that adds any jars
> in the HADOOP_HOME directory to the hadoop classpath.  This fixes the issue of getting
> ClassNotFoundExceptions inside of Nutch processes.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

