From: "Dennis Kubes (JIRA)"
To: hadoop-dev@lucene.apache.org
Date: Wed, 31 Jan 2007 19:59:06 -0800 (PST)
Subject: [jira] Commented: (HADOOP-964) Hadoop Shell Script causes ClassNotFoundException for Nutch processes

    [ https://issues.apache.org/jira/browse/HADOOP-964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12469327 ]

Dennis Kubes commented on HADOOP-964:
-------------------------------------

The second patch, classpath2.path (sorry, that should be .patch), attacks the problem from the ReduceTaskRunner instead of the hadoop shell script. The problem is that Writable classes are not found by the ReduceTaskRunner upon initialization; it needs these Writable classes to perform the sorting, etc. in the prepare stage.

The first solution was to change the hadoop script to load any jars in HADOOP_HOME. The hadoop script sets the classpath for the TaskTracker, which is then passed to the ReduceTaskRunner, so by loading any jars in the home directory the necessary jars would be on the classpath and accessible. There are a few issues with that fix. First, it reverses HADOOP-700, which we don't want to do. Second, if we go down the path of setting the classpath through the script for Writable classes, then any time new classes are added we would have to restart the TaskTracker nodes. Again, not a good solution.

So instead, what I did with this patch is change the ReduceTaskRunner to dynamically configure its classpath from the local unjarred work directory. It does this by creating a new URLClassLoader and adding the same elements that are added to the classpath of the TaskTracker$Child spawns, while keeping the old context class loader as its parent. The new URLClassLoader is then set into the current JobConf as its classloader and is used for the sorting, etc. This way we do not have to change the hadoop script, and new Writable classes can be dynamically added to the system without restarting TaskTracker nodes.
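For reference, a minimal sketch of that classloader setup (assuming the usual unjarred job layout of workDir, workDir/classes and workDir/lib/*.jar; the class and method names here are illustrative, not the actual patch code):

import java.io.File;
import java.net.URL;
import java.net.URLClassLoader;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.mapred.JobConf;

public class JobClassLoaderSketch {

  // Build a classloader over the unjarred job work directory so that the
  // sorter in the prepare stage can resolve user Writable classes.
  public static ClassLoader setupJobClassLoader(File workDir, JobConf conf)
      throws Exception {
    List<URL> urls = new ArrayList<URL>();

    // Same kinds of entries the TaskTracker$Child gets on its classpath:
    // the work directory itself, its classes/ dir, and every jar in lib/.
    urls.add(workDir.toURI().toURL());
    urls.add(new File(workDir, "classes").toURI().toURL());
    File[] libs = new File(workDir, "lib").listFiles();
    if (libs != null) {
      for (File lib : libs) {
        if (lib.getName().endsWith(".jar")) {
          urls.add(lib.toURI().toURL());
        }
      }
    }

    // Keep the old context class loader as the parent so core Hadoop
    // classes still resolve through it.
    ClassLoader parent = Thread.currentThread().getContextClassLoader();
    URLClassLoader loader =
        new URLClassLoader(urls.toArray(new URL[urls.size()]), parent);

    // Hand the loader to the JobConf so the map output key/value classes
    // are looked up through it when the sorter is created.
    conf.setClassLoader(loader);
    return loader;
  }
}

In the patch itself this happens in the ReduceTaskRunner before the sorter is constructed, so the sort sees the same JobConf with the new classloader set.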
I have run this patch on a development system using the Nutch injector and have also run the TestMapRed unit tests. Both completed successfully.

> Hadoop Shell Script causes ClassNotFoundException for Nutch processes
> ----------------------------------------------------------------------
>
>          Key: HADOOP-964
>          URL: https://issues.apache.org/jira/browse/HADOOP-964
>      Project: Hadoop
>   Issue Type: Bug
>   Components: scripts
>  Environment: windows xp and fedora core 6 linux, java 1.5.10...should affect all systems
>     Reporter: Dennis Kubes
>     Priority: Critical
>      Fix For: 0.11.0
>
>  Attachments: classpath.patch, classpath2.path
>
>
> In the ReduceTaskRunner constructor, line 339, a sorter is created that attempts to get the map output key and value classes from the configuration object. This is before the TaskTracker$Child process is spawned off into its own separate JVM, so at this point the classpath for the configuration is the classpath that started the TaskTracker. The current hadoop script includes the hadoop jars, meaning that any hadoop Writable type will be found, but it doesn't include nutch jars, so any nutch Writable type, or any other Writable type, will not be found and will throw a ClassNotFoundException.
>
> I don't think it is a good idea to have a dependency on specific Nutch jars in the hadoop script, but it is a good idea to allow jars to be included if they are in specific locations, such as HADOOP_HOME, where the nutch jar resides. I have attached a patch that adds any jars in the HADOOP_HOME directory to the hadoop classpath. This fixes the issue of getting ClassNotFoundExceptions inside of Nutch processes.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.