hadoop-common-issues mailing list archives

From "Vinod Kumar Vavilapalli (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HADOOP-12406) AbstractMapWritable.readFields throws ClassNotFoundException with custom writables
Date Sun, 10 Apr 2016 20:37:25 GMT

     [ https://issues.apache.org/jira/browse/HADOOP-12406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vinod Kumar Vavilapalli updated HADOOP-12406:
---------------------------------------------
    Assignee: Nadeem Douba
      Status: Open  (was: Patch Available)

Hi [~ndouba],

I'm about to do a 2.7.3 Apache Hadoop release and finally got around to this again.

h4. Analysis
To make progress, I had to read up a bit on Nutch and on how to run it, so that I could
reproduce the bug and understand the rationale for your patch. I finally succeeded in doing so! I tested
this with the 2.7.2 release and Nutch 1.11, using the URL feed [given at NUTCH-1084|https://issues.apache.org/jira/browse/NUTCH-1084?focusedCommentId=13882771&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13882771]:
{code}
~/tmp/common/hadoop-common-2.7.2/bin/hadoop jar apache-nutch-1.11.job org.apache.nutch.crawl.CrawlDbReader \
  file:///tmp/nutch/apache-nutch-1.11/runtime/local/crawl/crawldb/ -url http://bappenas.go.id/
{code}

I can reproduce all the problems listed at NUTCH-1084: with readdb, with an MR local-job-runner
based crawl job, etc.

The real issue is that Nutch's readdb is client-only and does *not* run a MapReduce job, which
answers my earlier question. For regular MR jobs, the job-jar *is* visible to the system class-loader.
For client-only invocations using "hadoop jar" and for the local-job-runner, the job-jar is actually
*not* on the system classpath. That is why you are running into the issue.
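
To illustrate the distinction above, here is a minimal, self-contained sketch. It assumes a launcher that installs its own {{URLClassLoader}} as the thread context loader, which is how the situation described for "hadoop jar" appears to arise. The class and method names ({{LoaderVisibilityDemo}}, {{newJobLoader}}) are hypothetical, and the loader is given no URLs here, whereas a real launcher would list the job-jar's contents.

```java
import java.net.URL;
import java.net.URLClassLoader;

final class LoaderVisibilityDemo {

    /** Hypothetical stand-in for a launcher-created loader whose parent is the system loader. */
    static ClassLoader newJobLoader() {
        // A real launcher would pass the job-jar's URLs here; the array is empty for illustration.
        return new URLClassLoader(new URL[0], ClassLoader.getSystemClassLoader());
    }

    /** Installs the job loader as context loader, compares it to the system loader, then restores. */
    static boolean contextDiffersFromSystem() {
        Thread t = Thread.currentThread();
        ClassLoader original = t.getContextClassLoader();
        try {
            t.setContextClassLoader(newJobLoader());
            return t.getContextClassLoader() != ClassLoader.getSystemClassLoader();
        } finally {
            t.setContextClassLoader(original);
        }
    }

    /** Classes known to the parent chain still resolve through the child loader. */
    static boolean resolvesThroughParent() {
        try {
            return Class.forName("java.lang.String", false, newJobLoader()) == String.class;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("context != system: " + contextDiffersFromSystem());
        System.out.println("parent classes visible: " + resolvesThroughParent());
    }
}
```

Code that looks classes up through the system loader alone cannot see anything that only the child loader knows about, while the child loader can see everything its parent can.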

h4. Summary
Your patch looks good to me. Clearly, the thread context-loader falls back to the system class-loader
where it is not overridden, so we are fine for all the ways of loading the classes in readFields.
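
For readers following along, the loading pattern under discussion can be sketched roughly as below. This is not the attached patch: the class {{ContextLoaderSketch}} and its methods are hypothetical names, and the explicit null check is an added assumption for threads whose context loader was never set.

```java
final class ContextLoaderSketch {

    /** Returns the thread context loader, or the system loader if no context loader is set. */
    static ClassLoader effectiveLoader() {
        ClassLoader cl = Thread.currentThread().getContextClassLoader();
        return (cl != null) ? cl : ClassLoader.getSystemClassLoader();
    }

    /** Loads a class by name through the effective loader; returns null if it cannot be found. */
    static Class<?> loadOrNull(String className) {
        try {
            // The three-argument form lets the caller choose the loader explicitly.
            return Class.forName(className, true, effectiveLoader());
        } catch (ClassNotFoundException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(loadOrNull("java.lang.String"));            // prints "class java.lang.String"
        System.out.println(loadOrNull("no.such.pkg.MissingWritable")); // prints "null"
    }
}
```

When a launcher has installed a loader that includes the job-jar, lookups go through it; otherwise they behave like a lookup through the system loader.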

I'll resubmit your patch to Jenkins with minor comment-related changes, and commit it if Mr. Jenkins
is also fine with it.

> AbstractMapWritable.readFields throws ClassNotFoundException with custom writables
> ----------------------------------------------------------------------------------
>
>                 Key: HADOOP-12406
>                 URL: https://issues.apache.org/jira/browse/HADOOP-12406
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: io
>    Affects Versions: 2.7.1
>         Environment: Ubuntu Linux 14.04 LTS amd64
>            Reporter: Nadeem Douba
>            Assignee: Nadeem Douba
>            Priority: Blocker
>              Labels: bug, hadoop, io, newbie, patch-available
>         Attachments: HADOOP-12406.patch
>
>
> Note: I am not an expert at Java, class loaders, or Hadoop. I am just a hacker. My solution
> might be entirely wrong.
> AbstractMapWritable.readFields throws a ClassNotFoundException when reading custom writables.
> Debugging the job using remote debugging in IntelliJ revealed that the class loader being
> used in Class.forName() is different from the one used by the Thread's current context
> (Thread.currentThread().getContextClassLoader()).
> The class path for the system class loader does not include the libraries of the job jar.
> However, the class path for the context class loader does. The proposed patch changes the
> class loading mechanism in readFields to use the Thread's context class loader instead of
> the system's default class loader.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
