hadoop-mapreduce-user mailing list archives

From Manikandan Saravanan <manikan...@thesocialpeople.net>
Subject Re: Hadoop permissions issue
Date Mon, 06 Jan 2014 13:48:28 GMT
I’m running Nutch 2.2.1 on a Hadoop cluster, crawling 5000 links from the DMOZ Open Directory
Project. The reduce phase stops at exactly 33% every time and throws this exception. From the
Nutch mailing list, it seems that my job is stumbling on a repUrl value that’s null.
-- 
Manikandan Saravanan
Architect - Technology
TheSocialPeople

On 6 January 2014 at 7:14:41 pm, Devin Suiter RDX (dsuiter@rdx.com) wrote:

Based on the Exception type, it looks like something in your job is looking for a valid value,
and not finding it.
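To illustrate that point: the top frames of the stack trace in the original message show the exception raised inside Avro's `Utf8` constructor (`Utf8.<init>`), called from `GeneratorReducer.setup`, which is consistent with a null string being handed to the constructor. The minimal Java sketch that follows reproduces that pattern without Hadoop or Avro, assuming the value comes from a configuration lookup that returns null when a key is missing. `EncodedString`, `requireSetting` and the key `example.batch.id` are hypothetical names for illustration only, not Nutch, Hadoop or Avro APIs.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

public class NullConfigDemo {

    // Hypothetical stand-in for a string wrapper such as Avro's Utf8.
    // Assumption: it encodes the string eagerly, so a null argument
    // fails inside the constructor, as in the Utf8.<init> frame.
    static final class EncodedString {
        final byte[] bytes;

        EncodedString(String s) {
            this.bytes = s.getBytes(StandardCharsets.UTF_8); // NPE if s == null
        }
    }

    // Hypothetical guard: look up a required key and fail with a message
    // naming the missing key instead of a bare NullPointerException.
    static String requireSetting(Map<String, String> conf, String key) {
        String value = conf.get(key);
        if (value == null) {
            throw new IllegalStateException("Required setting '" + key + "' is not set");
        }
        return value;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>(); // key deliberately absent

        // Unguarded: the null from the lookup reaches the constructor.
        try {
            new EncodedString(conf.get("example.batch.id"));
        } catch (NullPointerException e) {
            System.out.println("unguarded: NullPointerException");
        }

        // Guarded: the missing key is reported before the constructor runs.
        try {
            new EncodedString(requireSetting(conf, "example.batch.id"));
        } catch (IllegalStateException e) {
            System.out.println("guarded: " + e.getMessage());
        }
    }
}
```

Running it prints `unguarded: NullPointerException` and then `guarded: Required setting 'example.batch.id' is not set`. Failing fast with the key name points at the missing value itself rather than at a library constructor several frames away.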

You will probably need to share the job code for people to help with this - to my eyes, this
doesn't appear to be a Hadoop configuration issue, or any kind of problem with how the system
is working.

Are you using Avro inputs and outputs? If your reduce is trying to parse an Avro record, the
field type may be incorrect, or the schema may reference an outside schema object that is
not available...

If you provide more information about the context of the error (use case, program goal, code
block, something like that) then it is easier to help you.



Devin Suiter
Jr. Data Solutions Software Engineer

100 Sandusky Street | 2nd Floor | Pittsburgh, PA 15212
Google Voice: 412-256-8556 | www.rdx.com


On Mon, Jan 6, 2014 at 8:08 AM, Manikandan Saravanan <manikandan@thesocialpeople.net> wrote:
I’m trying to run Nutch 2.2.1 on a Hadoop 1.2.1 cluster. The fetch phase runs fine, but the
next job fails with this error:

java.lang.NullPointerException
    at org.apache.avro.util.Utf8.<init>(Utf8.java:37)
    at org.apache.nutch.crawl.GeneratorReducer.setup(GeneratorReducer.java:100)
    at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:174)
    at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:649)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:418)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
    at org.apache.hadoop.mapred.Child.main(Child.java:249)

I’m running three nodes, namely nutch1, nutch2 and nutch3. The first is listed in the masters
file, and all three are listed in the slaves file. The /etc/hosts file lists all machines along
with their IP addresses. Can someone help me?

-- 
Manikandan Saravanan
Architect - Technology
TheSocialPeople

