nutch-dev mailing list archives

From "Armel T. Nene" <armel.n...@idna-solutions.com>
Subject RE: NPE in org.apache.hadoop.io.SequenceFile$Sorter$MergeQueue
Date Tue, 13 Feb 2007 23:45:28 GMT
Dennis,

I was wondering whether this patch could fix my problem, which is very
similar to this one, if not the same. I am using Nutch 0.8.2-dev; I checked
it out from SVN a while ago but have never updated it since. I was
previously able to crawl 10000 XML files with no errors whatsoever. These
are the errors I get when fetching:

INFO parser.custom: Custom-parse: Parsing content file:/C:/TeamBinder/AddressBook/9100/(65)E110_ST A0 (1).pdf
07/02/12 22:09:16 INFO fetcher.Fetcher: fetch of file:/C:/TeamBinder/AddressBook/9100/(65)E110_ST A0 (1).pdf failed with: java.lang.NullPointerException
07/02/12 22:09:17 INFO mapred.LocalJobRunner: 0 pages, 0 errors, 0.0 pages/s, 0 kb/s,
07/02/12 22:09:17 FATAL fetcher.Fetcher: java.lang.NullPointerException
07/02/12 22:09:17 FATAL fetcher.Fetcher: at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:198)
07/02/12 22:09:17 FATAL fetcher.Fetcher: at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:189)
07/02/12 22:09:17 FATAL fetcher.Fetcher: at org.apache.hadoop.mapred.MapTask$2.collect(MapTask.java:91)
07/02/12 22:09:17 FATAL fetcher.Fetcher: at org.apache.nutch.fetcher.Fetcher$FetcherThread.output(Fetcher.java:314)
07/02/12 22:09:17 FATAL fetcher.Fetcher: at org.apache.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:232)
07/02/12 22:09:17 FATAL fetcher.Fetcher: fetcher caught:java.lang.NullPointerException

One of the problems is that my Hadoop version reports itself as
hadoop-0.4.0-patched. I don't know whether that means I am running version
0.4.0, which is a little confusing. Once you can clarify that for me, I will
be able to apply the patch to my version.

Best Regards,

Armel

-----Original Message-----
From: Dennis Kubes [mailto:nutch-dev@dragonflymc.com] 
Sent: 13 February 2007 21:09
To: nutch-dev@lucene.apache.org
Subject: Re: NPE in org.apache.hadoop.io.SequenceFile$Sorter$MergeQueue

Actually, I take it back. I don't think it is the same problem, but I do
think it is the right solution.

Dennis Kubes

Dennis Kubes wrote:
> This has to do with HADOOP-964. Replace the Hadoop jar files in your
> Nutch installation with the most recent versions from Hadoop. You will
> also need to apply the NUTCH-437 patch to get Nutch to work with the most
> recent changes to the Hadoop codebase.
> 
> Dennis Kubes
> 
> Gal Nitzan wrote:
>> Hi,
>>
>> Does anybody use Nutch trunk?
>>
>> I am running Nutch 0.9 and am unable to fetch.
>>
>> After 50-60K URLs I get an NPE in
>> org.apache.hadoop.io.SequenceFile$Sorter$MergeQueue every time.
>>
>> I was wondering if anyone has a workaround, or whether something is
>> wrong with my setup.
>>
>> I have opened a new issue in JIRA for this:
>> http://issues.apache.org/jira/browse/hadoop-1008
>>
>> Any clue?
>>
>> Gal
>>
>>
