lucene-general mailing list archives

From Otis Gospodnetic <otis_gospodne...@yahoo.com>
Subject Re: Exception when crawl tries to finish...
Date Wed, 07 Sep 2005 14:01:01 GMT
You should email the nutch-user@lucene... list, where Nutch users "hang out".

Otis


--- Christian Aschoff <christian.aschoff@uni-ulm.de> wrote:

> Hi,
> 
> after three days of crawling the intranet, the Nutch crawler threw
> an exception :-(
> 
> It seems that the crawler tries to process the .DS_Store file that
> Mac OS X creates, and it does not know how to handle it?
> 
> Can I re-initiate the clean-up without crawling the intranet again?
> 
> Regards,
> Christian
> 
> 050906 184831 Processing document 29000
> 050906 184833 Processing document 30000
> 050906 184835 Finishing update
> 050906 184845 Processing pagesByURL: Sorted 288601 instructions in 9.954 seconds.
> 050906 184845 Processing pagesByURL: Sorted 28993.46996182439 instructions/second
> 050906 184856 Processing pagesByURL: Merged to new DB containing 181590 records in 9.095 seconds
> 050906 184856 Processing pagesByURL: Merged 19965.915338097853 records/second
> 050906 184857 Processing pagesByMD5: Sorted 76461 instructions in 1.199 seconds.
> 050906 184857 Processing pagesByMD5: Sorted 63770.64220183486 instructions/second
> 050906 184904 Processing pagesByMD5: Merged to new DB containing 181590 records in 5.738 seconds
> 050906 184904 Processing pagesByMD5: Merged 31646.91530149878 records/second
> 050906 184911 Processing linksByMD5: Sorted 286132 instructions in 7.354 seconds.
> 050906 184911 Processing linksByMD5: Sorted 38908.34919771553 instructions/second
> 050906 184940 Processing linksByMD5: Merged to new DB containing 1060091 records in 27.791 seconds
> 050906 184940 Processing linksByMD5: Merged 38145.11892339247 records/second
> 050906 184943 Processing linksByURL: Sorted 145747 instructions in 3.082 seconds.
> 050906 184943 Processing linksByURL: Sorted 47289.74691758599 instructions/second
> 050906 185014 Processing linksByURL: Merged to new DB containing 1060091 records in 29.113 seconds
> 050906 185014 Processing linksByURL: Merged 36412.977020575 records/second
> 050906 185017 Processing linksByMD5: Sorted 181123 instructions in 2.968 seconds.
> 050906 185017 Processing linksByMD5: Sorted 61025.26954177897 instructions/second
> 050906 185045 Processing linksByMD5: Merged to new DB containing 1060091 records in 26.092 seconds
> 050906 185045 Processing linksByMD5: Merged 40628.96673309827 records/second
> 050906 185234 Update finished
> 050906 185235 Updating /Users/caschoff/Desktop/nutch-0.7/crawl.uni.test/segments from /Users/caschoff/Desktop/nutch-0.7/crawl.uni.test/db
> 050906 185235  reading /Users/caschoff/Desktop/nutch-0.7/crawl.uni.test/segments/.DS_Store
> Exception in thread "main" java.io.FileNotFoundException: /Users/caschoff/Desktop/nutch-0.7/crawl.uni.test/segments/.DS_Store/fetcher/data
>          at org.apache.nutch.fs.LocalFileSystem.open(LocalFileSystem.java:93)
>          at org.apache.nutch.io.SequenceFile$Reader.<init>(SequenceFile.java:194)
>          at org.apache.nutch.io.SequenceFile$Reader.<init>(SequenceFile.java:187)
>          at org.apache.nutch.io.MapFile$Reader.<init>(MapFile.java:190)
>          at org.apache.nutch.io.MapFile$Reader.<init>(MapFile.java:179)
>          at org.apache.nutch.io.ArrayFile$Reader.<init>(ArrayFile.java:50)
>          at org.apache.nutch.tools.UpdateSegmentsFromDb.addSegment(UpdateSegmentsFromDb.java:197)
>          at org.apache.nutch.tools.UpdateSegmentsFromDb.run(UpdateSegmentsFromDb.java:182)
>          at org.apache.nutch.tools.CrawlTool.main(CrawlTool.java:147)
> 
> [2]-  Exit 1                  bin/nutch crawl urls -dir crawl.uni.test -depth 10 1>&crawl.log
> 
> ---
> Dipl. Ing. (FH) Christian Aschoff
> 
> Office:
> Universität Ulm/KIZ
> Room O26/5403
> 
> Tel. 0731 50-22432
> christian.aschoff@uni-ulm.de
> 
> Home:
> Fabristr. 13
> 89075 Ulm
> Deutschland/Old Europe
> 
> Tel. 0731 60280360
> Fax. 0731 60280361
> caschoff@mac.com
> 
> Help out: www.meyers-konversationslexikon.de
> 
> 
> 
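The trace shows `UpdateSegmentsFromDb.addSegment` being handed `segments/.DS_Store`, a plain metadata file written by the Mac OS X Finder, as if it were a segment directory, and then failing to open `.DS_Store/fetcher/data`. The sketch below illustrates the kind of directory filtering that avoids this, assuming that a segment is any non-hidden subdirectory of the segments directory. The class `SegmentLister` and its methods are hypothetical illustrations written for this note; they are not part of the Nutch 0.7 API and are not taken from the Nutch source.

```java
import java.io.File;
import java.util.Arrays;

/**
 * Hypothetical helper, not part of Nutch: lists the entries of a segments
 * directory that could plausibly be segments, skipping hidden entries such
 * as the ".DS_Store" file that the Mac OS X Finder creates.
 */
public class SegmentLister {

    /** True if a directory entry name is non-empty and not hidden (no leading '.'). */
    public static boolean isCandidateName(String name) {
        return name != null && !name.isEmpty() && !name.startsWith(".");
    }

    /**
     * Returns the non-hidden subdirectories of segmentsDir, sorted by path.
     * Plain files and hidden entries are skipped, so ".DS_Store" is never
     * treated as a segment. Returns an empty array if segmentsDir cannot be listed.
     */
    public static File[] listSegments(File segmentsDir) {
        File[] entries = segmentsDir.listFiles(
                (File f) -> f.isDirectory() && isCandidateName(f.getName()));
        if (entries == null) {
            // listFiles returns null when segmentsDir is not a directory
            // or an I/O error occurs.
            return new File[0];
        }
        Arrays.sort(entries); // File implements Comparable<File> (compares pathnames)
        return entries;
    }

    public static void main(String[] args) {
        // Default path mirrors the -dir argument of the original crawl command.
        File dir = new File(args.length > 0 ? args[0] : "crawl.uni.test/segments");
        for (File seg : listSegments(dir)) {
            System.out.println(seg.getPath());
        }
    }
}
```

Running `java SegmentLister crawl.uni.test/segments` prints the entries such a filter would keep; `.DS_Store` is excluded twice over, because its name starts with a dot and because it is a plain file rather than a directory.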

