nutch-user mailing list archives

From "carmmello@globo.com" <carmme...@globo.com>
Subject Index fails
Date Thu, 05 May 2005 18:12:16 GMT
I am trying to crawl about 300 sites with depth 4. Every time Nutch indexes segment 4 (Nutch creates 4 segments, the last one being the biggest), I get the following error message:


050505 141422 found resource common-terms.utf8 at file:/usr/local/nutch-nightly/conf/common-terms.utf8
050505 141901  Processed 20000 records (71.39339 rec/s)
050505 142403  Processed 40000 records (66.24074 rec/s)
050505 143005  Processed 60000 records (55.29903 rec/s)

050505 144953  Processed 120000 records (16.83333 rec/s)
050505 145726  Processed 140000 records (44.17517 rec/s)
Exception in thread "main" java.io.FileNotFoundException: /mnt/C/maio_4/segments/20050503220741/index/_2r7e.prx (No space left on device)
        at java.io.RandomAccessFile.open(Native Method)
        at java.io.RandomAccessFile.<init>(RandomAccessFile.java:204)
        at org.apache.lucene.store.FSOutputStream.<init>(FSDirectory.java:461)
        at org.apache.lucene.store.FSDirectory.createFile(FSDirectory.java:263)
        at org.apache.lucene.index.SegmentMerger.mergeTerms(SegmentMerger.java:248)
        at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:93)
        at org.apache.lucene.index.IndexWriter.mergeSegments(IndexWriter.java:487)
        at org.apache.lucene.index.IndexWriter.maybeMergeSegments(IndexWriter.java:458)
        at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:310)
        at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:294)
        at org.apache.nutch.indexer.IndexSegment.indexPages(IndexSegment.java:148)
        at org.apache.nutch.indexer.IndexSegment.main(IndexSegment.java:254)
[root@localhost nutch-nightly]#
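The exception is thrown while Lucene's SegmentMerger creates a new .prx file, so the operating system refused to create or grow a file on the partition holding the index. One way to double-check what the JVM itself sees on that partition is the small sketch below. It assumes Java 7 or later (java.nio.file), and the class name FreeSpaceCheck is hypothetical. Note that on Linux "No space left on device" can also be reported when the filesystem has run out of inodes, which this check does not measure, and that a segment merge writes the merged files before the old segments are removed, so it transiently needs more space than the finished index.

```java
import java.io.IOException;
import java.nio.file.FileStore;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper: reports the space available to this JVM on the
// file store (partition) that contains the given directory.
public class FreeSpaceCheck {

    // Bytes this JVM may still write on the file store holding `dir`.
    static long usableBytes(Path dir) throws IOException {
        FileStore store = Files.getFileStore(dir);
        return store.getUsableSpace();
    }

    // Converts a byte count to whole mebibytes (rounded down).
    static long toMegabytes(long bytes) {
        return bytes / (1024L * 1024L);
    }

    public static void main(String[] args) throws IOException {
        // Pass the segment/index directory as the first argument;
        // defaults to the current directory if none is given.
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        long free = usableBytes(dir);
        System.out.printf("%s: %d MB usable%n",
                dir.toAbsolutePath(), toMegabytes(free));
    }
}
```

Run it against the segment directory from the log, for example `java FreeSpaceCheck /mnt/C/maio_4/segments`, ideally while the indexer is running, since the space is consumed during the merge rather than before it.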

It is certainly not a lack of disk space. So what is going on?

Thanks,

Wilson Melo


