Hello,

do I have to point the "segment parameter" of the readseg command to the "timestamp directory"?

hadoop dfs -ls crawl/segments/20111130041413/
Found 7 items
-rw-r--r--   2 fwp supergroup          0 2011-11-30 15:55 /user/fwp/crawl/segments/20111130041413/_SUCCESS
drwxr-xr-x   - fwp supergroup          0 2011-11-30 15:55 /user/fwp/crawl/segments/20111130041413/content
drwxr-xr-x   - fwp supergroup          0 2011-11-30 15:55 /user/fwp/crawl/segments/20111130041413/crawl_fetch
drwxr-xr-x   - fwp supergroup          0 2011-11-30 15:55 /user/fwp/crawl/segments/20111130041413/crawl_generate
drwxr-xr-x   - fwp supergroup          0 2011-11-30 15:55 /user/fwp/crawl/segments/20111130041413/crawl_parse
drwxr-xr-x   - fwp supergroup          0 2011-11-30 15:55 /user/fwp/crawl/segments/20111130041413/parse_data
drwxr-xr-x   - fwp supergroup          0 2011-11-30 15:55 /user/fwp/crawl/segments/20111130041413/parse_text

Like this?

nutch readseg -list crawl/segments/20111130041413/

I still get the same exception for every segment.

Exception in thread "main" java.io.EOFException
    at java.io.DataInputStream.readFully(DataInputStream.java:180)
    at java.io.DataInputStream.readFully(DataInputStream.java:152)
    at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1450)
    at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1428)
    at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1417)
    at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1412)
    at org.apache.hadoop.mapred.SequenceFileOutputFormat.getReaders(SequenceFileOutputFormat.java:93)
    at org.apache.nutch.segment.SegmentReader.getStats(SegmentReader.java:455)
    at org.apache.nutch.segment.SegmentReader.list(SegmentReader.java:433)
    at org.apache.nutch.segment.SegmentReader.main(SegmentReader.java:579)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:156)

But the solrindex command works for all the segments except one.
I'm not sure about the usage of the segment reader.

greetz,
Rafael.

On 1/Dec/2011, at 11:02, Markus Jelsma wrote:

> Seems corrupt or empty. Use segment reader to check out the segment.
>
> On Wednesday 30 November 2011 23:31:53 Rafael Pappert wrote:
>> Hello List,
>>
>> I am trying to index my parsed segments into Solr with the solrindex command,
>> but on one segment I get the following exception:
>>
>> java.io.EOFException
>>     at java.io.DataInputStream.readFully(DataInputStream.java:180)
>>     at java.io.DataInputStream.readFully(DataInputStream.java:152)
>>     at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1450)
>>     at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1428)
>>     at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1417)
>>     at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1412)
>>     at org.apache.hadoop.mapred.SequenceFileRecordReader.<init>(SequenceFileRecordReader.java:43)
>>     at org.apache.hadoop.mapred.SequenceFileInputFormat.getRecordReader(SequenceFileInputFormat.java:63)
>>     at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.<init>(MapTask.java:197)
>>     at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:417)
>>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:371)
>>     at org.apache.hadoop.mapred.Child$4.run(Child.java:259)
>>     at java.security.AccessController.doPrivileged(Native Method)
>>     at javax.security.auth.Subject.doAs(Subject.java:396)
>>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
>>     at org.apache.hadoop.mapred.Child.main(Child.java:253)
>>
>> java.io.EOFException
>>     at java.io.DataInputStream.readFully(DataInputStream.java:180)
>>     at java.io.DataInputStream.readFully(DataInputStream.java:152)
>>     at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1450)
>>     at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1428)
>>     at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1417)
>>     at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1412)
>>     at org.apache.hadoop.mapred.SequenceFileRecordReader.<init>(SequenceFileRecordReader.java:43)
>>     at org.apache.hadoop.mapred.SequenceFileInputFormat.getRecordReader(SequenceFileInputFormat.java:63)
>>     at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.<init>(MapTask.java:197)
>>     at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:417)
>>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:371)
>>     at org.apache.hadoop.mapred.Child$4.run(Child.java:259)
>>     at java.security.AccessController.doPrivileged(Native Method)
>>     at javax.security.auth.Subject.doAs(Subject.java:396)
>>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
>>     at org.apache.hadoop.mapred.Child.main(Child.java:253)
>>
>> java.io.EOFException
>>     at java.io.DataInputStream.readFully(DataInputStream.java:180)
>>     at java.io.DataInputStream.readFully(DataInputStream.java:152)
>>     at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1450)
>>     at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1428)
>>     at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1417)
>>     at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1412)
>>     at org.apache.hadoop.mapred.SequenceFileRecordReader.<init>(SequenceFileRecordReader.java:43)
>>     at org.apache.hadoop.mapred.SequenceFileInputFormat.getRecordReader(SequenceFileInputFormat.java:63)
>>     at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.<init>(MapTask.java:197)
>>     at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:417)
>>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:371)
>>     at org.apache.hadoop.mapred.Child$4.run(Child.java:259)
>>     at java.security.AccessController.doPrivileged(Native Method)
>>     at javax.security.auth.Subject.doAs(Subject.java:396)
>>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
>>     at org.apache.hadoop.mapred.Child.main(Child.java:253)
>>
>> java.io.EOFException
>>     at java.io.DataInputStream.readFully(DataInputStream.java:180)
>>     at java.io.DataInputStream.readFully(DataInputStream.java:152)
>>     at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1450)
>>     at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1428)
>>     at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1417)
>>     at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1412)
>>     at org.apache.hadoop.mapred.SequenceFileRecordReader.<init>(SequenceFileRecordReader.java:43)
>>     at org.apache.hadoop.mapred.SequenceFileInputFormat.getRecordReader(SequenceFileInputFormat.java:63)
>>     at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.<init>(MapTask.java:197)
>>     at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:417)
>>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:371)
>>     at org.apache.hadoop.mapred.Child$4.run(Child.java:259)
>>     at java.security.AccessController.doPrivileged(Native Method)
>>     at javax.security.auth.Subject.doAs(Subject.java:396)
>>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
>>     at org.apache.hadoop.mapred.Child.main(Child.java:253)
>>
>> What's wrong with this segment? All of the other commands worked without any
>> problem.
>>
>> Best regards,
>> Rafael.
>
> --
> Markus Jelsma - CTO - Openindex
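
One rough way to follow up on the "corrupt or empty" hint is to look for zero-length files below the segment. In every trace in this thread the EOFException is thrown from SequenceFile$Reader.init, which is consistent with (though not proof of) a file too short to contain a header. The sketch below is only an illustration: the function name find_empty_files is made up for this example, and it assumes the eight-column listing format shown in the `hadoop dfs -ls` output in this thread, with no spaces in paths.

```python
from typing import List


def find_empty_files(listing: str) -> List[str]:
    """Return paths of zero-length regular files found in `hadoop dfs -ls` style output.

    Assumed columns (as in the listing in this thread):
    permissions, replication, owner, group, size, date, time, path.
    """
    empty = []
    for line in listing.splitlines():
        fields = line.split()
        # Skip headers such as "Found 7 items" and directory entries ("d...").
        if len(fields) < 8 or not fields[0].startswith("-"):
            continue
        size, path = fields[4], fields[-1]
        name = path.rsplit("/", 1)[-1]
        # Marker files such as _SUCCESS are expected to be empty, so ignore them.
        if size == "0" and not name.startswith("_"):
            empty.append(path)
    return empty
```

The function is meant to be fed the captured text of a recursive listing of the segment directory (for example the output of `hadoop dfs -lsr crawl/segments/20111130041413/`). Any path it returns is only a candidate for the file the reader fails on, not a confirmed diagnosis.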