hadoop-common-user mailing list archives

From Samuel LEMOINE <samuel.lemo...@lingway.com>
Subject ObjectWritable(Document)
Date Thu, 16 Aug 2007 08:36:34 GMT
Hi all,

I'm having trouble with ObjectWritable. I'm trying to implement a simple 
indexing job with Lucene & Hadoop, and for that I'm taking inspiration from 
the Nutch code. In Nutch's Indexer.java, at line 245, I read:
output.collect(key, new ObjectWritable(doc));

(where doc is a Lucene Document: Document doc = new Document(); at line 199 
of the same file)
So I tried to do the same, but I get an error as if ObjectWritable 
couldn't handle the Document type.
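My mapper boils down to the sketch below (the Document construction is 
simplified here, so treat the class and field names as illustrative rather 
than my exact code; the collect() call is the part that matters):

import java.io.IOException;

import org.apache.hadoop.io.ObjectWritable;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;

public class MapIndexer extends MapReduceBase implements Mapper {

    public void map(WritableComparable key, Writable value,
                    OutputCollector output, Reporter reporter) throws IOException {
        // Build a Lucene Document from the input value (simplified version
        // of what my real mapper does).
        Document doc = new Document();
        doc.add(new Field("content", value.toString(),
                          Field.Store.NO, Field.Index.TOKENIZED));

        // Same call as Indexer.java line 245 in Nutch -- this is where it fails for me.
        output.collect(key, new ObjectWritable(doc));
    }
}

Here is the output when I run the job: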

/opt/java/bin/java -Didea.launcher.port=7539 -Didea.launcher.bin.path=/opt/idea-6180/bin -Dfile.encoding=UTF-8 -classpath /opt/jdk1.5.0_12/jre/lib/charsets.jar:/opt/jdk1.5.0_12/jre/lib/jce.jar:/opt/jdk1.5.0_12/jre/lib/jsse.jar:/opt/jdk1.5.0_12/jre/lib/plugin.jar:/opt/jdk1.5.0_12/jre/lib/deploy.jar:/opt/jdk1.5.0_12/jre/lib/javaws.jar:/opt/jdk1.5.0_12/jre/lib/rt.jar:/opt/jdk1.5.0_12/jre/lib/ext/localedata.jar:/opt/jdk1.5.0_12/jre/lib/ext/dnsns.jar:/opt/jdk1.5.0_12/jre/lib/ext/sunpkcs11.jar:/opt/jdk1.5.0_12/jre/lib/ext/sunjce_provider.jar:/home/samuel/IdeaProjects/LuceneScratchPad/classes/test/LuceneScratchPad:/home/samuel/IdeaProjects/LuceneScratchPad/classes/production/LuceneScratchPad:/home/samuel/IdeaProjects/LuceneScratchPad/lib/log4j/log4j-1.2.14.jar:/home/samuel/IdeaProjects/LuceneScratchPad/lib/lucene/lucene-core-2.2.0.jar:/home/samuel/hadoop-0.13.1/hadoop-0.13.1-core.jar:/home/samuel/IdeaProjects/LuceneScratchPad/lib/commons-logging-1.1/commons-logging-1.1.jar:/home/samuel/commons-cli-1.1/commons-cli-1.1.jar:/home/samuel/IdeaProjects/LuceneScratchPad/lib/hadoop/conf:/home/samuel/IdeaProjects/LuceneScratchPad/lib/commons-httpclient-3.0.1.jar:/opt/idea-6180/lib/idea_rt.jar com.intellij.rt.execution.application.AppMain com.lingway.proto.lucene.EntryPointHadoop
INFO  apache.hadoop.mapred.FileInputFormat - Total input paths to process : 9
INFO  apache.hadoop.mapred.JobClient - Running job: job_myhhdn
INFO  apache.hadoop.mapred.MapTask - numReduceTasks: 1
WARN  apache.hadoop.mapred.LocalJobRunner - job_myhhdn
java.io.IOException: Can't write: indexed,tokenized<content:org.apache.lucene.analysis.standard.StandardTokenizer@1e04cbf> as class org.apache.lucene.document.Field
    at org.apache.hadoop.io.ObjectWritable.writeObject(ObjectWritable.java:157)
    at org.apache.hadoop.io.ObjectWritable.write(ObjectWritable.java:65)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:365)
    at com.lingway.proto.lucene.MapIndexer.map(MapIndexer.java:35)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:48)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:186)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:131)
Exception in thread "main" java.io.IOException: Job failed!
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:604)
    at com.lingway.proto.lucene.EntryPointHadoop.main(EntryPointHadoop.java:36)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:585)
    at com.intellij.rt.execution.application.AppMain.main(AppMain.java:90)

Process finished with exit code 1




However, in my code, if I separate the instantiation of the ObjectWritable 
from the collect() call, constructing the object doesn't cause any trouble; 
the exception is only thrown when I pass it to the OutputCollector...
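In other words, something like this (variable names just for illustration):

ObjectWritable wrapped = new ObjectWritable(doc);  // constructing the wrapper is fine
output.collect(key, wrapped);                      // the IOException above is thrown here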

Any idea why the Nutch code doesn't behave the same way in my project?
(I can't afford the time to get Nutch itself running; I'm at the very end 
of my internship, so I'm quite in a hurry :( )

Thanks in advance,

Sam

