hadoop-mapreduce-user mailing list archives

From Martin Häger <martin.ha...@byburt.com>
Subject Re: "Type mismatch in key from map: expected org.apache.hadoop.io.LongWritable" using MultipleInputs (multiple mappers)
Date Mon, 15 Feb 2010 08:17:22 GMT
Indeed it is. We've solved this issue now as well.
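
For the archives: the ClassNotFoundException below, together with the
"No job jar file set" warning, usually means that the jar containing the
job classes was never attached to the job, so the task JVMs cannot load
the Classify$* mapper classes. A minimal driver sketch of the usual
remedies (illustrative only, not the exact contents of our Classify.java):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class Classify extends Configured implements Tool {

    public int run(String[] args) throws Exception {
        Job job = new Job(getConf(), "classify");
        // Ship the jar that contains Classify and its inner mapper/reducer classes.
        job.setJarByClass(Classify.class);
        // ... input paths, mappers, reducer and key/value classes (see next sketch) ...
        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        // ToolRunner (via GenericOptionsParser) honours -libjars, which puts any
        // extra dependency jars on the classpath of the map and reduce tasks.
        System.exit(ToolRunner.run(new Configuration(), new Classify(), args));
    }
}

Run for example as:

hadoop jar /tmp/classify.jar Classify -libjars /path/to/dependency.jar <input> <output>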

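As for the original "Type mismatch in key from map" error quoted further
down: our two mappers emit JoinKey rather than the job's final output key,
so the map output key/value classes have to be declared explicitly, as
Alex suggested. A sketch of the relevant lines in the driver's run()
method (the value and output types shown are illustrative, not taken from
this thread):

// Needed imports (new API): org.apache.hadoop.fs.Path,
// org.apache.hadoop.io.Text,
// org.apache.hadoop.mapreduce.lib.input.MultipleInputs,
// org.apache.hadoop.mapreduce.lib.input.TextInputFormat.

MultipleInputs.addInputPath(job, new Path(args[0]),
    TextInputFormat.class, TransformationSessionMapper.class);
MultipleInputs.addInputPath(job, new Path(args[1]),
    TextInputFormat.class, TransformationActionMapper.class);

// Without these two calls Hadoop falls back to the job's output key class
// (LongWritable by default) and rejects JoinKey with the type mismatch below.
job.setMapOutputKeyClass(JoinKey.class);
job.setMapOutputValueClass(Text.class);   // illustrative map value type

job.setReducerClass(TransformationReducer.class);
job.setOutputKeyClass(Text.class);        // illustrative
job.setOutputValueClass(Text.class);      // illustrative
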
2010/2/12 Alex Kozlov <alexvk@cloudera.com>:
> This is a separate issue. Did you provide your library jar with the
> `-libjars` flag?
>
> On Fri, Feb 12, 2010 at 2:13 AM, Martin Häger <martin.hager@byburt.com>
> wrote:
>>
>> We do not get the above error when running in pseudo-distributed mode.
>> Instead, we get "java.lang.RuntimeException: readObject can't find
>> class". Any ideas what might be wrong?
>>
>> mtah@thinkpad:~$ hadoop jar /tmp/classify.jar Classify
>> 10/02/12 11:09:48 WARN mapred.JobClient: No job jar file set.  User
>> classes may not be found. See JobConf(Class) or
>> JobConf#setJar(String).
>> 10/02/12 11:09:48 INFO input.FileInputFormat: Total input paths to process
>> : 1
>> 10/02/12 11:09:48 INFO input.FileInputFormat: Total input paths to process
>> : 1
>> 10/02/12 11:09:49 INFO mapred.JobClient: Running job:
>> job_201002121044_0009
>> 10/02/12 11:09:50 INFO mapred.JobClient:  map 0% reduce 0%
>> 10/02/12 11:10:01 INFO mapred.JobClient: Task Id :
>> attempt_201002121044_0009_m_000000_0, Status : FAILED
>> java.lang.RuntimeException: readObject can't find class
>>        at
>> org.apache.hadoop.mapreduce.lib.input.TaggedInputSplit.readClass(TaggedInputSplit.java:136)
>>        at
>> org.apache.hadoop.mapreduce.lib.input.TaggedInputSplit.readFields(TaggedInputSplit.java:121)
>>        at
>> org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:67)
>>        at
>> org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:40)
>>        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:549)
>>        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>>        at org.apache.hadoop.mapred.Child.main(Child.java:170)
>> Caused by: java.lang.ClassNotFoundException:
>> Classify$TransformationActionMapper
>>        at java.net.URLClassLoader$1.run(URLClassLoader.java:200)
>>        at java.security.AccessController.doPrivileged(Native Method)
>>        at java.net.URLClassLoader.findClass(URLClassLoader.java:188)
>>        at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
>>        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
>>        at java.lang.ClassLoader.loadClass(ClassLoader.java:252)
>>        at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:320)
>>        at java.lang.Class.forName0(Native Method)
>>        at java.lang.Class.forName(Class.java:247)
>>        at
>> org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:761)
>>        at
>> org.apache.hadoop.mapreduce.lib.input.TaggedInputSplit.readClass(TaggedInputSplit.java:134)
>>        ... 6 more
>>
>> 10/02/12 11:10:01 INFO mapred.JobClient: Task Id :
>> attempt_201002121044_0009_m_000001_0, Status : FAILED
>> java.lang.RuntimeException: readObject can't find class
>>        at
>> org.apache.hadoop.mapreduce.lib.input.TaggedInputSplit.readClass(TaggedInputSplit.java:136)
>>        at
>> org.apache.hadoop.mapreduce.lib.input.TaggedInputSplit.readFields(TaggedInputSplit.java:121)
>>        at
>> org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:67)
>>        at
>> org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:40)
>>        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:549)
>>        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>>        at org.apache.hadoop.mapred.Child.main(Child.java:170)
>> Caused by: java.lang.ClassNotFoundException:
>> Classify$TransformationSessionMapper
>>        at java.net.URLClassLoader$1.run(URLClassLoader.java:200)
>>        at java.security.AccessController.doPrivileged(Native Method)
>>        at java.net.URLClassLoader.findClass(URLClassLoader.java:188)
>>        at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
>>        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
>>        at java.lang.ClassLoader.loadClass(ClassLoader.java:252)
>>        at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:320)
>>        at java.lang.Class.forName0(Native Method)
>>        at java.lang.Class.forName(Class.java:247)
>>        at
>> org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:761)
>>        at
>> org.apache.hadoop.mapreduce.lib.input.TaggedInputSplit.readClass(TaggedInputSplit.java:134)
>>        ... 6 more
>>
>> 10/02/12 11:10:07 INFO mapred.JobClient: Task Id :
>> attempt_201002121044_0009_m_000001_1, Status : FAILED
>> java.lang.RuntimeException: readObject can't find class
>>        at
>> org.apache.hadoop.mapreduce.lib.input.TaggedInputSplit.readClass(TaggedInputSplit.java:136)
>>        at
>> org.apache.hadoop.mapreduce.lib.input.TaggedInputSplit.readFields(TaggedInputSplit.java:121)
>>        at
>> org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:67)
>>        at
>> org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:40)
>>        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:549)
>>        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>>        at org.apache.hadoop.mapred.Child.main(Child.java:170)
>> Caused by: java.lang.ClassNotFoundException:
>> Classify$TransformationSessionMapper
>>        at java.net.URLClassLoader$1.run(URLClassLoader.java:200)
>>        at java.security.AccessController.doPrivileged(Native Method)
>>        at java.net.URLClassLoader.findClass(URLClassLoader.java:188)
>>        at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
>>        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
>>        at java.lang.ClassLoader.loadClass(ClassLoader.java:252)
>>        at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:320)
>>        at java.lang.Class.forName0(Native Method)
>>        at java.lang.Class.forName(Class.java:247)
>>        at
>> org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:761)
>>        at
>> org.apache.hadoop.mapreduce.lib.input.TaggedInputSplit.readClass(TaggedInputSplit.java:134)
>>        ... 6 more
>>
>> 10/02/12 11:10:07 INFO mapred.JobClient: Task Id :
>> attempt_201002121044_0009_m_000000_1, Status : FAILED
>> java.lang.RuntimeException: readObject can't find class
>>        at
>> org.apache.hadoop.mapreduce.lib.input.TaggedInputSplit.readClass(TaggedInputSplit.java:136)
>>        at
>> org.apache.hadoop.mapreduce.lib.input.TaggedInputSplit.readFields(TaggedInputSplit.java:121)
>>        at
>> org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:67)
>>        at
>> org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:40)
>>        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:549)
>>        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>>        at org.apache.hadoop.mapred.Child.main(Child.java:170)
>> Caused by: java.lang.ClassNotFoundException:
>> Classify$TransformationActionMapper
>>        at java.net.URLClassLoader$1.run(URLClassLoader.java:200)
>>        at java.security.AccessController.doPrivileged(Native Method)
>>        at java.net.URLClassLoader.findClass(URLClassLoader.java:188)
>>        at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
>>        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
>>        at java.lang.ClassLoader.loadClass(ClassLoader.java:252)
>>        at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:320)
>>        at java.lang.Class.forName0(Native Method)
>>        at java.lang.Class.forName(Class.java:247)
>>        at
>> org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:761)
>>        at
>> org.apache.hadoop.mapreduce.lib.input.TaggedInputSplit.readClass(TaggedInputSplit.java:134)
>>        ... 6 more
>>
>> 10/02/12 11:10:13 INFO mapred.JobClient: Task Id :
>> attempt_201002121044_0009_m_000000_2, Status : FAILED
>> java.lang.RuntimeException: readObject can't find class
>>        at
>> org.apache.hadoop.mapreduce.lib.input.TaggedInputSplit.readClass(TaggedInputSplit.java:136)
>>        at
>> org.apache.hadoop.mapreduce.lib.input.TaggedInputSplit.readFields(TaggedInputSplit.java:121)
>>        at
>> org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:67)
>>        at
>> org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:40)
>>        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:549)
>>        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>>        at org.apache.hadoop.mapred.Child.main(Child.java:170)
>> Caused by: java.lang.ClassNotFoundException:
>> Classify$TransformationActionMapper
>>        at java.net.URLClassLoader$1.run(URLClassLoader.java:200)
>>        at java.security.AccessController.doPrivileged(Native Method)
>>        at java.net.URLClassLoader.findClass(URLClassLoader.java:188)
>>        at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
>>        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
>>        at java.lang.ClassLoader.loadClass(ClassLoader.java:252)
>>        at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:320)
>>        at java.lang.Class.forName0(Native Method)
>>        at java.lang.Class.forName(Class.java:247)
>>        at
>> org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:761)
>>        at
>> org.apache.hadoop.mapreduce.lib.input.TaggedInputSplit.readClass(TaggedInputSplit.java:134)
>>        ... 6 more
>>
>> 10/02/12 11:10:13 INFO mapred.JobClient: Task Id :
>> attempt_201002121044_0009_m_000001_2, Status : FAILED
>> java.lang.RuntimeException: readObject can't find class
>>        at
>> org.apache.hadoop.mapreduce.lib.input.TaggedInputSplit.readClass(TaggedInputSplit.java:136)
>>        at
>> org.apache.hadoop.mapreduce.lib.input.TaggedInputSplit.readFields(TaggedInputSplit.java:121)
>>        at
>> org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:67)
>>        at
>> org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:40)
>>        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:549)
>>        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>>        at org.apache.hadoop.mapred.Child.main(Child.java:170)
>> Caused by: java.lang.ClassNotFoundException:
>> Classify$TransformationSessionMapper
>>        at java.net.URLClassLoader$1.run(URLClassLoader.java:200)
>>        at java.security.AccessController.doPrivileged(Native Method)
>>        at java.net.URLClassLoader.findClass(URLClassLoader.java:188)
>>        at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
>>        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
>>        at java.lang.ClassLoader.loadClass(ClassLoader.java:252)
>>        at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:320)
>>        at java.lang.Class.forName0(Native Method)
>>        at java.lang.Class.forName(Class.java:247)
>>        at
>> org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:761)
>>        at
>> org.apache.hadoop.mapreduce.lib.input.TaggedInputSplit.readClass(TaggedInputSplit.java:134)
>>        ... 6 more
>>
>> 10/02/12 11:10:22 INFO mapred.JobClient: Job complete:
>> job_201002121044_0009
>> 10/02/12 11:10:22 INFO mapred.JobClient: Counters: 3
>> 10/02/12 11:10:22 INFO mapred.JobClient:   Job Counters
>> 10/02/12 11:10:22 INFO mapred.JobClient:     Launched map tasks=8
>> 10/02/12 11:10:22 INFO mapred.JobClient:     Data-local map tasks=8
>> 10/02/12 11:10:22 INFO mapred.JobClient:     Failed map tasks=1
>>
>>
>> 2010/2/11 Alex Kozlov <alexvk@cloudera.com>:
>> > Try job.setMapOutputKeyClass(JoinKey.class). -- Alex K
>> >
>> > On Thu, Feb 11, 2010 at 8:25 AM, E. Sammer <eric@lifeless.net> wrote:
>> >>
>> >> It looks like you're using the local job runner, which does everything
>> >> in a single thread. In this case, yes, I think the mappers are run
>> >> sequentially. The local job runner is a different code path in Hadoop
>> >> and is a known issue. Have you tried your code in pseudo-distributed mode?
>> >>
>> >> HTH.
>> >>
>> >> On 2/11/10 11:14 AM, Martin Häger wrote:
>> >>>
>> >>> Hello,
>> >>>
>> >>> We're trying to do a reduce-side join by applying two different
>> >>> mappers (TransformationSessionMapper and TransformationActionMapper)
>> >>> to two different input files and joining them using
>> >>> TransformationReducer. See attached Classify.java for complete source.
>> >>>
>> >>> When running it, we get the following error. JoinKey is our own
>> >>> implementation that is used for performing secondary sort. Somehow
>> >>> TransformationActionMapper gets passed a JoinKey when it expects a
>> >>> LongWritable (TextInputFormat). Is Hadoop actually applying the
>> >>> mappers in sequence?
>> >>>
>> >>> $ hadoop jar /tmp/classify.jar Classify
>> >>> 10/02/11 16:40:16 INFO jvm.JvmMetrics: Initializing JVM Metrics with
>> >>> processName=JobTracker, sessionId=
>> >>> 10/02/11 16:40:16 WARN mapred.JobClient: No job jar file set.  User
>> >>> classes may not be found. See JobConf(Class) or
>> >>> JobConf#setJar(String).
>> >>> 10/02/11 16:40:16 INFO jvm.JvmMetrics: Cannot initialize JVM Metrics
>> >>> with processName=JobTracker, sessionId= - already initialized
>> >>> 10/02/11 16:40:16 INFO input.FileInputFormat: Total input paths to
>> >>> process : 1
>> >>> 10/02/11 16:40:16 INFO input.FileInputFormat: Total input paths to
>> >>> process : 1
>> >>> 10/02/11 16:40:16 INFO mapred.JobClient: Running job: job_local_0001
>> >>> 10/02/11 16:40:16 INFO jvm.JvmMetrics: Cannot initialize JVM Metrics
>> >>> with processName=JobTracker, sessionId= - already initialized
>> >>> 10/02/11 16:40:16 INFO input.FileInputFormat: Total input paths to
>> >>> process : 1
>> >>> 10/02/11 16:40:16 INFO input.FileInputFormat: Total input paths to
>> >>> process : 1
>> >>> 10/02/11 16:40:16 INFO mapred.MapTask: io.sort.mb = 100
>> >>> 10/02/11 16:40:16 INFO mapred.MapTask: data buffer = 79691776/99614720
>> >>> 10/02/11 16:40:16 INFO mapred.MapTask: record buffer = 262144/327680
>> >>> 10/02/11 16:40:16 WARN mapred.LocalJobRunner: job_local_0001
>> >>> java.io.IOException: Type mismatch in key from map: expected
>> >>> org.apache.hadoop.io.LongWritable, recieved Classify$JoinKey
>> >>>        at
>> >>>
>> >>> org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:807)
>> >>>        at
>> >>>
>> >>> org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:504)
>> >>>        at
>> >>>
>> >>> org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
>> >>>        at Classify$TransformationActionMapper.map(Classify.java:161)
>> >>>        at Classify$TransformationActionMapper.map(Classify.java:1)
>> >>>        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
>> >>>        at
>> >>>
>> >>> org.apache.hadoop.mapreduce.lib.input.DelegatingMapper.run(DelegatingMapper.java:51)
>> >>>        at
>> >>> org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:583)
>> >>>        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>> >>>        at
>> >>>
>> >>> org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:176)
>> >>> 10/02/11 16:40:17 INFO mapred.JobClient:  map 0% reduce 0%
>> >>> 10/02/11 16:40:17 INFO mapred.JobClient: Job complete: job_local_0001
>> >>> 10/02/11 16:40:17 INFO mapred.JobClient: Counters: 0
>> >>
>> >>
>> >> --
>> >> Eric Sammer
>> >> eric@lifeless.net
>> >> http://esammer.blogspot.com
>> >
>> >
>
>
