hadoop-general mailing list archives

From: Something Something <mailinglist...@gmail.com>
Subject: Re: Type mismatch in key from map
Date: Thu, 24 Dec 2009 03:31:10 GMT
I think you meant KeyValueTextInputFormat. That class is in a deprecated
package, and it even uses JobConf, which has also been deprecated. Is there an
equivalent new class that isn't deprecated? Otherwise I will have to create a
JobConf object just to use this class.
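
For what it's worth, later Hadoop releases do add a new-API port,
org.apache.hadoop.mapreduce.lib.input.KeyValueTextInputFormat, that works with
Job instead of JobConf. A minimal sketch, assuming such a release (the driver
class and job name here are illustrative):

    // Minimal sketch, assuming a Hadoop release that ships the new-API port
    // org.apache.hadoop.mapreduce.lib.input.KeyValueTextInputFormat
    // (0.20.x only has the deprecated org.apache.hadoop.mapred version).
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.input.KeyValueTextInputFormat;

    public class SecondJobDriver {                   // hypothetical driver class
        public static void main(String[] args) throws Exception {
            Job job = new Job(new Configuration(), "second-job");
            job.setJarByClass(SecondJobDriver.class);
            // Each input line is split at the first tab, so both the key
            // and the value handed to the mapper are Text.
            job.setInputFormatClass(KeyValueTextInputFormat.class);
            FileInputFormat.addInputPath(job, new Path("myapp/output/part-r-00000"));
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(Text.class);
            // ... set mapper/reducer classes and the output path, then:
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }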

In any case, I tried using it as follows (along with a few other variations):

    JobConf jobConf = new JobConf();
    jobConf.setWorkingDirectory(new Path("myapp/output"));
    KeyValueTextInputFormat.addInputPath(jobConf, new Path("./part-r-00000"));

But it keeps throwing this exception...

Exception in thread "main" java.io.IOException: No input paths specified in job
    at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:186)
    at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:241)
    at org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:885)
    at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:779)
    at org.apache.hadoop.mapreduce.Job.submit(Job.java:432)


What's the right way to use this class?  Thanks for your help.
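
The stack trace runs through the new-API FileInputFormat, which suggests the
submitted job reads its input paths from a configuration that never saw the
addInputPath() call above; setWorkingDirectory() only sets the directory that
relative paths resolve against and does not register any input path. A minimal
sketch of the deprecated API used end to end, so the path lands in the same
JobConf that actually gets submitted (the driver class and paths are
illustrative):

    // Minimal sketch using the deprecated API end to end: the input path is
    // registered on the same JobConf that is actually submitted.
    // "No input paths specified in job" usually means the submitted
    // configuration never received an addInputPath() call.
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.FileOutputFormat;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.KeyValueTextInputFormat;

    public class OldApiDriver {                      // hypothetical driver class
        public static void main(String[] args) throws Exception {
            JobConf jobConf = new JobConf(OldApiDriver.class);
            jobConf.setInputFormat(KeyValueTextInputFormat.class);
            KeyValueTextInputFormat.addInputPath(jobConf,
                    new Path("myapp/output/part-r-00000"));
            FileOutputFormat.setOutputPath(jobConf, new Path("myapp/output2"));
            jobConf.setOutputKeyClass(Text.class);
            jobConf.setOutputValueClass(Text.class);
            // ... set the (old-API) mapper and reducer classes, then submit:
            JobClient.runJob(jobConf);
        }
    }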


On Wed, Dec 23, 2009 at 6:16 PM, Jeff Zhang <zjffdu@gmail.com> wrote:

> It seems the value type of your first job's output is Text, but I guess your
> second job's InputFormat is TextInputFormat, whose key type is LongWritable.
> That is why you get the "Type mismatch" error message. I suggest you use
> KeyValueInputFormat as your second job's InputFormat.
>
>
> Jeff Zhang
>
>
> On Wed, Dec 23, 2009 at 4:22 PM, Something Something
> <mailinglists19@gmail.com> wrote:
>
> > I would like to feed a file created by one job as an input to the next
> > job. When I do that, I get:
> >
> > java.io.IOException: Type mismatch in key from map: expected
> > org.apache.hadoop.io.Text, recieved org.apache.hadoop.io.LongWritable
> >     at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:807)
> >     at org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:504)
> >     at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
> >     at org.apache.hadoop.mapreduce.Mapper.map(Mapper.java:124)
> >     at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
> >     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:583)
> >     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
> >     at org.apache.hadoop.mapred.Child.main(Child.java:170)
> >
> >
> > The first job calls context.write(key, value) in a loop. This creates a
> > file (<output dir>/part-r-00000) that contains something like this:
> >
> > 1 1,2,4*6*,1**
> > 1 2,2,6*,4**
> > 2 1,6,2*3*5*6*7*8*,1**
> > 2 2,6,3*5*6*7*8*,2**
> > & so on...
> >
> > Now in my second job I do:
> >
> > FileInputFormat.addInputPath(job, new Path(inFile));
> >
> > where inFile is set to the file created above (<output dir>/part-r-00000).
> >
> > What am I doing wrong? Please help. Thanks.
> >
>
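
To illustrate the mismatch described above: TextInputFormat hands the mapper a
LongWritable byte offset as the key, so a mapper declared with Text input keys
fails, whereas KeyValueTextInputFormat splits each line at the first tab into a
Text key and a Text value. A sketch of a second-job mapper whose types match,
assuming an input format with Text/Text keys and values (the class name is
illustrative):

    // Sketch of a mapper matching KeyValueTextInputFormat's types: both the
    // input key and the input value are Text. With TextInputFormat the input
    // key would be a LongWritable byte offset, which is exactly the
    // "Type mismatch in key from map" reported above.
    import java.io.IOException;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    public class SecondJobMapper extends Mapper<Text, Text, Text, Text> {
        @Override
        protected void map(Text key, Text value, Context context)
                throws IOException, InterruptedException {
            // For a line like "1<TAB>1,2,4*6*,1**":
            //   key   = "1"              (text before the first tab)
            //   value = "1,2,4*6*,1**"   (text after the first tab)
            context.write(key, value);
        }
    }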
