hadoop-general mailing list archives

From Jeff Zhang <zjf...@gmail.com>
Subject Re: Type mismatch in key from map
Date Thu, 24 Dec 2009 04:01:25 GMT
KeyValueTextInputFormat is deprecated in Hadoop 0.20.1; there's a new one in
trunk for the new API. Also, I think you should use part-00000 rather than
part-r-00000 -- the output file name changed between Hadoop versions, and with
the right name you should no longer get the IOException.
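
Untested sketch of what the second job's driver might look like against the
new-API KeyValueTextInputFormat in trunk (the package location, the Job
constructor, and the "myapp/output" path are assumptions on my side; adjust
the part-file name to whatever your version actually wrote):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.KeyValueTextInputFormat;

public class SecondJobDriver {
  public static void main(String[] args) throws Exception {
    Job job = new Job(new Configuration(), "second-job");
    job.setJarByClass(SecondJobDriver.class);

    // KeyValueTextInputFormat splits each line at the first separator
    // (tab by default), so both the key and the value reach the mapper
    // as Text -- no LongWritable offset key, hence no type mismatch.
    job.setInputFormatClass(KeyValueTextInputFormat.class);
    job.setMapOutputKeyClass(Text.class);
    job.setMapOutputValueClass(Text.class);

    // Point at the first job's output file; whether it is part-00000 or
    // part-r-00000 depends on the Hadoop version that produced it.
    FileInputFormat.addInputPath(job, new Path("myapp/output/part-r-00000"));

    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

This needs a running Hadoop installation, so treat it as a driver/config
sketch rather than something you can execute as-is.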


Jeff Zhang
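
As for the type-mismatch error itself in the quoted thread below: with the new
API, TextInputFormat hands the mapper LongWritable byte-offset keys, so a job
whose map output key class is Text fails as soon as the identity Mapper passes
a LongWritable through. One possible workaround, if you stay on TextInputFormat,
is a mapper that accepts the offset key and re-splits each line itself (the
class name and the tab separator here are my assumptions):

```java
import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Accepts TextInputFormat's LongWritable offset key, discards it, and
// re-splits the line on the first tab so the emitted pair is Text/Text,
// matching the declared map output key/value classes.
public class ResplitMapper extends Mapper<LongWritable, Text, Text, Text> {
  @Override
  protected void map(LongWritable offset, Text line, Context context)
      throws IOException, InterruptedException {
    String s = line.toString();
    int tab = s.indexOf('\t');
    if (tab < 0) {
      context.write(new Text(s), new Text(""));
    } else {
      context.write(new Text(s.substring(0, tab)),
                    new Text(s.substring(tab + 1)));
    }
  }
}
```

Again only a sketch: it compiles against the 0.20-era new API as I remember
it, but I have not run it.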

On Wed, Dec 23, 2009 at 7:31 PM, Something Something <
mailinglists19@gmail.com> wrote:

> I think you meant KeyValueTextInputFormat.  This is in a deprecated
> package.  It even uses JobConf, which has been deprecated.  Is there an
> equivalent new class that's not deprecated?  Otherwise, I will have to
> create the JobConf object just to use this class.
>
> In any case, I tried using it as follows (also a few other variations...)
>
>    JobConf jobConf = new JobConf();
>    jobConf.setWorkingDirectory(new Path("myapp/output"));
>    KeyValueTextInputFormat.addInputPath(jobConf, new Path("./part-r-00000"));
>
> But it keeps throwing this exception...
>
> Exception in thread "main" java.io.IOException: No input paths specified in job
>   at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:186)
>   at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:241)
>   at org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:885)
>   at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:779)
>   at org.apache.hadoop.mapreduce.Job.submit(Job.java:432)
>
>
> What's the right way to use this class?  Thanks for your help.
>
>
> On Wed, Dec 23, 2009 at 6:16 PM, Jeff Zhang <zjffdu@gmail.com> wrote:
>
> > It seems the value type of your first job's output is Text, but I guess
> > your second job's InputFormat is TextInputFormat, whose key type is
> > LongWritable.  That is why you get the type-mismatch error message.  I
> > suggest you use KeyValueInputFormat as your second job's InputFormat.
> >
> >
> > Jeff Zhang
> >
> >
> > On Wed, Dec 23, 2009 at 4:22 PM, Something Something <
> > mailinglists19@gmail.com> wrote:
> >
> > > I would like to feed a file created by one job as an input to the next
> > > job.  When I do that, I get:
> > >
> > > java.io.IOException: Type mismatch in key from map: expected
> > > org.apache.hadoop.io.Text, recieved org.apache.hadoop.io.LongWritable
> > >   at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:807)
> > >   at org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:504)
> > >   at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
> > >   at org.apache.hadoop.mapreduce.Mapper.map(Mapper.java:124)
> > >   at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
> > >   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:583)
> > >   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
> > >   at org.apache.hadoop.mapred.Child.main(Child.java:170)
> > >
> > >
> > > The first job does:  context.write(key, value) - in a loop.  This
> > > creates a file (<output dir>/part-r-00000) that contains something
> > > like this...
> > >
> > > 1 1,2,4*6*,1**
> > > 1 2,2,6*,4**
> > > 2 1,6,2*3*5*6*7*8*,1**
> > > 2 2,6,3*5*6*7*8*,2**
> > > & so on...
> > >
> > > Now in my second job I do:
> > >
> > > FileInputFormat.addInputPath(job, new Path(inFile));
> > >
> > > Where inFile is set to the one created above (<output dir>/part-r-00000)
> > >
> > >
> > > What am I doing wrong?  Please help.  Thanks.
> > >
> >
>
