hadoop-general mailing list archives

From: Something Something <mailinglist...@gmail.com>
Subject: Re: Type mismatch in key from map
Date: Thu, 31 Dec 2009 19:18:03 GMT
I tried using KeyValueTextInputFormat from the 'trunk', but I am getting the
same error message (Type mismatch in key from map: expected
org.apache.hadoop.io.Text, recieved org.apache.hadoop.io.LongWritable).
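
In case it helps anyone reproduce this, the setup I would expect to work with
the trunk class looks roughly like this (a sketch, untested; the package path
is from my reading of trunk):

    Job job = new Job(new Configuration(), "second-job");
    // KeyValueTextInputFormat splits each line at the first tab, so the
    // mapper should see (Text, Text) instead of (LongWritable, Text).
    job.setInputFormatClass(
        org.apache.hadoop.mapreduce.lib.input.KeyValueTextInputFormat.class);
    job.setMapOutputKeyClass(Text.class);
    job.setMapOutputValueClass(Text.class);
    FileInputFormat.addInputPath(job, new Path("myapp/output"));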

Not sure what you mean by "use part-00000".  There's no file called
part-00000.

Anyway, for now I will just work around this issue.  The new plan is this:
the first job will write to a new table in HBase, and the second job will use
TableMapper to go through each row.  I was trying to avoid this because
something tells me it might be a bit slower than using HDFS directly, but we
shall see.  Thanks.
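
Roughly what I have in mind for the second job's input side (a sketch only;
the table name, column family, and mapper class are placeholders):

    Scan scan = new Scan();
    scan.addFamily(Bytes.toBytes("d"));    // hypothetical column family
    TableMapReduceUtil.initTableMapperJob(
        "intermediate",                    // hypothetical table name
        scan,
        RowMapper.class,                   // extends TableMapper<Text, Text>
        Text.class,                        // mapper output key class
        Text.class,                        // mapper output value class
        job);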



On Wed, Dec 23, 2009 at 8:01 PM, Jeff Zhang <zjffdu@gmail.com> wrote:

> KeyValueTextInputFormat is deprecated in Hadoop 0.20.1, and there's a new
> one in trunk for the new API.
> I think you should use part-00000 rather than part-r-00000; then you will
> get no IOException.  The output file name changed between Hadoop versions.
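>
> A way to avoid depending on the exact part file name is to add the whole
> output directory as the input path; FileInputFormat will pick up the files
> inside it and skip hidden ones like _logs.  A one-line sketch (the directory
> name is just an example):
>
>    FileInputFormat.addInputPath(job, new Path("myapp/output"));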
>
>
> Jeff Zhang
>
> On Wed, Dec 23, 2009 at 7:31 PM, Something Something <mailinglists19@gmail.com> wrote:
>
> > I think you meant... KeyValueTextInputFormat.  This is in a deprecated
> > package.  It even uses JobConf, which has been deprecated.  Is there an
> > equivalent new class that's not deprecated?  Otherwise, I will have to
> > create the JobConf object just to use this class.
> >
> > In any case, I tried using it as follows (also a few other variations...)
> >
> >    JobConf jobConf = new JobConf();
> >    jobConf.setWorkingDirectory(new Path("myapp/output"));
> >    KeyValueTextInputFormat.addInputPath(jobConf, new Path("./part-r-00000"));
> >
> > But it keeps throwing this exception...
> >
> > Exception in thread "main" java.io.IOException: No input paths specified in job
> >   at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:186)
> >   at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:241)
> >   at org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:885)
> >   at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:779)
> >   at org.apache.hadoop.mapreduce.Job.submit(Job.java:432)
> >
> >
> > What's the right way to use this class?  Thanks for your help.
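> >
> > If there really is no non-deprecated equivalent, I assume the fallback is
> > the full old-API pipeline, something like this (a sketch, untested; the
> > driver class and output path are placeholders):
> >
> >    JobConf conf = new JobConf(MySecondJob.class);
> >    conf.setInputFormat(KeyValueTextInputFormat.class);
> >    // the old-API FileInputFormat from org.apache.hadoop.mapred
> >    FileInputFormat.addInputPath(conf, new Path("myapp/output/part-r-00000"));
> >    conf.setOutputKeyClass(Text.class);
> >    conf.setOutputValueClass(Text.class);
> >    FileOutputFormat.setOutputPath(conf, new Path("myapp/output2"));
> >    JobClient.runJob(conf);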
> >
> >
> > On Wed, Dec 23, 2009 at 6:16 PM, Jeff Zhang <zjffdu@gmail.com> wrote:
> >
> > > It seems the value type of your first job's output is Text, but I guess
> > > your second job's InputFormat is TextInputFormat, whose key type is
> > > LongWritable.  That is why you get the type mismatch error message.  I
> > > suggest you use KeyValueInputFormat as your second job's InputFormat.
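> > >
> > > With an input format that hands the mapper Text keys, the second job's
> > > mapper would look roughly like this (a sketch; the class name is a
> > > placeholder):
> > >
> > >    public static class SecondMapper extends Mapper<Text, Text, Text, Text> {
> > >      @Override
> > >      protected void map(Text key, Text value, Context context)
> > >          throws IOException, InterruptedException {
> > >        // key = text before the first tab, value = text after it
> > >        context.write(key, value);
> > >      }
> > >    }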
> > >
> > >
> > > Jeff Zhang
> > >
> > >
> > > On Wed, Dec 23, 2009 at 4:22 PM, Something Something <mailinglists19@gmail.com> wrote:
> > >
> > > > I would like to feed a file created by one job as an input to the next
> > > > job.  When I do that, I get:
> > > >
> > > > java.io.IOException: Type mismatch in key from map: expected
> > > > org.apache.hadoop.io.Text, recieved org.apache.hadoop.io.LongWritable
> > > >   at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:807)
> > > >   at org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:504)
> > > >   at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
> > > >   at org.apache.hadoop.mapreduce.Mapper.map(Mapper.java:124)
> > > >   at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
> > > >   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:583)
> > > >   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
> > > >   at org.apache.hadoop.mapred.Child.main(Child.java:170)
> > > >
> > > >
> > > > The first job does context.write(key, value) in a loop.  This creates
> > > > a file (<output dir>/part-r-00000) that contains something like this...
> > > >
> > > > 1 1,2,4*6*,1**
> > > > 1 2,2,6*,4**
> > > > 2 1,6,2*3*5*6*7*8*,1**
> > > > 2 2,6,3*5*6*7*8*,2**
> > > > & so on...
> > > >
> > > > Now in my second job I do:
> > > >
> > > > FileInputFormat.addInputPath(job, new Path(inFile));
> > > >
> > > > Where inFile is set to the one created above (<output dir>/part-r-00000).
> > > >
> > > >
> > > > What am I doing wrong?  Please help.  Thanks.
> > > >
> > >
> >
>
