From: Something Something
To: general@hadoop.apache.org
Date: Thu, 31 Dec 2009 11:18:03 -0800
Subject: Re: Type mismatch in key from map

I tried using KeyValueTextInputFormat from the 'trunk' but I am getting the
same error message (Type mismatch in key from map: expected
org.apache.hadoop.io.Text, recieved org.apache.hadoop.io.LongWritable)

Not sure what you mean by "use part-00000". There's no file called
part-00000.

Anyway, for now I will just work around this issue. New plan is this: the
first job will write to a new table in HBase and the 2nd job will use
TableMapper to go through each row. I was trying to avoid this because
something tells me that this might be a bit slower than just using HDFS
directly, but we shall see.

Thanks.

On Wed, Dec 23, 2009 at 8:01 PM, Jeff Zhang wrote:

> The KeyValueTextInputFormat is deprecated in hadoop 0.20.1, and there's a
> new one in trunk for new api.
> I think you should use part-00000 rather than part-r-00000, then you will
> get no IOException, there's output file name change between different
> hadoop versions.
>
> Jeff Zhang
>
> On Wed, Dec 23, 2009 at 7:31 PM, Something Something <
> mailinglists19@gmail.com> wrote:
>
> > I think you meant.. KeyValueTextInputFormat. This is in a deprecated
> > package. It even uses JobConf that's been deprecated. Is there an
> > equivalent new class that's not deprecated? Otherwise, I will have to
> > create the JobConf object just to use this class.
> >
> > In any case, I tried using it as follows (also a few other variations...)
> >
> > JobConf jobConf = new JobConf();
> > jobConf.setWorkingDirectory(new Path("myapp/output"));
> > KeyValueTextInputFormat.addInputPath(jobConf, new Path("./part-r-00000"));
> >
> > But it keeps throwing this exception...
> >
> > Exception in thread "main" java.io.IOException: No input paths specified in job
> >     at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:186)
> >     at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:241)
> >     at org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:885)
> >     at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:779)
> >     at org.apache.hadoop.mapreduce.Job.submit(Job.java:432)
> >
> > What's the right way to use this class? Thanks for your help.
> >
> > On Wed, Dec 23, 2009 at 6:16 PM, Jeff Zhang wrote:
> >
> > > It seems the value type of your first job's output is Text, but I guess
> > > your second job's InputFormat is TextInputFormat, the key type of
> > > TextInputFormat is LongWritable. So you will get the Type mismatch error
> > > message. I suggest you use KeyValueInputFormat as your second job's
> > > InputFormat.
> > >
> > > Jeff Zhang
> > >
> > > On Wed, Dec 23, 2009 at 4:22 PM, Something Something <
> > > mailinglists19@gmail.com> wrote:
> > >
> > > > I would like to feed a file created by one job as an input to the next
> > > > job. When I do that, I get:
> > > >
> > > > java.io.IOException: Type mismatch in key from map: expected
> > > > org.apache.hadoop.io.Text, recieved org.apache.hadoop.io.LongWritable
> > > >     at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:807)
> > > >     at org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:504)
> > > >     at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
> > > >     at org.apache.hadoop.mapreduce.Mapper.map(Mapper.java:124)
> > > >     at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
> > > >     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:583)
> > > >     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
> > > >     at org.apache.hadoop.mapred.Child.main(Child.java:170)
> > > >
> > > > The first job does: context.write(key, value) - in a loop. This creates
> > > > a file (/part-r-00000) that contains something like this...
> > > >
> > > > 1 1,2,4*6*,1**
> > > > 1 2,2,6*,4**
> > > > 2 1,6,2*3*5*6*7*8*,1**
> > > > 2 2,6,3*5*6*7*8*,2**
> > > > & so on...
> > > >
> > > > Now in my second job I do:
> > > >
> > > > FileInputFormat.addInputPath(job, new Path(inFile));
> > > >
> > > > Where inFile is set to the one created above (dir>/part-r-00000)
> > > >
> > > > What am I doing wrong? Please help. Thanks.
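
The mismatch discussed in this thread comes down to what each input format
hands to the mapper as its key. The stack trace passes through the base
Mapper.map, which suggests the default identity mapper forwarded the input
key unchanged; with TextInputFormat that key is a LongWritable byte offset,
while the job expected Text. The sketch below is plain Java with no Hadoop
classes, so it only imitates the two record shapes. RecordShapes,
asOffsetAndLine, and asKeyAndValue are hypothetical names, and it assumes the
first job's output separates key and value with a tab (the whitespace in the
archived sample is ambiguous).

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.Map;

// Plain-Java illustration only; no Hadoop classes are used here.
final class RecordShapes {

    // Roughly the shape a line-oriented input format gives the mapper:
    // the key is the byte offset of the line, the value is the whole line.
    static Map.Entry<Long, String> asOffsetAndLine(long byteOffset, String line) {
        return new SimpleImmutableEntry<>(byteOffset, line);
    }

    // Roughly the shape a key/value text input format gives the mapper:
    // the line is split at the first tab; with no tab, the whole line
    // becomes the key and the value is empty.
    static Map.Entry<String, String> asKeyAndValue(String line) {
        int tab = line.indexOf('\t');
        if (tab < 0) {
            return new SimpleImmutableEntry<>(line, "");
        }
        return new SimpleImmutableEntry<>(line.substring(0, tab), line.substring(tab + 1));
    }

    public static void main(String[] args) {
        // Sample lines from the first job's output, assuming a tab separator.
        String[] lines = {
            "1\t1,2,4*6*,1**",
            "1\t2,2,6*,4**",
            "2\t1,6,2*3*5*6*7*8*,1**"
        };
        long offset = 0;
        for (String line : lines) {
            Map.Entry<Long, String> byOffset = asOffsetAndLine(offset, line);
            Map.Entry<String, String> byKey = asKeyAndValue(line);
            System.out.println("offset/line: " + byOffset.getKey() + " -> " + byOffset.getValue());
            System.out.println("key/value:   " + byKey.getKey() + " -> " + byKey.getValue());
            // Advance past the line and its newline; the sample is ASCII,
            // so character count equals byte count here.
            offset += line.length() + 1;
        }
    }
}
```

In the first shape the key is a number, so a mapper that passes it straight
through cannot satisfy a job that declares Text map output keys. Either the
input format has to produce Text keys, or the mapper has to split each line
itself and emit the Text key explicitly.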