From: "Lars George (JIRA)"
To: hbase-dev@hadoop.apache.org
Date: Tue, 10 Nov 2009 11:20:27 +0000 (UTC)
Subject: [jira] Created: (HBASE-1969) HBASE-1626 does not work as advertised due to lack of "instanceof" check in MR framework

HBASE-1626 does not work as advertised due to lack of "instanceof" check in MR framework
----------------------------------------------------------------------------------------

                 Key: HBASE-1969
                 URL: https://issues.apache.org/jira/browse/HBASE-1969
             Project: Hadoop HBase
          Issue Type: Bug
    Affects Versions: 0.20.1
            Reporter: Lars George


What HBASE-1626 tried to fix is that we should be able to hand in either Put or Delete instances to the TableOutputFormat. So in the process the explicit Put reference was changed to Writable. But that does not work as expected:

{code}
09/11/04 13:35:56 INFO mapred.JobClient: Task Id : attempt_200911031030_0004_m_000013_2, Status : FAILED
java.io.IOException: Type mismatch in value from map: expected org.apache.hadoop.io.Writable, recieved org.apache.hadoop.hbase.client.Put
	at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:812)
	at org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:504)
	at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
	at com.worldlingo.hadoop.mapred.RestoreTable$RestoreMapper.map(RestoreTable.java:140)
	at com.worldlingo.hadoop.mapred.RestoreTable$RestoreMapper.map(RestoreTable.java:69)
	at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:583)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
{code}

The problem is that the MapReduce framework does not check the type polymorphically using "instanceof"; it uses a direct class comparison instead. In MapTask.java you find this code:

{code}
public synchronized void collect(K key, V value, int partition
                                 ) throws IOException {
  reporter.progress();
  if (key.getClass() != keyClass) {
    throw new IOException("Type mismatch in key from map: expected "
                          + keyClass.getName() + ", recieved "
                          + key.getClass().getName());
  }
  if (value.getClass() != valClass) {
    throw new IOException("Type mismatch in value from map: expected "
                          + valClass.getName() + ", recieved "
                          + value.getClass().getName());
  }
  ...
{code}

So setting Writable as the MapOutputValueClass for the job and then handing in a Put or Delete does not work!
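To make the difference concrete, here is a minimal, self-contained sketch contrasting the exact-class comparison quoted above with a polymorphic check along the lines of the "instanceof" alternative. The nested Writable, Put and Delete types are hypothetical stand-ins, not the real Hadoop/HBase classes, and strictCheck/polymorphicCheck are illustrative names only.

```java
import java.io.IOException;

public class TypeCheckDemo {
    // Hypothetical stand-ins for org.apache.hadoop.io.Writable and the
    // HBase client Put/Delete classes; NOT the real library types.
    interface Writable {}
    static class Put implements Writable {}
    static class Delete implements Writable {}

    // Mirrors the comparison in MapTask.MapOutputBuffer.collect(): only a value
    // whose runtime class is exactly valClass is accepted, subtypes are rejected.
    static void strictCheck(Object value, Class<?> valClass) throws IOException {
        if (value.getClass() != valClass) {
            throw new IOException("Type mismatch in value from map: expected "
                + valClass.getName() + ", received " + value.getClass().getName());
        }
    }

    // Polymorphic alternative: Class.isInstance is the reflective equivalent
    // of the instanceof operator, so any subtype of valClass is accepted.
    static void polymorphicCheck(Object value, Class<?> valClass) throws IOException {
        if (!valClass.isInstance(value)) {
            throw new IOException("Type mismatch in value from map: expected "
                + valClass.getName() + ", received " + value.getClass().getName());
        }
    }

    // Returns true if the chosen check accepts the value, false if it throws.
    static boolean accepts(boolean polymorphic, Object value, Class<?> valClass) {
        try {
            if (polymorphic) {
                polymorphicCheck(value, valClass);
            } else {
                strictCheck(value, valClass);
            }
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(accepts(false, new Put(), Writable.class));    // false: the reported failure
        System.out.println(accepts(true, new Put(), Writable.class));     // true
        System.out.println(accepts(true, new Delete(), Writable.class));  // true
        System.out.println(accepts(false, new Put(), Put.class));         // true: value class set to Put
        System.out.println(accepts(false, new Delete(), Put.class));      // false: Deletes still rejected
    }
}
```

The last two lines show why setting the value class to Put only sidesteps the problem for jobs that emit nothing but Puts.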
The test case TestMapReduce did not pick this up, as it contains this line:

{code}
TableMapReduceUtil.initTableMapperJob(
  Bytes.toString(table.getTableName()), scan,
  ProcessContentsMapper.class, ImmutableBytesWritable.class,
  Put.class, job);
{code}

which sets the value class to Put:

{code}
if (outputValueClass != null) job.setMapOutputValueClass(outputValueClass);
{code}

To work around this (for now), one can set the class to Put the same way, or explicitly in one's own code (which only helps if the mapper emits nothing but Puts):

{code}
job.setMapOutputValueClass(Put.class);
{code}

But the whole idea only seems feasible if either

a) the Hadoop class is amended to use "instanceof" instead (lodge a Hadoop MapReduce JIRA issue?), or
b) we have a combined class that represents a Put *and* a Delete, which seems somewhat wrong, but doable. It would only really find use in that context and would require the user to make use of it when calling context.write(). This does not make things easier to learn.

Suggestions?
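Option b) could look roughly like the following sketch. PutOrDelete is a hypothetical name, and the nested Put and Delete classes are stand-ins for the HBase client classes. A real version would also have to implement Writable (write/readFields) so the framework can serialize it; that part is omitted here. Because every value has the single runtime class PutOrDelete, it passes the exact-class comparison in MapTask no matter which operation it wraps.

```java
import java.util.Objects;

public class PutOrDeleteDemo {
    // Hypothetical stand-ins for the HBase client Put/Delete classes.
    static class Put {
        final String row;
        Put(String row) { this.row = row; }
    }

    static class Delete {
        final String row;
        Delete(String row) { this.row = row; }
    }

    // Hypothetical combined value class holding exactly one of a Put or a Delete.
    static final class PutOrDelete {
        private final Put put;
        private final Delete delete;

        private PutOrDelete(Put put, Delete delete) {
            this.put = put;
            this.delete = delete;
        }

        static PutOrDelete of(Put put) {
            return new PutOrDelete(Objects.requireNonNull(put), null);
        }

        static PutOrDelete of(Delete delete) {
            return new PutOrDelete(null, Objects.requireNonNull(delete));
        }

        boolean isPut() { return put != null; }

        Put getPut() {
            if (put == null) throw new IllegalStateException("value holds a Delete");
            return put;
        }

        Delete getDelete() {
            if (delete == null) throw new IllegalStateException("value holds a Put");
            return delete;
        }
    }

    // Simulates the exact-class comparison done in MapTask.collect().
    static boolean passesStrictCheck(Object value, Class<?> valClass) {
        return value.getClass() == valClass;
    }

    // Simulates an output format unwrapping the value and dispatching on it.
    static String describe(PutOrDelete value) {
        return value.isPut()
            ? "put " + value.getPut().row
            : "delete " + value.getDelete().row;
    }

    public static void main(String[] args) {
        PutOrDelete a = PutOrDelete.of(new Put("row1"));
        PutOrDelete b = PutOrDelete.of(new Delete("row2"));
        System.out.println(passesStrictCheck(a, PutOrDelete.class)); // true
        System.out.println(passesStrictCheck(b, PutOrDelete.class)); // true
        System.out.println(describe(a)); // put row1
        System.out.println(describe(b)); // delete row2
    }
}
```

The cost is the one noted in option b): a mapper would have to wrap every operation, e.g. context.write(key, PutOrDelete.of(put)), instead of writing the Put or Delete directly.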