Message-ID: <18690746.1195615483022.JavaMail.jira@brutus>
Date: Tue, 20 Nov 2007 19:24:43 -0800 (PST)
From: "Edward Yoon (JIRA)"
To: hadoop-dev@lucene.apache.org
Reply-To: hadoop-dev@lucene.apache.org
Subject: [jira] Issue Comment Edited: (HADOOP-2234) [hbase] TableInputFormat erroneously aggregates map values
In-Reply-To: <14571922.1195544563166.JavaMail.jira@brutus>

[ https://issues.apache.org/jira/browse/HADOOP-2234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12544127 ]

udanax edited comment on HADOOP-2234 at 11/20/07 7:24 PM:
---------------------------------------------------------------

Michael: I just tried both A and B to understand this better. I saw no difference between them, and I have not yet figured out what is wrong.

{code}
A:
public class TestMap extends TableMap {
  ..
  public void map(HStoreKey key, MapWritable value,
      TableOutputCollector output, Reporter reporter) throws IOException {
    output.collect(key.getRow(), value);
  }
}

B:
public class TestMap extends MapReduceBase implements Mapper {
  ..
  public void map(WritableComparable key, Writable value,
      OutputCollector output, Reporter reporter) throws IOException {
    ..
    // Cast the key first, then call getRow() on the HStoreKey.
    output.collect(((HStoreKey) key).getRow(), (MapWritable) value);
  }
  ..
}
{code}

      was (Author: udanax):
    Michael: I tried both TableMap and MapReduceBase + Mapper. Please ignore it.

> [hbase] TableInputFormat erroneously aggregates map values
> ----------------------------------------------------------
>
>                 Key: HADOOP-2234
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2234
>             Project: Hadoop
>          Issue Type: Bug
>          Components: contrib/hbase
>            Reporter: stack
>            Priority: Minor
>
> Edward Yoon reports the following phenomenon:
> Given a table:
> {code}
> row1 a: b: a:ca
> row2 a: b:
> row3 a: b:
> {code}
> This map code:
> {code}
> public void map(WritableComparable key, Writable value,
>     OutputCollector output, Reporter reporter) throws IOException {
>   if (m_collector.collector == null) {
>     m_collector.collector = output;
>   }
>   HStoreKey hKey = (HStoreKey) key;
>   MapWritable newValue = (MapWritable) value;
>   newValue.put(new Text("row:" + hKey.getRow().toString()),
>       new ImmutableBytesWritable(hKey.getRow().toString().getBytes()));
>
>   Map log = new HashMap();
>   for (Map.Entry e : newValue.entrySet()) {
>     log.put(e.getKey(), e.getValue()); // abbreviated code
>   }
>
>   LOG.info(log);
>   output.collect(hKey, newValue);
> }
> {code}
> ... produces the following.
> {code}
> 07/11/20 14:07:53 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
> 07/11/20 14:07:53 WARN mapred.JobClient: No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
> 07/11/20 14:07:53 INFO mapred.MapTask: numReduceTasks: 1
> 07/11/20 14:07:53 INFO algebra.SortMap: {a:=aa, b:=bb, a:da=aa44, a:ca=aa2}
> 07/11/20 14:07:53 INFO algebra.SortMap: {a:=aa3, b:=bb3, a:da=aa44, a:ca=aa2}
> 07/11/20 14:07:53 INFO algebra.SortMap: {a:=aa4, b:=bb4, a:da=aa44, a:ca=aa2}
> 07/11/20 14:07:53 INFO mapred.LocalJobRunner:
> 07/11/20 14:07:53 INFO mapred.TaskRunner: Task 'map_0000' done.
> 07/11/20 14:07:53 INFO algebra.SortReduce: {a:=aa, b:=bb, a:da=aa44, a:ca=aa2}
> 07/11/20 14:07:53 INFO algebra.SortReduce: {a:=aa3, b:=bb3, a:da=aa44, a:ca=aa2}
> 07/11/20 14:07:53 INFO algebra.SortReduce: {a:=aa4, b:=bb4, a:da=aa44, a:ca=aa2}
> 07/11/20 14:07:53 INFO mapred.LocalJobRunner: reduce > reduce
> 07/11/20 14:07:53 INFO mapred.TaskRunner: Task 'reduce_9ji2mr' done.
> {code}
> Notice how content from the first row is present when you output the second and third rows.
> The problem is that in TIF (TableInputFormat), after calling scanner.next, it copies the scanner.next value into the passed-in MapWritable value (converting from TreeMap to MapWritable). It resets the TreeMap passed to scanner.next each time, but not the passed-in MapWritable, so entries from earlier rows remain in it.
> There is a similar problem in the reduce, where the outputter is collecting values together (see log above). We need to figure out what's going on here. Below is the reduce code:
> {code}
> while (values.hasNext()) {
>   MapWritable data = (MapWritable) values.next();
>   Map log = new HashMap();
>   for (Map.Entry e : data.entrySet()) {
>     log.put(e.getKey().toString(),
>         new String(((ImmutableBytesWritable) e.getValue()).get()));
>   }
>   LOG.info(log);
> }
> {code}
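
For anyone trying to reproduce the value-reuse behaviour described in the issue without a cluster, here is a minimal sketch in plain Java. It does not use the Hadoop or HBase classes: the class ReusedValueDemo, its methods, and the row contents are hypothetical stand-ins modeled on the table in the report, and the clear() call is only one assumed remedy, not necessarily the fix that will be applied to TableInputFormat.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, self-contained model of a reader that copies each scanned
// row into a single value object that is reused for every row.
public class ReusedValueDemo {

    // Build one row from alternating column/value strings.
    static Map<String, String> row(String... kv) {
        Map<String, String> m = new LinkedHashMap<>();
        for (int i = 0; i + 1 < kv.length; i += 2) {
            m.put(kv[i], kv[i + 1]);
        }
        return m;
    }

    // Illustrative rows: row1 has an extra column that row2 and row3 lack.
    static List<Map<String, String>> rows() {
        List<Map<String, String>> rows = new ArrayList<>();
        rows.add(row("a:", "aa", "b:", "bb", "a:ca", "aa2"));
        rows.add(row("a:", "aa3", "b:", "bb3"));
        rows.add(row("a:", "aa4", "b:", "bb4"));
        return rows;
    }

    // Copies every scanned row into the same reused map and records a
    // snapshot of what a mapper would see for each row.
    static List<Map<String, String>> read(boolean clearBeforeCopy) {
        Map<String, String> value = new TreeMap<>(); // reused across rows
        List<Map<String, String>> seen = new ArrayList<>();
        for (Map<String, String> scanned : rows()) {
            if (clearBeforeCopy) {
                value.clear(); // assumed remedy: drop the previous row's entries
            }
            value.putAll(scanned);
            seen.add(new TreeMap<>(value));
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println("without clear: " + read(false));
        System.out.println("with clear:    " + read(true));
    }
}
```

Without the clear, the a:ca column from row1 is still present when row2 and row3 are emitted, which matches the pattern in the SortMap log lines. With the clear, each snapshot holds only that row's own columns.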