Return-Path: Delivered-To: apmail-hadoop-core-dev-archive@www.apache.org Received: (qmail 5759 invoked from network); 10 Jun 2008 20:02:17 -0000 Received: from hermes.apache.org (HELO mail.apache.org) (140.211.11.2) by minotaur.apache.org with SMTP; 10 Jun 2008 20:02:17 -0000 Received: (qmail 26384 invoked by uid 500); 10 Jun 2008 20:02:17 -0000 Delivered-To: apmail-hadoop-core-dev-archive@hadoop.apache.org Received: (qmail 26350 invoked by uid 500); 10 Jun 2008 20:02:16 -0000 Mailing-List: contact core-dev-help@hadoop.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: core-dev@hadoop.apache.org Delivered-To: mailing list core-dev@hadoop.apache.org Received: (qmail 26261 invoked by uid 99); 10 Jun 2008 20:02:16 -0000 Received: from athena.apache.org (HELO athena.apache.org) (140.211.11.136) by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 10 Jun 2008 13:02:16 -0700 X-ASF-Spam-Status: No, hits=-2000.0 required=10.0 tests=ALL_TRUSTED X-Spam-Check-By: apache.org Received: from [140.211.11.140] (HELO brutus.apache.org) (140.211.11.140) by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 10 Jun 2008 20:01:27 +0000 Received: from brutus (localhost [127.0.0.1]) by brutus.apache.org (Postfix) with ESMTP id 0508E234C135 for ; Tue, 10 Jun 2008 13:01:45 -0700 (PDT) Message-ID: <293006806.1213128105019.JavaMail.jira@brutus> Date: Tue, 10 Jun 2008 13:01:45 -0700 (PDT) From: "Owen O'Malley (JIRA)" To: core-dev@hadoop.apache.org Subject: [jira] Updated: (HADOOP-3522) ValuesIterator.next() doesn't return a new object, thus failing many equals() tests. In-Reply-To: <1009969547.1213063305104.JavaMail.jira@brutus> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-Virus-Checked: Checked by ClamAV on apache.org [ https://issues.apache.org/jira/browse/HADOOP-3522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Owen O'Malley updated HADOOP-3522: ---------------------------------- Attachment: reduce-doc-19.patch The version for 0.18 and 0.19. > ValuesIterator.next() doesn't return a new object, thus failing many equals() tests. > ------------------------------------------------------------------------------------ > > Key: HADOOP-3522 > URL: https://issues.apache.org/jira/browse/HADOOP-3522 > Project: Hadoop Core > Issue Type: Bug > Affects Versions: 0.17.0 > Reporter: Spyros Blanas > Assignee: Owen O'Malley > Attachments: reduce-doc-19.patch, reduce-doc.patch > > > h2.Problem > The ValuesIterator.next() doesn't return a new object representing the next element. > It is expected that this is the case. Example from java.lang.String source code: > {noformat} > public boolean equals(Object anObject) { > if (this == anObject) { > return true; > } > if (anObject instanceof String) { > // custom code here > } > } > {noformat} > Changing the private fields of the object and returning it from next() will make this object fail the equals() test, because this==anObject will always be true. > h2. What is affected > The reduce() method presents (a subclass of) ValuesIterator to the user. So every user-defined class can be affected. > Manifestations of this bug in 0.17.0: > * The Text class checks for equality similarly to String.equals(), which is shown above. > * The contrib/data_join breaks because it stores tags in a Map. The behavior of the next() method makes Object.equals() be true for all tags. > h2. JUnit test > Patch against hadoop-0.17.0: > {noformat} > Index: src/test/org/apache/hadoop/mapred/TestReduceTask.java > =================================================================== > --- src/test/org/apache/hadoop/mapred/TestReduceTask.java (revision 665935) > +++ src/test/org/apache/hadoop/mapred/TestReduceTask.java (working copy) > @@ -122,4 +122,52 @@ > runValueIterator(tmpDir, testCase, conf); > } > } > + > + public void runValueIterator2(Path tmpDir, Pair[] vals, > + Configuration conf) throws IOException { > + FileSystem fs = tmpDir.getFileSystem(conf); > + Path path = new Path(tmpDir, "data.in"); > + SequenceFile.Writer writer = new SequenceFile.Writer(fs, conf, path, > + Text.class, > + Text.class); > + for(Pair p: vals) { > + writer.append(new Text(p.key), new Text(p.value)); > + } > + writer.close(); > + SequenceFile.Sorter sorter = new SequenceFile.Sorter(fs, Text.class, > + Text.class, conf); > + SequenceFile.Sorter.RawKeyValueIterator rawItr = > + sorter.merge(new Path[]{path}, false, tmpDir); > + @SuppressWarnings("unchecked") // WritableComparators are not generic > + ReduceTask.ValuesIterator valItr = > + new ReduceTask.ValuesIterator(rawItr, > + WritableComparator.get(Text.class), Text.class, Text.class, > + conf, new NullProgress()); > + while (valItr.more()) { > + Object key = valItr.getKey(); > + String keyString = key.toString(); > + // make sure it matches! > + assertEquals(vals[0].key, keyString); > + // must have at least 1 value! > + assertTrue(valItr.hasNext()); > + Text value1 = valItr.next(); > + // must have at least 2 values! > + assertTrue(valItr.hasNext()); > + Text value2 = valItr.next(); > + // do test > + assertNotSame(value1, value2); > + assertTrue(!value1.equals(value2)); > + assertTrue(!value2.equals(value1)); > + // make sure the key hasn't changed under the hood > + assertEquals(keyString, valItr.getKey().toString()); > + valItr.nextKey(); > + } > + } > + > + public void testValueIterator2() throws Exception { > + Path tmpDir = new Path("build/test/test.reduce.task"); > + Configuration conf = new Configuration(); > + Pair[] test = new Pair[]{ new Pair("k1", "v1"), new Pair("k1", "v2") }; > + runValueIterator2(tmpDir, test, conf); > + } > } > {noformat} > h2. A program with a bug > I would imagine that a programmer would be really confused when everything is equal in the example below, for any text input: > {noformat} > package test; > import java.io.*; > import java.util.*; > import org.apache.hadoop.fs.Path; > import org.apache.hadoop.conf.*; > import org.apache.hadoop.io.*; > import org.apache.hadoop.util.*; > import org.apache.hadoop.mapred.*; > import org.apache.hadoop.mapred.lib.*; > public class TestNewReducer extends Configured implements Tool { > public int run(String[] args) throws Exception { > if (args.length < 2) { > System.out.println("TestNewReducer "); > return -1; > } > JobConf conf = new JobConf(getConf(), TestNewReducer.class); > conf.setJobName("test-newreducer"); > conf.setInputFormat(TextInputFormat.class); > conf.setMapperClass(Map.class); > conf.setReducerClass(Reduce.class); > conf.setOutputFormat(TextOutputFormat.class); > conf.setOutputKeyClass(LongWritable.class); > conf.setOutputValueClass(Text.class); > FileInputFormat.setInputPaths(conf, new Path(args[0])); > FileOutputFormat.setOutputPath(conf, new Path(args[1])); > JobClient.runJob(conf); > return 0; > } > public static void main(String[] args) throws Exception { > int res = ToolRunner.run(new Configuration(), new TestNewReducer(), args); > System.exit(res); > } > public static class Map extends MapReduceBase implements Mapper { > private final static LongWritable nill = new LongWritable(1); > public void map (LongWritable key, Text value, > OutputCollector output, > Reporter reporter) throws IOException { > output.collect(nill, value); > } > } > public static class Reduce extends MapReduceBase implements Reducer { > private final static LongWritable nill = new LongWritable(1); > public void reduce(LongWritable key, Iterator values, > OutputCollector output, > Reporter reporter) throws IOException { > Text t1 = null; > if (values.hasNext()) > t1 = values.next(); // t1 is the first element in values > while(values.hasNext()) { > // is this element equal to t1? > Text t = values.next(); > output.collect(nill, t.equals(t1) ? new Text("Equal") : new Text("Not equal")); > } > } > } > } > {noformat} > h2. Possible fixes > # Return a new object each time for next(). This might have significant overhead. > # Return a new object for unknown object types and reuse the same object for known types (like Text). Remove "if (this==anObject) return true;" check from all equals() methods for known objects. > # Document clearly that all user-defined classes must implement an equals() method, which doesn't do the "if (this==anObject) return true;" check (ie. push the problem to the user). -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.