Message-ID: <2044772312.55881264562674640.JavaMail.jira@brutus.apache.org>
Date: Wed, 27 Jan 2010 03:24:34 +0000 (UTC)
From: "robert Cook (JIRA)"
To: common-dev@hadoop.apache.org
Reply-To: common-dev@hadoop.apache.org
Subject: [jira] Created: (HADOOP-6513) SequenceFile.Sorter design issue and class-check bug
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit

SequenceFile.Sorter design issue and class-check bug
-----------------------------------------------------

             Key: HADOOP-6513
             URL: https://issues.apache.org/jira/browse/HADOOP-6513
         Project: Hadoop Common
      Issue Type: Bug
      Components: io
Affects Versions: 0.20.1
     Environment: hadoop 20.1, java 1.6.0_17, fedora
        Reporter: robert Cook

SequenceFile.Writer takes key/value classes as creation arguments and checks them for validity on every append.
SequenceFile.Reader does not take class arguments on creation because they are derived from the input file.

SequenceFile.Sorter takes key/value classes as creation arguments, which serves no purpose: they should be derived from the input file.

In any case, SortPass does not compare the Sorter key/value classes with the classes recorded in the input file. No error is given for the following:

  private static void writeTest4(FileSystem fs, int count, int seed, Path file,
                                 SequenceFile.CompressionType compressionType,
                                 CompressionCodec codec, Configuration conf)
    throws IOException {
    fs.delete(file, true);
    LOG.info("creating " + count + " records with " + compressionType + " compression");
    // The file is written with StringWritable keys and FloatWritable values.
    SequenceFile.Writer writer =
      SequenceFile.createWriter(fs, conf, file,
                                StringWritable.class, FloatWritable.class,
                                compressionType, codec);
    FloatWritable x = new FloatWritable();
    StringWritable y = new StringWritable();
    for (int i = count-1; i >= 0; i--) {
      x.set(i);
      y.set("" + i);
      writer.append(y, x);
    }
    writer.close();
  }

  private static void sortTest(FileSystem fs, int count, int megabytes, int factor,
                               boolean fast, Path file, Configuration conf)
    throws IOException {
    fs.delete(new Path(file + ".sorted"), true);
    SequenceFile.Sorter sorter = newSorter(fs, fast, megabytes, factor, conf);
    LOG.debug("sorting " + count + " records");
    sorter.sort(file, file.suffix(".sorted"));
    LOG.info("done sorting " + count + " debug");
  }

  private static SequenceFile.Sorter newSorter(FileSystem fs, boolean fast,
                                               int megabytes, int factor,
                                               Configuration conf) {
    // The Sorter is given FloatWritable/IntWritable, which does not match the file.
    SequenceFile.Sorter sorter = fast ?
      new SequenceFile.Sorter(fs, new IntWritable.Comparator(),
                              FloatWritable.class, IntWritable.class, conf)
      : new SequenceFile.Sorter(fs, FloatWritable.class, IntWritable.class, conf);
    sorter.setMemory(megabytes * 1024*1024);
    sorter.setFactor(factor);
    return sorter;
  }

Note: the String/Float classes used by the writer do not match the Float/Int classes given to the Sorter.

Macintosh-2:datanode bobcook$ od -c file
0000000 S E Q 006 016 S t r i n g W r i t a
0000020 b l e \r F l o a t W r i t a b l
0000040 e \0 \0 \0 \0 \0 \0 203 ` n E J z 272 d 352
0000060 w 177 373 \n 364 M 276 \0 \0 \0 \n \0 \0 \0 006 \0
0000100 \0 \0 001 \0 4 @ 200 \0 \0 \0 \0 \0 \n \0 \0 \0
0000120 006 \0 \0 \0 001 \0 3 @ @ \0 \0 \0 \0 \0 \n \0
0000140 \0 \0 006 \0 \0 \0 001 \0 2 @ \0 \0 \0 \0 \0 \0
0000160 \n \0 \0 \0 006 \0 \0 \0 001 \0 1 ? 200 \0 \0 \0
0000200 \0 \0 \n \0 \0 \0 006 \0 \0 \0 001 \0 0 \0 \0 \0
*
0000220

Macintosh-2:datanode bobcook$ od -c file.sorted
0000000 S E Q 006 \r F l o a t W r i t a b
0000020 l e \v I n t W r i t a b l e \0 \0
0000040 \0 \0 \0 \0 6 364 343 \r 256 h U 222 365 T 7 l
0000060 357 i ~ } \0 \0 \0 \n \0 \0 \0 006 \0 \0 \0 001
0000100 \0 4 @ 200 \0 \0 \0 \0 \0 \n \0 \0 \0 006 \0 \0
0000120 \0 001 \0 3 @ @ \0 \0 \0 \0 \0 \n \0 \0 \0 006
0000140 \0 \0 \0 001 \0 2 @ \0 \0 \0 \0 \0 \0 \n \0 \0
0000160 \0 006 \0 \0 \0 001 \0 1 ? 200 \0 \0 \0 \0 \0 \n
0000200 \0 \0 \0 006 \0 \0 \0 001 \0 0 \0 \0 \0 \0
0000216

NOTE: THE OUTPUT FILE IS TOTALLY TOASTED (its header now names FloatWritable/IntWritable, while the records are still the String/Float data from the input), but no error was generated!

PS: Your evaluation of my previous bug reports was enlightening.
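
The missing check can be sketched without Hadoop on the classpath. This is only a minimal illustration, not Hadoop's code: it assumes the header layout visible in the dumps above ("SEQ", a version byte, then the key and value class names, each preceded by a one-byte length), and it rejects names of 128 bytes or more rather than decoding a longer length. The names SeqHeaderCheck, Header, readHeader and checkClasses are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: reads the class names at the start of a SequenceFile
// and compares them with the classes a caller (e.g. a sorter) was given.
final class SeqHeaderCheck {

  /** Key and value class names as recorded in the file header. */
  static final class Header {
    final String keyClassName;
    final String valueClassName;
    Header(String keyClassName, String valueClassName) {
      this.keyClassName = keyClassName;
      this.valueClassName = valueClassName;
    }
  }

  // Assumption: the length fits in one non-negative byte (name < 128 bytes).
  private static String readShortString(DataInputStream in) throws IOException {
    int len = in.readByte();
    if (len < 0) {
      throw new IOException("length encoding not handled by this sketch");
    }
    byte[] buf = new byte[len];
    in.readFully(buf);
    return new String(buf, StandardCharsets.UTF_8);
  }

  /** Parses "SEQ", the version byte, and the two class names. */
  static Header readHeader(byte[] fileStart) throws IOException {
    DataInputStream in = new DataInputStream(new ByteArrayInputStream(fileStart));
    byte[] magic = new byte[3];
    in.readFully(magic);
    if (magic[0] != 'S' || magic[1] != 'E' || magic[2] != 'Q') {
      throw new IOException("not a SequenceFile");
    }
    in.readByte(); // version byte (006 in the dumps above); not validated here
    String key = readShortString(in);
    String value = readShortString(in);
    return new Header(key, value); // the rest of the header is ignored
  }

  /** Fails loudly when the file's classes differ from the expected ones. */
  static void checkClasses(Header h, String expectedKey, String expectedValue)
      throws IOException {
    if (!h.keyClassName.equals(expectedKey)
        || !h.valueClassName.equals(expectedValue)) {
      throw new IOException("wrong key/value class: file has "
          + h.keyClassName + "/" + h.valueClassName
          + ", expected " + expectedKey + "/" + expectedValue);
    }
  }

  public static void main(String[] args) throws IOException {
    // Rebuild the first bytes of the input file shown in the first dump.
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    out.write(new byte[] {'S', 'E', 'Q', 6}, 0, 4);
    for (String name : new String[] {"StringWritable", "FloatWritable"}) {
      byte[] b = name.getBytes(StandardCharsets.UTF_8);
      out.write(b.length);
      out.write(b, 0, b.length);
    }
    Header h = readHeader(out.toByteArray());
    try {
      checkClasses(h, "FloatWritable", "IntWritable"); // what the Sorter was given
    } catch (IOException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

Inside Hadoop the same names are available from SequenceFile.Reader.getKeyClassName() and getValueClassName(), so a check of this kind could run when the sort pass opens its input; where exactly it belongs is a design question for the issue.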