Mailing-List: contact common-issues-help@hadoop.apache.org; run by ezmlm
Reply-To: common-issues@hadoop.apache.org
Message-ID: <21490757.631272475308647.JavaMail.jira@thor>
In-Reply-To: <18861639.48801272422432910.JavaMail.jira@thor>
Date: Wed, 28 Apr 2010 13:21:48 -0400 (EDT)
From: "Tom White (JIRA)"
To: common-issues@hadoop.apache.org
Subject: [jira] Commented: (HADOOP-6729) serializer.JavaSerialization should be added to io.serializations by default

[ 
https://issues.apache.org/jira/browse/HADOOP-6729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12861847#action_12861847 ]

Tom White commented on HADOOP-6729:
-----------------------------------

One inefficiency of JavaSerialization is that it stores the class name with every record. This is actually worse than normal Java serialization, which uses back-references to class names to make the resulting stream more compact. This optimization is disabled in Hadoop (see JavaSerializationSerializer#serialize()) because records are reordered in the shuffle, which would break back-references.

Another inefficiency is that JavaSerialization creates a new object every time deserialize() is called. In the context of large-scale data processing, where there may be billions of records, this is very expensive, which is why Writables and Avro reuse instances.

> serializer.JavaSerialization should be added to io.serializations by default
> ----------------------------------------------------------------------------
>
> Key: HADOOP-6729
> URL: https://issues.apache.org/jira/browse/HADOOP-6729
> Project: Hadoop Common
> Issue Type: Improvement
> Components: conf
> Affects Versions: 0.20.2
> Reporter: Ted Yu
>
> org.apache.hadoop.io.serializer.JavaSerialization isn't included in io.serializations by default.
> When a class that implements the Serializable interface is used without serializer.JavaSerialization configured, the user sees the following:
> java.lang.NullPointerException
> 	at org.apache.hadoop.io.serializer.SerializationFactory.getSerializer(SerializationFactory.java:73)
> 	at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.<init>(MapTask.java:759)
> 	at org.apache.hadoop.mapred.MapTask$NewOutputCollector.<init>(MapTask.java:487)
> 	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:575)
> 	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
> 	at org.apache.hadoop.mapred.Child.main(Child.java:170)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
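
The per-record class name overhead described in Tom White's comment can be observed with plain JDK serialization, without any Hadoop classes. The following is a minimal sketch, not Hadoop's code: it assumes that calling ObjectOutputStream.reset() before each record is a fair stand-in for discarding back-references, and the names BackReferenceDemo, Point, and serializedSize are hypothetical, chosen only for this illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class BackReferenceDemo {

    /** Hypothetical record type, used only for this illustration. */
    static class Point implements Serializable {
        private static final long serialVersionUID = 1L;
        final int x;
        final int y;

        Point(int x, int y) {
            this.x = x;
            this.y = y;
        }
    }

    /**
     * Serializes `records` distinct Point objects into one stream and
     * returns the total number of bytes written.
     *
     * When resetEachRecord is true, the stream forgets everything it has
     * already written before each record, so the class descriptor (which
     * includes the class name) is emitted again for every record.
     */
    static int serializedSize(int records, boolean resetEachRecord) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            for (int i = 0; i < records; i++) {
                if (resetEachRecord) {
                    out.reset(); // discard back-references to earlier class descriptors
                }
                out.writeObject(new Point(i, i));
            }
        } // closing the stream flushes any buffered data into `bytes`
        return bytes.size();
    }

    public static void main(String[] args) throws IOException {
        int records = 1000;
        System.out.println("with back-references:     "
                + serializedSize(records, false) + " bytes");
        System.out.println("reset before each record: "
                + serializedSize(records, true) + " bytes");
    }
}
```

Running main prints the two stream sizes. The second should be the larger one, because the class descriptor is rewritten for every record instead of being replaced by a short back-reference.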