From: "Owen O'Malley (JIRA)"
To: hadoop-dev@lucene.apache.org
Date: Wed, 10 Oct 2007 09:42:51 -0700 (PDT)
Subject: [jira] Issue Comment Edited: (HADOOP-1986) Add support for a general serialization mechanism for Map Reduce

    [ https://issues.apache.org/jira/browse/HADOOP-1986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12533789 ]

owen.omalley edited comment on HADOOP-1986 at 10/10/07 9:42 AM:
-----------------------------------------------------------------

*laugh* Option 1 is *precisely* what I was proposing, except that you keep blurring how this part works:

{quote}
we simply obtain the right serializer object (RecordIOSerializer or ThriftSerializer or whatever) using a factory or through a configuration file
{quote}

My proposal spells out how that happens: the configuration holds a map from root classes to the corresponding serializer classes. When the factory is given an object, it consults the map and constructs the correct serializer. Clearly the factory will cache the information so that it doesn't have to traverse the class hierarchy when constructing each serializer.

It simplifies things a bit if the serializer class itself specifies which class it works on. Instead of configuring a mapping like:

{code}
org.apache.hadoop.io.Writable->org.apache.hadoop.io.WritableSerializer
{code}

we can just provide a list of serializers, and the factory can automatically determine what each one applies to. Now that I think about it, just having:

{code}
interface Serializer<T> {
  void serialize(T t, OutputStream out) throws IOException;
  void deserialize(T t, InputStream in) throws IOException;
}
{code}

is enough, because we can use reflection to find the value of T for a given serializer class.
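For illustration, a minimal sketch of that reflection step (the class and method names here are invented for the example, and it assumes the serializer implements Serializer<T> directly with a concrete T):

{code}
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;

// Hypothetical helper: recover the T from a class's
// "implements Serializer<T>" declaration.
class SerializerTypes {

  // Returns the class that serializerClass declares it handles, or null
  // if it does not implement Serializer<T> directly with a concrete T.
  static Class<?> findAcceptedClass(Class<?> serializerClass) {
    for (Type iface : serializerClass.getGenericInterfaces()) {
      if (iface instanceof ParameterizedType
          && ((ParameterizedType) iface).getRawType() == Serializer.class) {
        Type t = ((ParameterizedType) iface).getActualTypeArguments()[0];
        if (t instanceof Class) {
          return (Class<?>) t;
        }
      }
    }
    return null;
  }
}
{code}

This works because the type arguments a class supplies when implementing a generic interface are recorded in the class file and are visible through java.lang.reflect, even though T is erased inside method bodies.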
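Under the same assumptions, the factory side could look something like this sketch (the configured list is the hadoop.serializers value from this discussion; the caching and hierarchy walk are illustrative, and a real implementation would also memoize lookups for concrete subclasses):

{code}
import java.util.HashMap;
import java.util.Map;

// Hypothetical factory: builds the root-class -> serializer-class map once
// from the configured list, then answers lookups from the cached map.
class SerializerFactory {
  private final Map<Class<?>, Class<?>> serializers =
    new HashMap<Class<?>, Class<?>>();

  // classNames would come from the hadoop.serializers configuration value.
  SerializerFactory(String[] classNames) throws ClassNotFoundException {
    for (String name : classNames) {
      Class<?> serializerClass = Class.forName(name);
      Class<?> accepted = SerializerTypes.findAcceptedClass(serializerClass);
      if (accepted != null) {
        serializers.put(accepted, serializerClass);
      }
    }
  }

  // Walk up from the object's concrete class, checking each class and its
  // directly implemented interfaces, so subclasses of a registered root
  // class (or implementors of a registered interface) also match.
  Serializer<?> getSerializer(Object obj) throws Exception {
    for (Class<?> c = obj.getClass(); c != null; c = c.getSuperclass()) {
      Class<?> match = serializers.get(c);
      if (match == null) {
        for (Class<?> iface : c.getInterfaces()) {
          match = serializers.get(iface);
          if (match != null) {
            break;
          }
        }
      }
      if (match != null) {
        return (Serializer<?>) match.newInstance();
      }
    }
    throw new IllegalArgumentException(
        "No serializer configured for " + obj.getClass().getName());
  }
}
{code}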
So, the serializers would just look like:

{code}
class ThriftSerializer implements Serializer<ThriftRecord> {
  void serialize(ThriftRecord t, OutputStream out) throws IOException {...}
  void deserialize(ThriftRecord t, InputStream in) throws IOException {...}
}

class WritableSerializer implements Serializer<Writable> {
  void serialize(Writable t, OutputStream out) throws IOException {...}
  void deserialize(Writable t, InputStream in) throws IOException {...}
}
{code}

and in the config put:

{code}
<property>
  <name>hadoop.serializers</name>
  <value>org.apache.hadoop.io.WritableSerializer,com.facebook.hadoop.ThriftSerializer</value>
  <description>The list of serializers available to Hadoop</description>
</property>
{code}


> Add support for a general serialization mechanism for Map Reduce
> ----------------------------------------------------------------
>
>                 Key: HADOOP-1986
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1986
>             Project: Hadoop
>          Issue Type: New Feature
>          Components: mapred
>            Reporter: Tom White
>            Assignee: Tom White
>             Fix For: 0.16.0
>
>         Attachments: SerializableWritable.java
>
>
> Currently Map Reduce programs have to use WritableComparable-Writable key-value pairs. While it's possible to write Writable wrappers for other serialization frameworks (such as Thrift), this is not very convenient: it would be nicer to be able to use arbitrary types directly, without explicit wrapping and unwrapping.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.