From: Aaron Kimball
Date: Thu, 23 Apr 2009 17:43:16 +0900
Subject: Re: which is better Text or Custom Class
To: core-user@hadoop.apache.org

In general, serializing to text and then parsing back into a different format will always be slower than using a
purpose-built class that can serialize itself. The tradeoff, of course, is that going to text is often more convenient from a developer-time perspective.

- Aaron

On Mon, Apr 20, 2009 at 2:23 PM, chintan bhatt wrote:
>
> Hi all,
> I want to ask you about the performance difference between using the Text
> class and using a custom class which implements the Writable interface.
>
> Let's say in the InvertedIndex problem, when I emit a token and a list of the
> document IDs which contain it, using Text we usually concatenate the list of
> document IDs with a space as a separator, "d1 d2 d3 d4" etc. If I need the
> same values in a later step of MapReduce, I need to split the value string to
> get the list of all document IDs. Is it not better to use a Writable list
> instead?
>
> I need to ask because I am using too many concats and splits in my project
> to use each document's total token count, token frequency in a particular
> document, etc.
>
> Thanks in advance,
> Chintan
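
To make the tradeoff in the reply above concrete, here is a minimal sketch of a purpose-built value class for the inverted-index case. It is an illustration, not code from the thread: the class name DocIdList and its helper methods are hypothetical, and it assumes document IDs are integers rather than strings like "d1". The write and readFields methods have the same signatures as those in Hadoop's Writable interface; the "implements Writable" clause is left off only so the sketch compiles without Hadoop on the classpath.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical value type holding the document IDs for one token.
// In a real job it would declare "implements org.apache.hadoop.io.Writable".
class DocIdList {
    private final List<Integer> docIds = new ArrayList<>();

    void add(int docId) {
        docIds.add(docId);
    }

    List<Integer> getDocIds() {
        return docIds;
    }

    // Same signature as Writable.write: a count, then each ID as a 4-byte int.
    public void write(DataOutput out) throws IOException {
        out.writeInt(docIds.size());
        for (int id : docIds) {
            out.writeInt(id);
        }
    }

    // Same signature as Writable.readFields. Clears existing state first,
    // because Hadoop may reuse one instance for many records.
    public void readFields(DataInput in) throws IOException {
        docIds.clear();
        int count = in.readInt();
        for (int i = 0; i < count; i++) {
            docIds.add(in.readInt());
        }
    }

    // Helper for trying the class outside a job: serialize to a byte array.
    static byte[] toBytes(DocIdList list) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            list.write(new DataOutputStream(buffer));
            return buffer.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Helper for trying the class outside a job: rebuild from a byte array.
    static DocIdList fromBytes(byte[] bytes) {
        try {
            DocIdList list = new DocIdList();
            list.readFields(new DataInputStream(new ByteArrayInputStream(bytes)));
            return list;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With a class along these lines, a later MapReduce step reads the IDs back as integers directly instead of splitting a space-separated string and parsing each piece. The cost is the extra class to write and maintain, which is the developer-time side of the tradeoff.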