hadoop-common-user mailing list archives

From Aaron Kimball <aa...@cloudera.com>
Subject Re: which is better Text or Custom Class
Date Thu, 23 Apr 2009 08:43:16 GMT
In general, serializing to text and then parsing it back into a different
format is slower than using a purpose-built class that serializes itself
in binary form. The tradeoff, of course, is that going to text is often
more convenient from a developer-time perspective.

- Aaron
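
To make the comparison concrete, here is a minimal sketch of what such a purpose-built class might look like for the document ID list in the question below. The class name DocIdList, the choice of int document IDs, and the toBytes/fromBytes helpers are assumptions made for illustration only. To keep the sketch free of a Hadoop dependency, the class does not actually implement org.apache.hadoop.io.Writable, but its write and readFields methods have the same signatures as that interface, so adding the implements clause should be the only change needed in a real job.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical value type holding the document IDs for one token.
// In a real job it would declare "implements org.apache.hadoop.io.Writable".
final class DocIdList {
    private int[] docIds = new int[0];

    // Hadoop instantiates value classes reflectively, so keep a no-arg constructor.
    DocIdList() {}

    DocIdList(int[] docIds) {
        this.docIds = docIds.clone();
    }

    int[] get() {
        return docIds.clone();
    }

    // Same signature as Writable.write: a length prefix, then each ID as 4 bytes.
    public void write(DataOutput out) throws IOException {
        out.writeInt(docIds.length);
        for (int id : docIds) {
            out.writeInt(id);
        }
    }

    // Same signature as Writable.readFields: reads back exactly what write produced.
    public void readFields(DataInput in) throws IOException {
        int n = in.readInt();
        docIds = new int[n];
        for (int i = 0; i < n; i++) {
            docIds[i] = in.readInt();
        }
    }

    // Illustration-only helper: serialize to a byte array outside of Hadoop.
    static byte[] toBytes(DocIdList list) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            list.write(new DataOutputStream(buffer));
            return buffer.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Illustration-only helper: rebuild a list from bytes produced by toBytes.
    static DocIdList fromBytes(byte[] bytes) {
        try {
            DocIdList list = new DocIdList();
            list.readFields(new DataInputStream(new ByteArrayInputStream(bytes)));
            return list;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With this layout, DocIdList.toBytes(new DocIdList(new int[] {1, 2, 3, 4})) yields 20 bytes (a 4-byte count plus four 4-byte IDs), and DocIdList.fromBytes returns the same IDs without any string concatenation or splitting. Whether that is worth the extra class depends on how often the values are re-read in later steps.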

On Mon, Apr 20, 2009 at 2:23 PM, chintan bhatt <chin10_5@hotmail.com> wrote:

> Hi all,
> I want to ask about the performance difference between using the Text
> class and using a custom class that implements the Writable interface.
> Let's say that in the InvertedIndex problem I emit a token and the list of
> document IDs that contain it. Using Text, we usually concatenate the
> document IDs with a space as the separator ("d1 d2 d3 d4", etc.). If I need
> the same values in a later MapReduce step, I have to split the value string
> to get the list of document IDs back. Wouldn't it be better to use a
> Writable list instead?
> I ask because my project uses a lot of concatenation and splitting to
> track each document's total token count, the token frequency in a
> particular document, etc.
> Thanks in advance,
> Chintan
