hadoop-common-user mailing list archives

From "Milind A Bhandarkar" <mili...@yahoo-inc.com>
Subject Re: sort speeds under java, c++, and streaming
Date Fri, 09 Nov 2007 03:45:36 GMT
One more thing about your original numbers.

Are they repeatable?

- milind

----- Original Message -----
From: Owen O'Malley <oom@yahoo-inc.com>
To: hadoop-user@lucene.apache.org <hadoop-user@lucene.apache.org>
Sent: Thu Nov 08 19:10:30 2007
Subject: Re: sort speeds under java, c++, and streaming

On Nov 8, 2007, at 5:14 PM, Milind A Bhandarkar wrote:

> Does pipes deserialize and serialize data for the identity
> mappers, or just "pass it through"? (Streaming converts input to
> text, afaik.)

Pipes serializes the objects to bytes and sends them to the C++  
program. The C++ program gets them as C++ strings, which are  
effectively byte arrays. Pipes does not do the conversion to Java  
strings that streaming does. Therefore, pipes can support arbitrary  
Writable objects. Hopefully in the future, we can change the
map/reduce API to provide access to the raw bytes in the mapper and
reducer as an option. In that case, pipes would not need to serialize  
at all.

-- Owen
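To make the serialization point concrete, here is a minimal Java sketch that uses no Hadoop classes. BinaryWritable and IntPair are hypothetical stand-ins for a Writable and a user record. The text path is an assumption: it models a conversion to Java strings by decoding the bytes as UTF-8 (the encoding Hadoop's Text uses). The sketch serializes a record to bytes, forwards them unchanged the way pipes hands bytes to a C++ string, and compares that with a round trip through a Java String.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class RawBytesDemo {
    // Hypothetical stand-in for Hadoop's Writable contract: an object
    // that writes its own binary form to a DataOutput.
    interface BinaryWritable {
        void write(DataOutput out) throws IOException;
    }

    // Hypothetical record with two int fields.
    static final class IntPair implements BinaryWritable {
        final int first;
        final int second;

        IntPair(int first, int second) {
            this.first = first;
            this.second = second;
        }

        @Override
        public void write(DataOutput out) throws IOException {
            out.writeInt(first);  // 4 big-endian bytes
            out.writeInt(second); // 4 big-endian bytes
        }
    }

    // Serialize an object to the bytes that would be sent to the child process.
    static byte[] serialize(BinaryWritable w) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            w.write(out);
        } catch (IOException e) {
            // Not expected for an in-memory stream.
            throw new IllegalStateException(e);
        }
        return buf.toByteArray();
    }

    // Pipes-style path: the bytes are forwarded untouched, since a
    // C++ std::string can hold any byte value.
    static byte[] passThrough(byte[] bytes) {
        return Arrays.copyOf(bytes, bytes.length);
    }

    // Assumed text path: decode as UTF-8 into a Java String, then encode
    // again. Malformed UTF-8 input is replaced with U+FFFD, so bytes can change.
    static byte[] viaJavaString(byte[] bytes) {
        String s = new String(bytes, StandardCharsets.UTF_8);
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] raw = serialize(new IntPair(-1, 42));
        System.out.println("serialized length: " + raw.length);
        System.out.println("pass-through intact: " + Arrays.equals(raw, passThrough(raw)));
        System.out.println("via String intact: " + Arrays.equals(raw, viaJavaString(raw)));
    }
}
```

Running main prints a serialized length of 8, with the pass-through copy intact and the String round trip altered: writeInt(-1) emits four 0xFF bytes, which are not valid UTF-8 and are replaced during decoding. This is why a byte-oriented channel can carry arbitrary Writable data while a text conversion cannot.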