hadoop-common-dev mailing list archives

From "Hairong Kuang (JIRA)" <j...@apache.org>
Subject [jira] Updated: (HADOOP-499) Avoid the use of Strings to improve the performance of hadoop streaming
Date Fri, 01 Sep 2006 20:51:23 GMT
     [ http://issues.apache.org/jira/browse/HADOOP-499?page=all ]

Hairong Kuang updated HADOOP-499:

    Attachment: text_streaming.patch

This patch includes the following fixes:
1. Replaces the use of UTF8 with Text in hadoop-streaming, and therefore fixes HADOOP-413.
2. Removes the use of Strings by adding simple manipulation of byte arrays.
3. Fixes the stream close order when map/reduce finishes, avoiding truncated records.
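
As a rough illustration of why close order matters for the last fix: the patch itself is not shown in this message, so the following is only a generic Java sketch, not the code from text_streaming.patch. It assumes a buffered wrapper sits on top of an underlying sink; the class and method names (CloseOrderDemo, bytesDelivered) are hypothetical. If the sink is consumed before the wrapper has been closed or flushed, the buffered tail of the output is missing, which is one way records can end up truncated.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; the names are illustrative and not taken from the patch.
final class CloseOrderDemo {
    private CloseOrderDemo() {}

    /**
     * Writes one record through a buffered wrapper and reports how many bytes
     * have reached the underlying sink at the moment the sink is inspected.
     */
    static int bytesDelivered(boolean closeWrapperFirst) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        OutputStream out = new BufferedOutputStream(sink, 1024);
        try {
            out.write("key\tvalue\n".getBytes(StandardCharsets.UTF_8));
            if (closeWrapperFirst) {
                // Closing the outermost stream flushes its buffer into the sink.
                out.close();
            }
            // Otherwise the record is still sitting in the wrapper's buffer.
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.size();
    }

    public static void main(String[] args) {
        System.out.println("wrapper closed first: " + bytesDelivered(true));
        System.out.println("sink read too early:  " + bytesDelivered(false));
    }
}
```

In this sketch the whole 10-byte record reaches the sink only when the wrapper is closed before the sink is read; otherwise the sink is still empty.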

> Avoid the use of Strings to improve the performance of hadoop streaming
> ------------------------------------------------------------------------
>                 Key: HADOOP-499
>                 URL: http://issues.apache.org/jira/browse/HADOOP-499
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: contrib/streaming
>    Affects Versions: 0.5.0
>            Reporter: Hairong Kuang
>         Assigned To: Hairong Kuang
>             Fix For: 0.6.0
>         Attachments: text_streaming.patch
> In hadoop streaming, a record is represented as a String for I/O and is encoded as UTF8 for map/reduce. A record has to be converted back and forth between String and UTF8 multiple times, which wastes CPU time.
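
To make the idea of working on bytes instead of Strings concrete, here is a minimal Java sketch. It is not the code from text_streaming.patch; it assumes a record is a UTF-8 line whose key and value are separated by a tab, and the class and method names (ByteRecordSplitter, findTab, key, value) are hypothetical. Because the tab byte (0x09) never occurs inside a multi-byte UTF-8 sequence, the split point can be found by scanning the raw bytes, with no decoding to String and no re-encoding.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper for illustration only; not part of Hadoop or the patch.
final class ByteRecordSplitter {
    private ByteRecordSplitter() {}

    /** Returns the index of the first tab in buf[start, start + length), or -1 if there is none. */
    static int findTab(byte[] buf, int start, int length) {
        for (int i = start; i < start + length; i++) {
            if (buf[i] == (byte) '\t') {
                return i;
            }
        }
        return -1;
    }

    /** Copies the key bytes: everything before the first tab, or the whole line if no tab exists. */
    static byte[] key(byte[] line) {
        int tab = findTab(line, 0, line.length);
        return tab < 0 ? line.clone() : Arrays.copyOfRange(line, 0, tab);
    }

    /** Copies the value bytes: everything after the first tab, or an empty array if no tab exists. */
    static byte[] value(byte[] line) {
        int tab = findTab(line, 0, line.length);
        return tab < 0 ? new byte[0] : Arrays.copyOfRange(line, tab + 1, line.length);
    }

    public static void main(String[] args) {
        byte[] line = "k1\tv1".getBytes(StandardCharsets.UTF_8);
        System.out.println(key(line).length + " key bytes, " + value(line).length + " value bytes");
    }
}
```

The only String in the sketch is the test input in main; the split itself touches nothing but byte arrays, which is the kind of saving the issue describes.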

This message is automatically generated by JIRA.
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

