hadoop-common-dev mailing list archives

From "Runping Qi (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-819) LineRecordWriter should not always insert tab char between key and value
Date Fri, 23 Mar 2007 00:08:32 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12483362 ]

Runping Qi commented on HADOOP-819:


> LineRecordWriter should not always insert tab char between key and value
> ------------------------------------------------------------------------
>                 Key: HADOOP-819
>                 URL: https://issues.apache.org/jira/browse/HADOOP-819
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: mapred
>            Reporter: Runping Qi
>         Assigned To: Owen O'Malley
> With the current implementation of LineRecordWriter in TextOutputFormat, the client cannot
> pass a null key or value to the write function, and a tab char is always inserted between
> the key and value. This works fine most of the time. However, in some cases one simply does
> not want the extra tab char. A common example: to implement a utility similar to the Unix
> sort that uses some fields in each line as the sort key, I can have my map extract the sort
> key from each line and pass the whole line as the value. The reducer just outputs the values
> and ignores the keys. However, if I use TextOutputFormat, my output will have an extra tab
> char in each line, which is annoying.
> A simple solution is to let the write function of LineRecordWriter accept a null key
> argument and write out only the value when the key is null.
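
The proposal in the description can be illustrated with a minimal sketch. This is not the actual Hadoop code or the eventual patch: NullAwareLineWriter and formatLine are hypothetical names, plain java.io types stand in for the Hadoop output stream and Writable classes, and the handling of a null value (key only, or nothing at all when both are null) is an assumption that goes beyond what the issue asks for.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;

/** Hypothetical stand-in for LineRecordWriter; not the real Hadoop class. */
final class NullAwareLineWriter {
    private final Writer out;

    NullAwareLineWriter(Writer out) {
        this.out = out;
    }

    /**
     * Builds one output line. The tab separator is emitted only when both
     * key and value are present. Null-value handling is an assumption.
     */
    static String formatLine(Object key, Object value) {
        if (key == null && value == null) {
            return ""; // assumption: nothing is written for an empty record
        }
        StringBuilder sb = new StringBuilder();
        if (key != null) {
            sb.append(key);
        }
        if (key != null && value != null) {
            sb.append('\t');
        }
        if (value != null) {
            sb.append(value);
        }
        return sb.append('\n').toString();
    }

    /** Writes the record; a null key yields the value alone, with no tab. */
    void write(Object key, Object value) throws IOException {
        out.write(formatLine(key, value));
    }

    public static void main(String[] args) throws IOException {
        StringWriter buf = new StringWriter();
        NullAwareLineWriter writer = new NullAwareLineWriter(buf);
        writer.write("sortKey", "a keyed line");   // key, tab, value
        writer.write(null, "the whole line only"); // value only
        System.out.print(buf);
    }
}
```

With a writer along these lines, the sort-like reducer from the example would pass null as the key and the original line as the value, so each output line is the input line unchanged.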
