hadoop-common-dev mailing list archives

From "Venky Iyer (JIRA)" <j...@apache.org>
Subject [jira] Created: (HADOOP-4592) Generate and accept JSON as the input-output format from mappers and reducers
Date Wed, 05 Nov 2008 09:41:44 GMT
Generate and accept JSON as the input-output format from mappers and reducers
-----------------------------------------------------------------------------

                 Key: HADOOP-4592
                 URL: https://issues.apache.org/jira/browse/HADOOP-4592
             Project: Hadoop Core
          Issue Type: Wish
          Components: contrib/hive
            Reporter: Venky Iyer


set mapred.data.format=JSON;
....
MAP USING 'python filter.py'
....;

would mean that filter.py receives a JSON-formatted dictionary of the columns instead
of a tab-delimited string, for example:

{ "column1": "value1", "column2": [1, 2, 3] }

It would in turn produce JSON.
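
As a rough illustration of what such a script might look like under this proposal, here is a minimal sketch of filter.py. It assumes one JSON object per input line and one per output line, which the request does not specify; the column name and the filter rule in process_record are hypothetical.

```python
import json
import sys


def process_record(record):
    """Hypothetical rule: keep records whose 'column2' list is non-empty."""
    if record.get("column2"):
        return record
    return None


def run(in_stream, out_stream):
    """Read one JSON object per line, write the kept records as JSON lines."""
    for line in in_stream:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        result = process_record(json.loads(line))
        if result is not None:
            out_stream.write(json.dumps(result) + "\n")


# As a mapper script, this would be invoked as: run(sys.stdin, sys.stdout)
```

The script never splits lines or names columns itself; it only works with the dictionary it is handed, and list-valued columns arrive as real lists.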

This should be done so that the JSON is not transmitted back and forth over the network; it
would be generated on the fly on the mapper node, and the script's JSON output would be
converted back to the standard format used (tab-delimited, I assume).

This seems like the simplest way to encode type information in the input to mappers; it
would also remove the need for silly boilerplate code that takes a list of expected input
column names, reads the input stream, splits each line, and builds a dictionary of
{column name: value} for every record.
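
For reference, the boilerplate in question looks roughly like the sketch below. The column list is hypothetical and has to be kept in sync with the query by hand; every value arrives as a string, so any type information is lost.

```python
import sys

# Hypothetical schema; the script must hard-code it or receive it some other way.
EXPECTED_COLUMNS = ["column1", "column2"]


def parse_record(line, columns=EXPECTED_COLUMNS):
    """Split one tab-delimited line and pair each value with its column name."""
    values = line.rstrip("\n").split("\t")
    if len(values) != len(columns):
        raise ValueError(
            "expected %d columns, got %d" % (len(columns), len(values))
        )
    return dict(zip(columns, values))


def records(stream, columns=EXPECTED_COLUMNS):
    """Yield a {column name: value} dictionary for every non-empty line."""
    for line in stream:
        if line.rstrip("\n"):
            yield parse_record(line, columns)


# As a mapper script, this would iterate over: records(sys.stdin)
```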

Output schemas (USING '' AS ...) might also be redundant with this in place, but I'm not sure
if that is doable.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

