hadoop-hive-dev mailing list archives

From "Zheng Shao (JIRA)" <j...@apache.org>
Subject [jira] Assigned: (HIVE-51) Generate and accept JSON as the input-output format from mappers and reducers
Date Mon, 01 Dec 2008 22:17:44 GMT

     [ https://issues.apache.org/jira/browse/HIVE-51?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zheng Shao reassigned HIVE-51:

    Assignee: Zheng Shao

> Generate and accept JSON as the input-output format from mappers and reducers
> -----------------------------------------------------------------------------
>                 Key: HIVE-51
>                 URL: https://issues.apache.org/jira/browse/HIVE-51
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Query Processor
>            Reporter: Venky Iyer
>            Assignee: Zheng Shao
> set mapred.data.format=JSON;
> ....
> MAP USING 'python filter.py'
> ....;
> would mean that filter.py would receive a JSON-formatted dictionary of the columns instead of a tab-delimited string.
> { "column1": value1, "column2": [1, 2, 3] }, etc.
> It would in turn produce JSON.
> This should be done so that the JSON is never transmitted back and forth over the network: it would be generated on the fly on the mapper node, and the script's output would be converted back to the standard format used (tab-delimited, I assume).
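
The conversion described above could look roughly like the sketch below. It is written in Python only for illustration and is not Hive code: the column list, the function names, and the rule that non-string values are written back as JSON text are all assumptions, since the issue does not specify them. A real implementation would presumably use the table's column types rather than treating every input cell as a string.

```python
import json

# Hypothetical schema; the issue does not define one.
COLUMNS = ["column1", "column2"]

def row_to_json(line, columns=COLUMNS):
    """Turn one tab-delimited row into a one-line JSON object for the script."""
    values = line.rstrip("\n").split("\t")
    return json.dumps(dict(zip(columns, values)))

def json_to_row(text, columns=COLUMNS):
    """Turn one JSON object emitted by the script back into a tab-delimited row."""
    record = json.loads(text)
    cells = []
    for name in columns:
        value = record.get(name)
        # Assumption: strings are written as-is, anything else as JSON text.
        cells.append(value if isinstance(value, str) else json.dumps(value))
    return "\t".join(cells)
```

Both functions would run next to the user script on the mapper node, so only tab-delimited rows would cross the network.
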
> This seems like the simplest way to encode type information in the input to mappers. It would also remove the need for silly boilerplate code that takes a list of expected input column names, reads the input stream, splits each line, and builds a dictionary of {column name: value} for every record.
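
To make the comparison concrete, here is a minimal sketch of what a script such as filter.py might contain under each scheme. The column names, the filter rule, and the assumption that each record arrives as one JSON object per line are hypothetical; the issue does not fix any of them.

```python
import json

# Hypothetical column list that the script author maintains by hand today.
EXPECTED_COLUMNS = ["column1", "column2"]

def parse_tab_delimited(line, columns=EXPECTED_COLUMNS):
    """Today's boilerplate: split the row and pair it with the expected names."""
    return dict(zip(columns, line.rstrip("\n").split("\t")))

def parse_json(line):
    """Proposed input: each record already carries its column names and types."""
    return json.loads(line)

def keep(record):
    """Hypothetical filter rule: keep records whose column2 is non-empty."""
    return bool(record.get("column2"))

def run(lines, parse=parse_json):
    """Yield one JSON line for every record that passes the filter."""
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        record = parse(line)
        if keep(record):
            yield json.dumps(record)
```

A script built this way would loop over run(sys.stdin) and print each result. With JSON input, EXPECTED_COLUMNS and parse_tab_delimited are no longer needed.
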
> Output schemas (USING '' AS ...) might also become redundant with this in place, but I'm not sure whether that is doable.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
