hadoop-hive-dev mailing list archives

From "Zheng Shao (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HIVE-51) Generate and accept JSON as the input-output format from mappers and reducers
Date Fri, 14 Nov 2008 01:58:44 GMT

    [ https://issues.apache.org/jira/browse/HIVE-51?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12647496#action_12647496 ]

Zheng Shao commented on HIVE-51:

An alternative approach is to specify the row format right in the query:

MAP table.col1, table.col2
USING 'python filter.py'
AS x1, x2

This makes the syntax for specifying the row format the same in map/reduce scripts as in the
CREATE TABLE statement.
At the same time, we will be able to support ROW FORMAT DELIMITED FIELDS TERMINATED BY ','.
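
To make the delimiter point concrete, here is a minimal sketch of what the script side could look like if the row format were chosen per query. It assumes Hive streams one row per line and that the script is told which field terminator to use; the function name transform_rows and the upper-casing step are hypothetical and only stand in for whatever filter.py really does.

```python
def transform_rows(lines, delimiter="\t"):
    """Read (col1, col2) rows and emit (x1, x2) rows with the same delimiter.

    Assumes every row has exactly two fields and ends with a newline.
    """
    for line in lines:
        col1, col2 = line.rstrip("\n").split(delimiter)
        # Hypothetical transformation: upper-case col1, pass col2 through.
        yield delimiter.join([col1.upper(), col2])


if __name__ == "__main__":
    # Default tab-delimited rows, then comma-delimited rows.
    for row in transform_rows(["a\t1\n", "b\t2\n"]):
        print(row)
    for row in transform_rows(["a,1\n", "b,2\n"], delimiter=","):
        print(row)
```

In a real streaming script the list of sample rows would be replaced by sys.stdin.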

> Generate and accept JSON as the input-output format from mappers and reducers
> -----------------------------------------------------------------------------
>                 Key: HIVE-51
>                 URL: https://issues.apache.org/jira/browse/HIVE-51
>             Project: Hadoop Hive
>          Issue Type: Bug
>            Reporter: Venky Iyer
> set mapred.data.format=JSON;
> ....
> MAP USING 'python filter.py'
> ....;
> would mean that filter.py would receive a JSON-formatted dictionary of the columns instead
> of a tab-delimited string.
> { column1: value1, column2: [1,2,3] } etc
> It would in turn produce JSON.
> This should be done in such a way that the JSON is not transmitted back and forth over the
> network; it would be generated on the fly on the mapper node and converted back to the
> standard format used (tab-delimited, I assume).
> This seems like the simplest way for encoding type information in the input to mappers;
> it would also remove the need for silly boilerplate code that took a list of expected input
> column names, took the input stream, split it up, and made a dictionary of {column name: value}
> on every record.
> Output schemas (USING '' AS ...) might also be redundant with this in place, but I'm
> not sure if that is doable.
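
For readers following the issue description, the difference between the two input styles can be sketched as below. This is an illustration under assumptions, not Hive's implementation: it assumes one record per line, strict JSON with quoted keys (the example in the description is shorthand), and the helper names record_from_tsv, record_from_json, run_filter and has_items are all hypothetical.

```python
import json


def record_from_tsv(line, column_names):
    """The boilerplate described in the issue: split a tab-delimited row and
    pair each field with its expected column name. Every value is a string."""
    fields = line.rstrip("\n").split("\t")
    return dict(zip(column_names, fields))


def record_from_json(line):
    """With JSON input, column names and nested types arrive with the record."""
    return json.loads(line)


def run_filter(lines, keep):
    """Yield one JSON line for every input record that passes keep()."""
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        record = record_from_json(line)
        if keep(record):
            yield json.dumps(record, sort_keys=True)


def has_items(record):
    # Hypothetical predicate: keep rows whose column2 list is non-empty.
    return bool(record.get("column2"))


if __name__ == "__main__":
    sample = [
        '{"column1": "value1", "column2": [1, 2, 3]}\n',
        '{"column1": "value2", "column2": []}\n',
    ]
    for out_line in run_filter(sample, has_items):
        print(out_line)
```

Note that the tab-delimited helper has to be given the column names and returns only strings, while the JSON helper returns the list in column2 as a real list, which is the type information the description refers to.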
