hive-dev mailing list archives

From "Zheng Shao (JIRA)" <>
Subject [jira] Commented: (HIVE-51) Generate and accept JSON as the input-output format from mappers and reducers
Date Fri, 14 Nov 2008 01:58:44 GMT


Zheng Shao commented on HIVE-51:

An alternative approach is to specify that right in the query:

MAP table.col1, table.col2
USING 'python'
AS x1, x2

This makes the syntax for specifying the row format the same in map/reduce scripts as in
CREATE TABLE statements.
At the same time, we will be able to support ROW FORMAT DELIMITED FIELDS TERMINATED BY ','.
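
For illustration only, here is a minimal sketch of what a script might look like if its input and output were declared as ROW FORMAT DELIMITED FIELDS TERMINATED BY ','. It assumes Hive would stream one row per line with the two selected columns joined by the delimiter; the function names and the upper-casing step are hypothetical, and the naive split does not handle quoted or escaped delimiters.

```python
import io

def transform(line, delimiter=","):
    """Turn one delimited input row into one delimited output row (x1, x2)."""
    # Assumes exactly two fields per row, matching table.col1 and table.col2.
    col1, col2 = line.rstrip("\n").split(delimiter)
    # Hypothetical logic: upper-case the first column, pass the second through.
    return delimiter.join([col1.upper(), col2])

def run(stdin, stdout, delimiter=","):
    """Read rows from stdin and write transformed rows to stdout."""
    for line in stdin:
        if line.strip():  # skip blank lines
            stdout.write(transform(line, delimiter) + "\n")

# Small in-memory demo; a real script would call run(sys.stdin, sys.stdout).
demo_in = io.StringIO("a,1\nb,2\n")
demo_out = io.StringIO()
run(demo_in, demo_out)
print(demo_out.getvalue(), end="")
```

The point of the sketch is that the script only needs to know the delimiter, which the query would state in the same way a table definition does.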

> Generate and accept JSON as the input-output format from mappers and reducers
> -----------------------------------------------------------------------------
>                 Key: HIVE-51
>                 URL:
>             Project: Hadoop Hive
>          Issue Type: Bug
>            Reporter: Venky Iyer
> set;
> ....
> MAP USING 'python'
> ....;
> would mean that would receive a JSON formatted dictionary of the columns instead
> of a tab-delimited string.
> { column1: value1, column2: [1,2,3] } etc
> It would in turn produce JSON.
> This should be done so that the JSON is not transmitted back and forth over the network;
> it would be generated on the fly on the mapper node, and converted back to the standard format
> used (tab-delimited, I assume).
> This seems like the simplest way for encoding type information in the input to mappers;
> it would also remove the need for silly boilerplate code that took a list of expected input
> column names, took the input stream, split it up, and made a dictionary of {column name: value}
> on every record.
> Output schemas (USING '' AS ...) might also be redundant with this in place, but I'm
> not sure if that is doable.
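
To make the contrast in the quoted description concrete, here is a rough sketch of the tab-delimited boilerplate next to a JSON-reading equivalent. It assumes one JSON object per input line with quoted keys (the issue does not specify the exact framing); all function names and the mapper logic are hypothetical.

```python
import io
import json

def records_from_tsv(stream, column_names):
    """The boilerplate described above: split each line, zip with expected names."""
    for line in stream:
        values = line.rstrip("\n").split("\t")
        yield dict(zip(column_names, values))  # every value is a plain string

def records_from_json(stream):
    """With JSON input, column names and types arrive with each record."""
    for line in stream:
        if line.strip():  # skip blank lines
            yield json.loads(line)

def map_record(record):
    # Hypothetical mapper logic: count the items in column2.
    return {"column1": record["column1"], "n_items": len(record["column2"])}

# Small in-memory demo; a real script would read sys.stdin and write sys.stdout.
demo_input = io.StringIO('{"column1": "a", "column2": [1, 2, 3]}\n')
demo_output = [json.dumps(map_record(r)) for r in records_from_json(demo_input)]
print(demo_output[0])
```

In the tab-delimited version the list column would arrive as a string that the script has to parse itself, whereas the JSON version receives it as a list.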

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.