hadoop-hive-dev mailing list archives

From "Ning Zhang (JIRA)" <j...@apache.org>
Subject [jira] Updated: (HIVE-1087) Let user script write out binary data into a table
Date Tue, 26 Jan 2010 07:44:34 GMT

     [ https://issues.apache.org/jira/browse/HIVE-1087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ning Zhang updated HIVE-1087:
-----------------------------

       Resolution: Fixed
    Fix Version/s: 0.6.0
           Status: Resolved  (was: Patch Available)

Committed to trunk. Thanks Zheng!

> Let user script write out binary data into a table
> --------------------------------------------------
>
>                 Key: HIVE-1087
>                 URL: https://issues.apache.org/jira/browse/HIVE-1087
>             Project: Hadoop Hive
>          Issue Type: New Feature
>    Affects Versions: 0.6.0
>            Reporter: Zheng Shao
>            Assignee: Zheng Shao
>             Fix For: 0.6.0
>
>         Attachments: HIVE-1087.1.patch, HIVE-1087.2.patch
>
>
> We want to allow a user script to write out binary stream data.
> We don't need to understand the binary stream format, but we want to write the data to disk as is.
> Since everything inside Hive is a row object, we need to add a RecordReader that can split the binary stream into records, and a BinaryOutputFormat that writes out the data as is, without any separators.
> Example:
> {code}
> DROP TABLE dest1;
> -- Create a table with binary output format
> CREATE TABLE dest1(mydata STRING)
> ROW FORMAT SERDE
>   'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
> WITH SERDEPROPERTIES (
>   'serialization.last.column.takes.rest'='true'
> )
> STORED AS
>   INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat'
>   OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveBinaryOutputFormat';
> -- Insert into that table using transform
> EXPLAIN EXTENDED
> INSERT OVERWRITE TABLE dest1
> SELECT TRANSFORM(*)
>   USING 'cat'
>   AS mydata STRING
>     ROW FORMAT SERDE
>       'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
>     WITH SERDEPROPERTIES (
>       'serialization.last.column.takes.rest'='true'
>     )
>     RECORDREADER 'org.apache.hadoop.hive.ql.exec.BinaryRecordReader'
> FROM src;
> INSERT OVERWRITE TABLE dest1
> SELECT TRANSFORM(*)
>   USING 'cat'
>   AS mydata STRING
>     ROW FORMAT SERDE
>       'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
>     WITH SERDEPROPERTIES (
>       'serialization.last.column.takes.rest'='true'
>     )
>     RECORDREADER 'org.apache.hadoop.hive.ql.exec.BinaryRecordReader'
> FROM src;
> -- Test the result
> SELECT * FROM dest1;
> DROP TABLE dest1;
> {code}
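
The record-splitting idea in the description can be illustrated outside Hive. The following is a minimal Python sketch, not the code from the attached patches: it cuts a binary stream into records of a bounded size and writes them back with no separators, so the output is byte-for-byte identical to the input. The function names and the 1000-byte default are assumptions made for illustration; the message does not say how BinaryRecordReader chooses record boundaries.

```python
import io
from typing import BinaryIO, Iterable, Iterator


def read_records(stream: BinaryIO, max_record_length: int = 1000) -> Iterator[bytes]:
    """Split a binary stream into records of at most max_record_length bytes.

    The record size is an illustrative assumption; the content of the
    stream is never interpreted.
    """
    if max_record_length <= 0:
        raise ValueError("max_record_length must be positive")
    while True:
        chunk = stream.read(max_record_length)
        if not chunk:  # empty bytes object means end of stream
            break
        yield chunk


def write_records(records: Iterable[bytes], out: BinaryIO) -> None:
    """Write each record as is, with no field or row separators."""
    for record in records:
        out.write(record)


def round_trip(data: bytes, max_record_length: int = 1000) -> bytes:
    """Split data into records, write them back, and return the result."""
    out = io.BytesIO()
    write_records(read_records(io.BytesIO(data), max_record_length), out)
    return out.getvalue()
```

Because no delimiter is added between records, where the reader cuts the stream does not affect the bytes that reach disk; a text-oriented output format would instead append a newline after every record and corrupt the data.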

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

