hadoop-hive-dev mailing list archives

From "Zheng Shao (JIRA)" <j...@apache.org>
Subject [jira] Updated: (HIVE-1087) Let user script write out binary data into a table
Date Tue, 26 Jan 2010 05:06:34 GMT

     [ https://issues.apache.org/jira/browse/HIVE-1087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zheng Shao updated HIVE-1087:
-----------------------------

    Description: 
We want to allow a user script to write out binary stream data.
We don't need to understand the binary stream format; we only want to write the data to
disk unchanged.

Because everything inside Hive is a row object, we need to add a RecordReader that splits
the binary stream into records, and a BinaryOutputFormat that writes the data out as-is,
without any separators.
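
To illustrate why this pairing preserves the data, here is a minimal, self-contained Java sketch. It is not the code from the attached patches: the class BinaryRoundTrip, the methods readRecords and writeRecords, and the fixed chunk size are hypothetical names and assumptions chosen only for illustration. The reader cuts the stream into byte chunks (one chunk per row), and the writer concatenates the chunks without adding any delimiter, so the bytes on the output side match the input regardless of where the record boundaries fall.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not the Hive BinaryRecordReader or
// HiveBinaryOutputFormat source.
public class BinaryRoundTrip {

    // Split a binary stream into records of at most maxRecordLength bytes.
    // The chunk size is an assumption made for this sketch.
    public static List<byte[]> readRecords(InputStream in, int maxRecordLength)
            throws IOException {
        if (maxRecordLength <= 0) {
            throw new IllegalArgumentException("maxRecordLength must be positive");
        }
        List<byte[]> records = new ArrayList<>();
        byte[] buffer = new byte[maxRecordLength];
        int filled = 0;
        int n;
        // Keep reading until the buffer is full or the stream ends.
        while ((n = in.read(buffer, filled, maxRecordLength - filled)) != -1) {
            filled += n;
            if (filled == maxRecordLength) {
                records.add(Arrays.copyOf(buffer, filled));
                filled = 0;
            }
        }
        // Emit the final, possibly shorter, record.
        if (filled > 0) {
            records.add(Arrays.copyOf(buffer, filled));
        }
        return records;
    }

    // Write each record back to back: no field separator, no trailing newline.
    public static void writeRecords(List<byte[]> records, OutputStream out)
            throws IOException {
        for (byte[] record : records) {
            out.write(record);
        }
    }

    public static void main(String[] args) throws IOException {
        // Sample bytes include newline (0x0A) and tab (0x09), which a
        // text-oriented writer would treat as delimiters.
        byte[] original = {0x00, 0x0A, (byte) 0xFF, 0x09, 0x01, 0x0A, 0x7F};
        List<byte[]> records = readRecords(new ByteArrayInputStream(original), 3);
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        writeRecords(records, sink);
        System.out.println("records: " + records.size());
        System.out.println("identical: " + Arrays.equals(original, sink.toByteArray()));
    }
}
```

Because the writer adds nothing between records, the chunk size affects only how many row objects flow through the pipeline, not the bytes that reach disk.
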

Example:
{code}
DROP TABLE dest1;

-- Create a table with binary output format
CREATE TABLE dest1(mydata STRING)
ROW FORMAT SERDE
  'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
WITH SERDEPROPERTIES (
  'serialization.last.column.takes.rest'='true'
)
STORED AS
  INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveBinaryOutputFormat';

-- Insert into that table using transform
EXPLAIN EXTENDED
INSERT OVERWRITE TABLE dest1
SELECT TRANSFORM(*)
  USING 'cat'
  AS mydata STRING
    ROW FORMAT SERDE
      'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
    WITH SERDEPROPERTIES (
      'serialization.last.column.takes.rest'='true'
    )
    RECORDREADER 'org.apache.hadoop.hive.ql.exec.BinaryRecordReader'
FROM src;

INSERT OVERWRITE TABLE dest1
SELECT TRANSFORM(*)
  USING 'cat'
  AS mydata STRING
    ROW FORMAT SERDE
      'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
    WITH SERDEPROPERTIES (
      'serialization.last.column.takes.rest'='true'
    )
    RECORDREADER 'org.apache.hadoop.hive.ql.exec.BinaryRecordReader'
FROM src;

-- Test the result
SELECT * FROM dest1;

DROP TABLE dest1;
{code}


  was:
We want to allow a user script to write out binary stream data.
We don't need to understand the binary stream format; we only want to write the data to
disk unchanged.

Because everything inside Hive is a row object, we need to add a RecordReader that splits
the binary stream into records, and a BinaryOutputFormat that writes the data out as-is,
without any separators.



> Let user script write out binary data into a table
> --------------------------------------------------
>
>                 Key: HIVE-1087
>                 URL: https://issues.apache.org/jira/browse/HIVE-1087
>             Project: Hadoop Hive
>          Issue Type: New Feature
>    Affects Versions: 0.6.0
>            Reporter: Zheng Shao
>            Assignee: Zheng Shao
>         Attachments: HIVE-1087.1.patch, HIVE-1087.2.patch
>
>

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

