hive-issues mailing list archives

From "Thejas M Nair (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (HIVE-18822) INSERT VALUES - HoS + Steaming File Format
Date Thu, 01 Mar 2018 01:05:00 GMT

    [ https://issues.apache.org/jira/browse/HIVE-18822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16381342#comment-16381342 ]

Thejas M Nair edited comment on HIVE-18822 at 3/1/18 1:04 AM:
--------------------------------------------------------------

This is not exactly what you are asking for, but FYI: with the [Streaming ingest feature (ACID)|https://cwiki.apache.org/confluence/display/Hive/Streaming+Data+Ingest],
you can get more efficient "insert values" equivalent functionality without running into the small
files issue. However, it requires the ORC file format, and it's not a SQL API.


was (Author: thejas):
This is not exactly what you are asking for, but FYI: with the [Streaming ingest feature (ACID)|https://cwiki.apache.org/confluence/display/Hive/Streaming+Data+Ingest],
you can get more efficient insert "values" equivalent functionality without running into the small
files issue. However, it requires the ORC file format, and it's not a SQL API.

> INSERT VALUES - HoS + Steaming File Format
> ------------------------------------------
>
>                 Key: HIVE-18822
>                 URL: https://issues.apache.org/jira/browse/HIVE-18822
>             Project: Hive
>          Issue Type: Improvement
>          Components: HiveServer2
>    Affects Versions: 3.0.0
>            Reporter: BELUGA BEHR
>            Priority: Minor
>
> Please optimize the INSERT VALUES function.  When HoS (Hive on Spark) is being used, and a streaming
> format such as TEXT or AVRO is being used, INSERT VALUES statements should be quick.  The
> HiveServer2 should pass the values to the Executor, and the Executor should simply append the
> data to an existing HDFS file instead of creating a new one.  This will reduce the number
> of small files that exist in the file system.  Alternatively, perhaps the HiveServer2 could perform
> the append itself, without having to first send the data to the processing engine at all.
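
To illustrate the small-files effect the description refers to, here is a minimal Python sketch that runs against a local temporary directory rather than HDFS. It is only an illustration of the idea, not how Hive or HoS implement INSERT VALUES; the function names and the part-file naming scheme are hypothetical. It assumes a row-oriented text format where appending lines to an existing file is valid.

```python
import os
import tempfile


def insert_as_new_file(table_dir, rows, seq):
    """Write one INSERT's rows to its own new file (one file per statement)."""
    # Hypothetical naming scheme, used only for this illustration.
    path = os.path.join(table_dir, f"part-{seq:05d}.txt")
    with open(path, "w", encoding="utf-8") as f:
        for row in rows:
            f.write(",".join(str(v) for v in row) + "\n")
    return path


def insert_by_append(table_dir, rows, name="part-00000.txt"):
    """Append one INSERT's rows to a single file, creating it if absent."""
    path = os.path.join(table_dir, name)
    with open(path, "a", encoding="utf-8") as f:
        for row in rows:
            f.write(",".join(str(v) for v in row) + "\n")
    return path


def demo(n_inserts=100):
    """Run n single-row inserts with each strategy and return the file counts."""
    with tempfile.TemporaryDirectory() as new_dir, \
            tempfile.TemporaryDirectory() as append_dir:
        for i in range(n_inserts):
            row = [(i, f"name{i}")]
            insert_as_new_file(new_dir, row, i)
            insert_by_append(append_dir, row)
        return len(os.listdir(new_dir)), len(os.listdir(append_dir))


if __name__ == "__main__":
    new_files, appended_files = demo(100)
    print(f"new file per insert: {new_files} files")
    print(f"append to existing:  {appended_files} file(s)")
```

With one file per statement, the file count grows linearly with the number of INSERT VALUES statements, while the append strategy keeps a single file. A real implementation on HDFS would also have to deal with concurrent writers and file formats that cannot be appended to, which this sketch ignores.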



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
