hive-user mailing list archives

From: Dmitry Goldenberg <dgoldenb...@hexastax.com>
Subject: Is it possible to use LOAD DATA INPATH with a PARTITIONED, STORED AS PARQUET table?
Date: Tue, 04 Apr 2017 13:52:16 GMT

We have a table such as the following defined:

CREATE TABLE IF NOT EXISTS db.mytable (
  `item_id` string,
  `timestamp` string,
  `item_comments` string)
PARTITIONED BY (`date` string, `content_type` string)
STORED AS PARQUET;

Currently we insert data into this PARQUET, PARTITIONED table as follows,
using an intermediary table:

INSERT INTO TABLE db.mytable PARTITION(date, content_type)
SELECT itemid as item_id, itemts as timestamp, date, content_type
FROM db.origtable
WHERE date = "${SELECTED_DATE}"
GROUP BY item_id, date, content_type;

Our question: is it possible to use the LOAD DATA INPATH ... INTO TABLE
syntax to load data from delimited data files directly into 'mytable',
rather than populating mytable from the intermediary table?

I see in the Hive documentation that:
* Load operations are currently pure copy/move operations that move
datafiles into locations corresponding to Hive tables.
* If the table is partitioned, then one must specify a specific partition
of the table by specifying values for all of the partitioning columns.
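
As we read it, the second point implies a statement of roughly the
following shape, issued once per partition. This is only a sketch: the
HDFS path and the partition values are made up for illustration. Since
LOAD only copies or moves files, we assume the files at that path would
already have to be in Parquet format rather than delimited text.

```sql
-- Hypothetical path and partition values, for illustration only.
-- Every partitioning column is given an explicit (static) value.
LOAD DATA INPATH '/tmp/staging/2017-04-04/article'
INTO TABLE db.mytable
PARTITION (`date` = '2017-04-04', `content_type` = 'article');
```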

This seems to indicate that using LOAD is possible. However, judging by
this discussion, perhaps not:
http://grokbase.com/t/hive/user/114frbfg0y/can-i-use-hive-dynamic-partition-while-loading-data-into-tables

We'd like to understand whether LOAD can be used with PARQUET, PARTITIONED
tables and, if so, how one goes about using LOAD in that case.

Thanks,
- Dmitry