orc-user mailing list archives

From "Owen O'Malley" <owen.omal...@gmail.com>
Subject Re: Questions regarding hive --orcfiledump or exporting orcfiles
Date Mon, 29 Jan 2018 21:55:54 GMT
My guess is that you should be able to save a fair amount of time by doing
a byte copy of the existing ORC files rather than rewriting them.
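
For a quick, non-distributed copy, something along these lines should
work (the warehouse path and bucket name here are made up):

    # Byte-for-byte copy of the finished ORC files out to S3;
    # no decode/re-encode of the ORC data happens here.
    hadoop fs -cp hdfs:///user/hive/warehouse/mydb.db/mytable \
        s3a://my-bucket/orc-backup/mytable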

To get a distributed copy, you'd probably want to use distcp and then
create the necessary tables and partitions for your Hive metastore.
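
Roughly like this, with the bucket, database, and table names invented
for the example:

    # Distributed byte copy of the saved ORC files back into HDFS
    hadoop distcp s3a://my-bucket/orc-backup/mytable \
        hdfs:///data/orc/mytable

and then point an external table at the copied files and let the
metastore pick up the partition directories:

    -- Schema and partitioning are hypothetical; they have to match
    -- the table the ORC files were originally written from
    CREATE EXTERNAL TABLE mytable (id BIGINT, payload STRING)
    PARTITIONED BY (ds STRING)
    STORED AS ORC
    LOCATION 'hdfs:///data/orc/mytable';

    -- Register the copied partition directories with the metastore
    -- (skip this if the table isn't partitioned)
    MSCK REPAIR TABLE mytable;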

.. Owen


On Mon, Jan 29, 2018 at 1:16 PM, Colin Williams <colin.williams.seattle@gmail.com> wrote:

> Hello,
>
> I wasn't sure if I should ask here or on the Hive mailing list. We're
> creating external tables over an S3 bucket that contains some text-file
> records, then importing these tables with STORED AS ORC.
>
> We have about 20 tables, and it takes a couple of hours to create them.
> At the moment, however, we're just using a static data set.
>
> I'm wondering whether I can reduce the load time by exporting the tables
> with hive --orcfiledump, or by simply copying the files from HDFS into an
> S3 bucket and then loading them into HDFS again. Would this likely save
> me a bit of load time?
>
>
> Best,
>
> Colin Williams
>
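
For reference, the text-to-ORC conversion described above presumably
amounts to something like the following (all names, columns, and the
field delimiter are hypothetical):

    -- External table over the raw text files in S3
    CREATE EXTERNAL TABLE mytable_text (id BIGINT, payload STRING)
    ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
    STORED AS TEXTFILE
    LOCATION 's3a://my-bucket/raw/mytable/';

    -- Rewrite into ORC; this is the slow step that a byte copy of the
    -- resulting ORC files would let you skip on subsequent loads
    CREATE TABLE mytable STORED AS ORC AS SELECT * FROM mytable_text;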
