orc-user mailing list archives

From Colin Williams <colin.williams.seattle@gmail.com>
Subject Re: Questions regarding hive --orcfiledump or exporting orcfiles
Date Mon, 29 Jan 2018 22:50:39 GMT
Hi, I don't think we've reached the point where we are setting a partition
key. Haven't looked at ACID either. I'll give this a shot. Thanks for the
help!

On Mon, Jan 29, 2018 at 2:28 PM, Owen O'Malley <owen.omalley@gmail.com>
wrote:

> There are some details, but fundamentally yes.
>
> For non-partitioned tables, I'd probably distcp somewhere else and then
> use:
>
> hive> load data inpath 'hdfs:/staging/' overwrite into table MyTable;
>
> Assuming the table is backed by the same HDFS, it will do a hdfs mv to
> place the data in the right directory.
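
A rough sketch of those two steps together; MyTable and the staging path follow Owen's example, while the s3a bucket name is a placeholder:

  # pull the exported files from S3 back into an HDFS staging directory
  hadoop distcp s3a://my-bucket/export/mytable/ /staging/

  # metadata-only move of the files into the table's directory
  hive> load data inpath '/staging/' overwrite into table MyTable;
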
>
> Partitioned tables would need to be loaded with a load data command per
> partition.
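
For illustration, with a hypothetical date partition column ds, that is one statement per partition directory:

  hive> load data inpath '/staging/ds=2018-01-29/'
      > overwrite into table MyTable partition (ds='2018-01-29');
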
>
> See https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML#LanguageManualDML-Loadingfilesintotables
>
> If you are using ACID tables, it is more complicated.
>
> .. Owen
>
> On Mon, Jan 29, 2018 at 2:12 PM, Colin Williams <
> colin.williams.seattle@gmail.com> wrote:
>
>> Hi Owen,
>>
>> New to hive.
>>
>> Is the process as easy as
>>
>> # Export database
>> 1) distcp a database hdfs path to s3
>>
>> # Import database
>> 2) distcp the s3 database path back to an hdfs path
>> 3) use a CREATE TABLE statement from hive and set LOCATION to the hdfs path?
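
A sketch of what step 3 could look like; the column list and path here are placeholders rather than details from the thread:

  hive> CREATE EXTERNAL TABLE MyTable (id BIGINT, payload STRING)
      > STORED AS ORC
      > LOCATION '/staging/mytable/';

With an external table the copied files are used in place, so no further move is needed; Owen's load data approach instead moves the files into a managed table's directory.
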
>>
>> On Mon, Jan 29, 2018 at 1:55 PM, Owen O'Malley <owen.omalley@gmail.com>
>> wrote:
>>
>>> My guess is that you should be able to save a fair amount of time by
>>> doing a byte copy rather than rewriting the ORC file.
>>>
>>> To get a distributed copy, you'd probably want to use distcp and then
>>> create the necessary tables and partitions for your Hive metastore.
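
A rough sketch of that sequence, assuming the partition directories follow Hive's key=value layout so MSCK REPAIR TABLE can discover them; the table name, columns, and paths are placeholders:

  # distributed byte-for-byte copy of the ORC files
  hadoop distcp s3a://my-bucket/export/mytable/ /warehouse/mytable/

  # recreate the table definition, then register the copied partitions
  hive> CREATE EXTERNAL TABLE MyTable (id BIGINT, payload STRING)
      > PARTITIONED BY (ds STRING)
      > STORED AS ORC
      > LOCATION '/warehouse/mytable/';
  hive> MSCK REPAIR TABLE MyTable;
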
>>>
>>> .. Owen
>>>
>>>
>>> On Mon, Jan 29, 2018 at 1:16 PM, Colin Williams <
>>> colin.williams.seattle@gmail.com> wrote:
>>>
>>>> Hello,
>>>>
>>>> Wasn't sure if I should ask here or on the Hive mailing list. We're
>>>> creating external tables from an S3 bucket that contains some textfile
>>>> records. Then we import these into tables STORED AS ORC.
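
The workflow described reads roughly like this; the table names, columns, and bucket are illustrative only:

  hive> CREATE EXTERNAL TABLE raw_events (id BIGINT, payload STRING)
      > ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
      > STORED AS TEXTFILE
      > LOCATION 's3a://my-bucket/raw/events/';
  hive> CREATE TABLE events STORED AS ORC AS SELECT * FROM raw_events;

The CTAS rewrite of text into ORC is the slow step the question is about avoiding on reload.
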
>>>>
>>>> We have about 20 tables, and it takes a couple of hours to create them.
>>>> However, currently we are just using a static data set.
>>>>
>>>> So I'm wondering: can I reduce the load time by exporting the tables
>>>> using hive --orcfiledump, or by just copying the ORC files from HDFS
>>>> into an S3 bucket and then loading them back into HDFS? Will this
>>>> likely save me a bit of load time?
>>>>
>>>>
>>>> Best,
>>>>
>>>> Colin Williams
>>>>
>>>
>>>
>>
>
