hive-user mailing list archives

From Joaquin Alzola <Joaquin.Alz...@lebara.com>
Subject RE: Hive Stored Textfile to Stored ORC taking long time
Date Fri, 09 Dec 2016 09:40:51 GMT
Did you do anything to mitigate this issue? Like putting the data directly on HDFS, or going
through Spark instead of going through Hive?

From: Qiuzhuang Lian [mailto:qiuzhuang.lian@gmail.com]
Sent: 09 December 2016 04:02
To: user@hive.apache.org
Subject: Re: Hive Stored Textfile to Stored ORC taking long time

Yes, we ran into this issue too, typically when converting a text Hive table that exceeds
100 million into an ORC table.

On Fri, Dec 9, 2016 at 9:08 AM, Joaquin Alzola <Joaquin.Alzola@lebara.com> wrote:
Hi List,

The transformation from a textfile table to a stored ORC table takes quite a long time.

Steps followed:

1. Create a normal table using the textfile format.

2. Load the data normally into this table.

3. Create a second table, with the schema expected for the results of the normal Hive table,
using stored as orcfile.

4. Run an insert overwrite query to copy the data from the textfile table to the orcfile table.
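
For reference, the four steps above could look roughly like the sketch below. It is only an illustration: the thread gives no table names, column names, types, delimiter or file path, so `records_txt`, `records_orc`, `field_001`, `STRING`, the comma delimiter and `/tmp/records.csv` are all made-up placeholders. Since the table has about 550 fields, the sketch builds the HiveQL text in Python instead of writing the column list out by hand. It uses `STORED AS ORC`, the usual spelling of the ORC storage clause.

```python
# Sketch only: every table name, column name, type, delimiter and path used
# here is a hypothetical placeholder; none of them come from the thread.

def column_ddl(columns):
    """Render (name, hive_type) pairs as a Hive column list."""
    return ",\n  ".join(f"`{name}` {hive_type}" for name, hive_type in columns)


def conversion_statements(columns, text_table, orc_table, hdfs_path, delimiter=","):
    """Return the HiveQL statements for steps 1-4, in order."""
    cols = column_ddl(columns)
    return [
        # Step 1: the normal table, stored as delimited text.
        f"CREATE TABLE {text_table} (\n  {cols}\n)\n"
        f"ROW FORMAT DELIMITED FIELDS TERMINATED BY '{delimiter}'\n"
        "STORED AS TEXTFILE",
        # Step 2: load a file that is already on HDFS into the text table.
        f"LOAD DATA INPATH '{hdfs_path}' INTO TABLE {text_table}",
        # Step 3: a second table with the same schema, stored as ORC.
        f"CREATE TABLE {orc_table} (\n  {cols}\n)\nSTORED AS ORC",
        # Step 4: rewrite every row from the text table into the ORC table.
        f"INSERT OVERWRITE TABLE {orc_table} SELECT * FROM {text_table}",
    ]


# Stand-in for the roughly 550 fields; real names and types would differ.
columns = [(f"field_{i:03d}", "STRING") for i in range(1, 551)]
statements = conversion_statements(
    columns, "records_txt", "records_orc", "/tmp/records.csv"
)

# Print only the short statements; the two CREATE TABLE ones are 550 lines each.
print(statements[1] + ";")
print(statements[3] + ";")
```

The generated statements can be pasted into the Hive CLI or beeline. Step 4 is the one that reads every text row and rewrites it in ORC format, and it is the step timed at about 30 minutes in this thread.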

I have about 1.5 million records with about 550 fields in each row.

Doing step 4 takes about 30 minutes (moving from one format to the other).

I have Spark with only one worker (same for HDFS), so for now I am running a standalone server,
but with 25G and 14 cores on that worker.

BR

Joaquin
This email is confidential and may be subject to privilege. If you are not the intended recipient,
please do not copy or disclose its content but contact the sender immediately upon receipt.
