hive-user mailing list archives

From Qiuzhuang Lian <qiuzhuang.l...@gmail.com>
Subject Re: Hive Stored Textfile to Stored ORC taking long time
Date Fri, 09 Dec 2016 04:01:36 GMT
Yes, we ran into this issue too, typically when converting a text Hive table
with more than 100 million records into an ORC table.

On Fri, Dec 9, 2016 at 9:08 AM, Joaquin Alzola <Joaquin.Alzola@lebara.com>
wrote:

> Hi list,
>
> The conversion from a TEXTFILE table to a table stored as ORC takes quite a
> long time.
>
> Steps followed:
>
> 1. Create a normal table using the TEXTFILE format.
> 2. Load the data into this table as usual.
> 3. Create a table with the schema of the expected results of the normal
>    Hive table, using STORED AS ORCFILE.
> 4. Run an INSERT OVERWRITE query to copy the data from the TEXTFILE table
>    to the ORCFILE table.
>
> I have about 1.5 million records with about 550 fields in each row.
>
> Step 4 (moving from one format to the other) takes about 30 minutes.
>
> I have Spark with only one worker (the same for HDFS), so I am currently
> running a standalone server, but with 25 GB and 14 cores on that worker.
>
> BR
>
> Joaquin
>
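
The four steps in the quoted message correspond to standard HiveQL statements. The Python sketch below only builds those statements as strings (it does not connect to Hive or Spark). The table names events_txt and events_orc, the HDFS path /tmp/events and the comma delimiter are hypothetical placeholders, not taken from the thread. A small helper also reproduces the throughput arithmetic behind the reported numbers.

```python
def text_to_orc_statements(columns, text_table="events_txt", orc_table="events_orc",
                           hdfs_path="/tmp/events", delimiter=","):
    """Build the four HiveQL statements for a TEXTFILE -> ORC conversion.

    Table names, HDFS path and delimiter are hypothetical defaults.
    `columns` is a list of (name, hive_type) pairs shared by both tables.
    """
    cols = ", ".join(f"{name} {hive_type}" for name, hive_type in columns)
    return [
        # 1. Staging table in plain-text format
        f"CREATE TABLE {text_table} ({cols}) "
        f"ROW FORMAT DELIMITED FIELDS TERMINATED BY '{delimiter}' "
        f"STORED AS TEXTFILE",
        # 2. Move the raw files already in HDFS into the text table
        f"LOAD DATA INPATH '{hdfs_path}' INTO TABLE {text_table}",
        # 3. Target table with the same schema, stored as ORC
        f"CREATE TABLE {orc_table} ({cols}) STORED AS ORC",
        # 4. Read every row from the text table and rewrite it as ORC
        f"INSERT OVERWRITE TABLE {orc_table} SELECT * FROM {text_table}",
    ]


def throughput(rows, fields_per_row, seconds):
    """Return (rows per second, field values per second) for a run."""
    return rows / seconds, rows * fields_per_row / seconds


if __name__ == "__main__":
    for stmt in text_to_orc_statements([("id", "BIGINT"), ("payload", "STRING")]):
        print(stmt + ";")
    # Figures from the message: 1.5 million rows, 550 fields, about 30 minutes
    rows_s, values_s = throughput(1_500_000, 550, 30 * 60)
    print(f"{rows_s:.0f} rows/s, {values_s:.0f} values/s")
```

With the figures from the message, the helper gives roughly 833 rows/s, or about 458,000 field values/s. Steps 1 and 3 only create table metadata, and LOAD DATA INPATH in step 2 moves files into the table directory without parsing them, so step 4 is the only statement that reads and rewrites every row.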
