hive-user mailing list archives

From Jörn Franke <jornfra...@gmail.com>
Subject Re: Hive Stored Textfile to Stored ORC taking long time
Date Fri, 09 Dec 2016 09:47:56 GMT
How large is the file? Might I/O be an issue? How many disks do you have on the single node?

Do you compress the ORC files (Snappy?)

What is the Hadoop distribution? Configuration baseline? Hive version?

I'm not sure I understood your setup, but might the network be an issue?
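
To put the I/O question in context, the figures from the quoted message below (about 1.5 million records, about 550 fields per row, about 30 minutes) can be turned into a rough throughput estimate. This is only a back-of-the-envelope sketch: the file size in bytes is not given in the thread, so it works in rows and field values rather than MB/s.

```python
# Figures taken from the quoted message; everything else is derived from them.
rows = 1_500_000            # "about 1,5 million records"
fields_per_row = 550        # "about 550 fields in each row"
elapsed_seconds = 30 * 60   # "about 30 minutes"

rows_per_second = rows / elapsed_seconds
values_per_second = rows_per_second * fields_per_row

print(f"{rows_per_second:.0f} rows/s")
print(f"{values_per_second:.0f} field values/s")
```

Knowing the size of the text file would let the same calculation be expressed in MB/s and compared against what the disks on the node can sustain.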

> On 9 Dec 2016, at 02:08, Joaquin Alzola <Joaquin.Alzola@lebara.com> wrote:
> 
> Hi List
>  
> The transformation from a textfile table to a stored ORC table takes quite a long time.
>  
> Steps followed:
>  
> 1. Create one normal table using textFile format
> 
> 2. Load the data normally into this table
> 
> 3. Create one table with the schema of the expected results of your normal hive table using stored as orcfile
> 
> 4. Insert overwrite query to copy the data from textFile table to orcfile table
> 
>  
> I have about 1.5 million records with about 550 fields in each row.
>  
> Doing step 4 takes about 30 minutes (moving from one format to the other).
>  
> I have Spark with only one worker (same for HDFS), so I am currently running a standalone server, but with 25G and 14 cores on that worker.
>  
> BR
>  
> Joaquin
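
For readers following the thread, the four steps in the quoted message can be sketched as HiveQL statements. The snippet below is a minimal illustration, not the poster's actual script: the table names, columns, input path, and comma delimiter are hypothetical, and the `orc.compress` setting is an assumption prompted by the Snappy question above (the thread does not say which compression, if any, was used).

```python
def orc_conversion_statements(text_table, orc_table, columns, data_path,
                              compression="SNAPPY"):
    """Render the HiveQL for the four steps: text table, load, ORC table, copy.

    All arguments are illustrative placeholders; nothing here comes from
    the original poster's environment.
    """
    cols = ",\n  ".join(f"{name} {hive_type}" for name, hive_type in columns)
    return [
        # Step 1: plain text table (the comma delimiter is an assumption)
        f"CREATE TABLE {text_table} (\n  {cols}\n)\n"
        "ROW FORMAT DELIMITED FIELDS TERMINATED BY ','\n"
        "STORED AS TEXTFILE",
        # Step 2: load the raw file into the text table
        f"LOAD DATA INPATH '{data_path}' INTO TABLE {text_table}",
        # Step 3: ORC table with the same schema and an explicit codec
        f"CREATE TABLE {orc_table} (\n  {cols}\n)\n"
        "STORED AS ORC\n"
        f"TBLPROPERTIES ('orc.compress'='{compression}')",
        # Step 4: copy the rows, rewriting them in ORC format
        f"INSERT OVERWRITE TABLE {orc_table} SELECT * FROM {text_table}",
    ]


# Hypothetical two-column schema; the real table has about 550 fields.
statements = orc_conversion_statements(
    "records_text",
    "records_orc",
    [("id", "BIGINT"), ("payload", "STRING")],
    "/tmp/records.csv",
)
print(";\n\n".join(statements) + ";")
```

Step 4 is the one reported to take about 30 minutes, so it is the statement to time when trying different compression settings or checking disk and network load.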
