hive-user mailing list archives

From Yogesh Keshetty <>
Subject RE: Dynamic partitioned parquet tables
Date Fri, 09 Oct 2015 23:12:27 GMT

 Has anyone tried this? Please help if you have any experience with this kind of use case.
Subject: Dynamic partitioned parquet tables
Date: Fri, 9 Oct 2015 11:20:57 +0530

I have a question about Parquet tables. We have POS data that we want to store in per-day
partitions. We Sqoop the data into an external table in text file format and then insert it
into an external table that is partitioned by date. Due to some requirements, we want to keep
these files as Parquet files. The average file size per day is around 2 MB. I know that Parquet
is not meant for lots of small files, but we want to keep it that way. The problem is that
during the initial historical data load we are trying to create dynamic partitions, but no
matter how much memory I set, the jobs keep failing with memory errors. After some research
I found that with "set hive.optimize.sort.dynamic.partition = true" turned on we can create
the dynamically partitioned tables. However, this takes longer than we expected. Is there any
way to boost the performance? Also, even with the property turned on, when we try to create
dynamic partitions for multiple years of data at a time we again run into heap errors. How can
we handle this problem? Please help.
Thanks in advance!
Thank you,
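
For reference, the load described above could be sketched in HiveQL roughly as follows. The table names (pos_staging, pos_parquet), the columns, and the partition column sale_date are hypothetical, since the original message does not give them. Restricting each statement to one year is only one possible way to bound the number of partitions written per job; it is an assumption, not something confirmed in this thread.

```sql
-- Hypothetical names: pos_staging is the text-format table loaded by Sqoop,
-- pos_parquet is the Parquet table partitioned by sale_date.
SET hive.exec.dynamic.partition = true;
SET hive.exec.dynamic.partition.mode = nonstrict;

-- Sorts rows by the partition key so that each reducer keeps only one
-- record writer open at a time instead of one per partition.
SET hive.optimize.sort.dynamic.partition = true;

-- The dynamic partition column must come last in the SELECT list.
-- The WHERE clause limits this run to a single year of history (assumed range).
INSERT OVERWRITE TABLE pos_parquet PARTITION (sale_date)
SELECT store_id, item_id, amount, sale_date
FROM pos_staging
WHERE sale_date >= '2014-01-01' AND sale_date < '2015-01-01';
```

Repeating the statement with a different date range for each year would keep every job to a few hundred partitions at most.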