hive-user mailing list archives

From Jay <jayadeep.jayara...@gmail.com>
Subject Out of Memory while generating ORC Splits
Date Wed, 13 Sep 2017 09:24:57 GMT
Hi All,

I am running the following simple SELECT query:

select distinct vehicle_no from
rmd.gets_dw_eoa_eng_rec_dtl_orc_ext_concat_final_eng3 where incident_dt =
'2999-01-01';

The table is partitioned on two levels (source_type_cd, then incident_dt), as shown below:

drwx------   - gpadmin hdfs          0 2017-09-12 14:36 /apps/hive/warehouse/rmd.db/gets_dw_eoa_eng_rec_dtl_orc_ext_concat_final_eng3/source_type_cd=ENG3/incident_dt=2010-01-01
drwx------   - gpadmin hdfs          0 2017-09-12 14:36 /apps/hive/warehouse/rmd.db/gets_dw_eoa_eng_rec_dtl_orc_ext_concat_final_eng3/source_type_cd=ENG3/incident_dt=2011-01-01
drwx------   - gpadmin hdfs          0 2017-09-12 14:35 /apps/hive/warehouse/rmd.db/gets_dw_eoa_eng_rec_dtl_orc_ext_concat_final_eng3/source_type_cd=ENG3/incident_dt=2012-01-01
drwx------   - gpadmin hdfs          0 2017-09-12 14:36 /apps/hive/warehouse/rmd.db/gets_dw_eoa_eng_rec_dtl_orc_ext_concat_final_eng3/source_type_cd=ENG3/incident_dt=2013-01-01
drwx------   - gpadmin hdfs          0 2017-09-12 14:36 /apps/hive/warehouse/rmd.db/gets_dw_eoa_eng_rec_dtl_orc_ext_concat_final_eng3/source_type_cd=ENG3/incident_dt=2014-01-01
drwx------   - gpadmin hdfs          0 2017-09-12 14:36 /apps/hive/warehouse/rmd.db/gets_dw_eoa_eng_rec_dtl_orc_ext_concat_final_eng3/source_type_cd=ENG3/incident_dt=2014-02-01
drwx------   - gpadmin hdfs          0 2017-09-12 14:36 /apps/hive/warehouse/rmd.db/gets_dw_eoa_eng_rec_dtl_orc_ext_concat_final_eng3/source_type_cd=ENG3/incident_dt=2014-03-01
drwx------   - gpadmin hdfs          0 2017-09-12 14:36 /apps/hive/warehouse/rmd.db/gets_dw_eoa_eng_rec_dtl_orc_ext_concat_final_eng3/source_type_cd=ENG3/incident_dt=2014-04-01
drwx------   - gpadmin hdfs          0 2017-09-12 14:34 /apps/hive/warehouse/rmd.db/gets_dw_eoa_eng_rec_dtl_orc_ext_concat_final_eng3/source_type_cd=ENG3/incident_dt=2014-05-01
drwx------   - gpadmin hdfs          0 2017-09-12 14:33 /apps/hive/warehouse/rmd.db/gets_dw_eoa_eng_rec_dtl_orc_ext_concat_final_eng3/source_type_cd=ENG3/incident_dt=2014-06-01
drwx------   - gpadmin hdfs          0 2017-09-12 14:33 /apps/hive/warehouse/rmd.db/gets_dw_eoa_eng_rec_dtl_orc_ext_concat_final_eng3/source_type_cd=ENG3/incident_dt=2014-07-01


Each ORC file is roughly 2 GB in size and uses ZLIB compression.

On our HDP 2.6.1 cluster, when hive.exec.orc.split.strategy is set to HYBRID,
the Map phase stays stuck in initialization and, after about 20 minutes,
fails with an out-of-memory (OOM) error.

When I change hive.exec.orc.split.strategy to BI, the query runs without
any issues.
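For reference, the BI workaround can be applied to a single session from the Hive CLI or Beeline instead of cluster-wide. A minimal sketch, reusing the query above:

```sql
-- BI strategy: split generation does not read or cache ORC file footers
SET hive.exec.orc.split.strategy=BI;

select distinct vehicle_no from
rmd.gets_dw_eoa_eng_rec_dtl_orc_ext_concat_final_eng3 where incident_dt =
'2999-01-01';
```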

My question is: which parameter controls the memory available to Hive/Tez
while it generates the splits?

The Hive container size is set to 8 GB.
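One direction worth checking, offered as an assumption rather than a confirmed answer: by default Hive on Tez computes splits inside the Tez Application Master (hive.compute.splits.in.am=true), so the AM's memory, not the 8 GB task container, may be the limit. With files this large, HYBRID most likely chooses the ETL strategy, which reads and caches ORC footers during split generation. A sketch of raising the AM size; the values 4096 and -Xmx3276m are illustrative only, and AM settings may need to be supplied before the Tez session starts (for example with --hiveconf) to take effect:

```sql
-- Split generation runs in the Tez AM when this is true (the default)
SET hive.compute.splits.in.am=true;
-- Illustrative values: AM container size in MB, plus an explicit AM heap of about 80% of it
SET tez.am.resource.memory.mb=4096;
SET tez.am.launch.cmd-opts=-Xmx3276m;
SET hive.exec.orc.split.strategy=HYBRID;
```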

Thanks,
Jayadeep
