hive-user mailing list archives

From Sammy Yu
Subject Merging small files with dynamic partitions
Date Fri, 15 Oct 2010 20:43:08 GMT
I have a dynamic-partition query that generates quite a few small
files, which I would like to merge:

SET hive.exec.dynamic.partition.mode=nonstrict;
SET hive.exec.dynamic.partition=true;
SET hive.exec.compress.output=true;
SET io.seqfile.compression.type=BLOCK;
SET hive.merge.size.per.task=256000000;
SET hive.merge.smallfiles.avgsize=16000000000;
SET hive.merge.mapfiles=true;
SET hive.merge.mapredfiles=true;
SET hive.mergejob.maponly=true;
INSERT OVERWRITE TABLE daily_conversions_without_rank_all_table
PARTITION(org_id, day)
SELECT session_id, permanent_id, first_date, last_date, week, month, quarter,
referral_type, search_engine, us_search_engine,
keyword, unnormalized_keyword, branded, conversion_meet, goals_meet,
entry_page, page_types,
org_id, day
FROM daily_conversions_without_rank_table;

I am running the latest version from trunk with HIVE-1622, but I
can't get the post-insert merge step to run, even after raising
hive.merge.smallfiles.avgsize. I'm wondering whether the filtering
at runtime is causing the merge process to be skipped. The Hive
output and log files are attached.
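
One way to narrow this down is to confirm the settings the session actually sees, then look at the files written to a partition. The following is only a sketch: in the Hive CLI, `SET <name>;` with no value prints the current setting, and `dfs -ls` lists an HDFS directory. The warehouse path and the partition values (org_id=1, day=2010-10-14) are hypothetical placeholders, not taken from the original setup.

```sql
-- Print the effective values; SET with a name and no value echoes the current setting.
SET hive.merge.mapfiles;
SET hive.merge.mapredfiles;
SET hive.merge.smallfiles.avgsize;
SET hive.merge.size.per.task;

-- List the files written to one partition after the INSERT has finished.
-- Hypothetical path and partition values; substitute the table's real location.
dfs -ls /user/hive/warehouse/daily_conversions_without_rank_all_table/org_id=1/day=2010-10-14;
```

If the merge step ran, the partition directory should hold a small number of larger files rather than one small file per task.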

