hive-user mailing list archives

From Gopal Vijayaraghavan <gop...@apache.org>
Subject Re: Error when running TPCDS query with Hive+LLAP
Date Mon, 25 Sep 2017 16:21:57 GMT

> Caused by: org.apache.hadoop.hive.ql.exec.mapjoin.MapJoinMemoryExhaustionError: VectorMapJoin
> Hash table loading exceeded memory limits. estimatedMemoryUsage: 1644167752 noconditionalTaskSize:
> 463667612 inflationFactor: 2.0 threshold: 927335232 effectiveThreshold: 927335232

Most likely the table does not have the column statistics Hive needs to estimate join sizes
correctly and to run the query through its CBO.
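
Column statistics are gathered with ANALYZE. A minimal sketch, assuming the usual TPC-DS
table names (store_sales and item are only two examples; repeat for every table the query
joins):

ANALYZE TABLE store_sales COMPUTE STATISTICS FOR COLUMNS;
ANALYZE TABLE item COMPUTE STATISTICS FOR COLUMNS;

Afterwards, something like "DESCRIBE FORMATTED store_sales ss_item_sk;" should show per-column
values such as the distinct count (ss_item_sk is just an example column).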

Check if the explain plan says "Optimized by CBO".

Also, check whether it shows in_bloom_filter() on the store_sales scan. If COLUMN STATS: COMPLETE
is missing, the bloom filters get disabled, because they can't be sized from the row counts.
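
Both checks can be done from one EXPLAIN. The statement below is only a sketch: it uses a
small stand-in join on TPC-DS table and column names, not the real query25 text, so substitute
the actual query.

EXPLAIN
SELECT count(*)
FROM store_sales ss
JOIN item i ON ss.ss_item_sk = i.i_item_sk
WHERE i.i_category = 'Books';

In the output, look for the CBO line near the top of the plan, and for in_bloom_filter(...)
in the filter predicate of the store_sales table scan.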

> query25 against a 25GB dataset (my instance memory size is 64GB)

This is an artificial error, set up so that no single query can overload a daemon.
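
For reference, the numbers in the quoted error line up roughly as follows. This is a reading
of the log line, assuming noconditionalTaskSize is derived from
hive.auto.convert.join.noconditionaltask.size and inflationFactor from
hive.hash.table.inflation.factor; the error message itself does not name the settings.

threshold = noconditionalTaskSize * inflationFactor
          = 463667612 * 2.0
          = 927335224   (logged as 927335232, presumably floating-point rounding)

estimatedMemoryUsage / threshold = 1644167752 / 927335232 = about 1.77

So the hash table was estimated at about 1.5 GiB against a limit of about 0.86 GiB, which is
small next to a 64GB instance.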

With a single node + single query setup, you probably can just disable the checking.

set hive.llap.mapjoin.memory.monitor.check.interval=0;

Cheers,
Gopal


