impala-user mailing list archives

From Mostafa Mokhtar <mmokh...@cloudera.com>
Subject Re: Computing stats on big partitioned parquet tables
Date Fri, 19 Jan 2018 02:13:13 GMT
Hi,

Do you mind sharing the query profile for the query that failed with OOM?
There should be some clues in it as to why the OOM is happening.

Thanks
Mostafa


On Thu, Jan 18, 2018 at 5:54 PM, Thoralf Gutierrez <
thoralfgutierrez@gmail.com> wrote:

> Hello everybody!
>
> (I am using Impala 2.8.0, out of Cloudera Express 5.11.1)
>
> I now understand that computing stats for our tables is _highly_
> recommended, so I have decided to make sure we do.
>
> On my quest to do so, I started with a first `COMPUTE INCREMENTAL STATS
> my_big_partitioned_parquet_table` and ran into:
>
> > HiveServer2Error: AnalysisException: Incremental stats size estimate
> exceeds 200.00MB. Please try COMPUTE STATS instead.
>
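(Editor's note: for reference, incremental stats can also be computed one partition at a time, which limits how much work each statement does. A sketch follows; `part_col` and its value are hypothetical placeholders for the table's real partition key, and note that the size-estimate check applies to the table's incremental stats as a whole, so very wide or heavily partitioned tables may still trip the same limit.)

```sql
-- Sketch: compute incremental stats for a single partition.
-- part_col = '...' is a hypothetical partition spec; substitute
-- the table's real partition column and value.
COMPUTE INCREMENTAL STATS my_big_partitioned_parquet_table
  PARTITION (part_col = '2018-01-17');
```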
> I found out that this limit can be increased, so I set
> inc_stats_size_limit_bytes to 1073741824 (1 GB), only to hit it again:
>
> > HiveServer2Error: AnalysisException: Incremental stats size estimate
> exceeds 1.00GB. Please try COMPUTE STATS instead.
>
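(Editor's note: `inc_stats_size_limit_bytes` is an impalad startup flag rather than a query option, so it has to be passed in the daemon's command-line arguments and the daemons restarted. A sketch; in Cloudera Manager this typically goes in the impalad command-line argument safety-valve field, whose exact name may vary by CM version.)

```
# Sketch: add to the impalad startup flags, then restart the Impala daemons.
--inc_stats_size_limit_bytes=1073741824
```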
> So I ended up trying COMPUTE STATS for the whole table instead of doing it
> incrementally, but I still hit memory limits when computing counts, with my
> mem_limit at 34359738368 (32 GB):
>
> > Process: memory limit exceeded. Limit=32.00 GB Total=48.87 GB Peak=51.97
> GB
>
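(Editor's note: if the per-process limit cannot be raised further, one fallback sketch, reusing the table name from above, is to cap the stats query explicitly from impala-shell with the MEM_LIMIT query option; whether a given cap suffices depends on the table's column count and partition count, since COMPUTE STATS runs NDV aggregations per column.)

```sql
-- Sketch: cap this session's queries at 32 GB per node before
-- running the full (non-incremental) stats computation.
SET MEM_LIMIT=32g;
COMPUTE STATS my_big_partitioned_parquet_table;
```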
> 1. Am I correct to assume that even without enough memory, the query
> should spill to disk and just run slower instead of OOMing?
> 2. Any other recommendation on how else I could go about computing some
> stats on my big partitioned parquet table?
>
> Thanks a lot!
> Thoralf
>
>
