drill-dev mailing list archives

From François Méthot <fmetho...@gmail.com>
Subject Re: Parquet files size
Date Fri, 30 Jun 2017 11:16:14 GMT
Thanks for your opinion. I will look into
reducing planner.width.max_per_node for now, and try raising it back up when
smaller parquet files get rolled out.
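
A minimal sketch of how that change could be made from sqlline or the web
UI (the option name is the one discussed above; the value 4 is only an
illustrative assumption, not a recommendation):

    -- Inspect the current per-node parallelism setting.
    SELECT * FROM sys.options WHERE name = 'planner.width.max_per_node';

    -- Lower the number of minor fragments (threads) per query per node,
    -- cluster-wide; ALTER SESSION can be used to try it on one session first.
    ALTER SYSTEM SET `planner.width.max_per_node` = 4;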

On Thu, Jun 29, 2017 at 11:21 AM, Andries Engelbrecht <aengelbrecht@mapr.com> wrote:

> With limited memory and what seems to be higher concurrency, you may want
> to reduce the number of minor fragments (threads) per query per node.
> See if you can reduce planner.width.max_per_node on the cluster without
> too much impact on the response times.
>
> Slightly smaller (512MB) parquet files may also help, but restructuring
> the data is usually harder than changing system settings.
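>
> As a rough sketch, if the files are produced by Drill itself (e.g. via
> CTAS), the writer's target size could be brought down to roughly 512MB
> before rewriting the data. The option name store.parquet.block-size is the
> usual one; the source and target paths below are made-up placeholders:
>
>     ALTER SESSION SET `store.parquet.block-size` = 536870912;  -- 512MB
>     CREATE TABLE dfs.tmp.`events_512mb` AS
>       SELECT * FROM dfs.`/data/events_parquet`;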
>
> --Andries
>
>
>
> On 6/29/17, 7:39 AM, "François Méthot" <fmethot78@gmail.com> wrote:
>
>     Hi,
>
>     I am investigating an issue where we have started getting Out of Heap
>     space errors when querying parquet files in Drill 1.10. It is currently
>     set to 8GB heap and 20GB off-heap; we can't spare more.
>
>     We usually query 0.7 to 1.2 GB parquet files; recently they have been
>     more on the 1.2GB side, for the same number of files.
>
>     It now fails on simple queries:
>        select bunch of fields ... where ... needle-in-a-haystack type of
>        predicates.
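>
>     Purely as a hypothetical illustration of that query shape (the table
>     path, columns, and filter value below are made up):
>
>         SELECT col_a, col_b, col_c
>         FROM dfs.`/data/parquet_dir`
>         WHERE rare_id = 'needle-value';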
>
>
>     Drill is configured with the old reader:
>         store.parquet_use_reader=false
>     because of this bug: DRILL-5435 (LIMIT causes a memory leak).
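>
>     For reference, a sketch of that setting, assuming the option meant here
>     is the one usually spelled store.parquet.use_new_reader:
>
>         ALTER SYSTEM SET `store.parquet.use_new_reader` = false;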
>
>     I have temporarily set the max number of large queries to 2 instead of
>     10. It has helped so far.
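>
>     That throttle is presumably the large-query queue; a sketch of the
>     change, assuming Drill's standard queueing options:
>
>         ALTER SYSTEM SET `exec.queue.enable` = true;
>         ALTER SYSTEM SET `exec.queue.large` = 2;  -- down from the default of 10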
>
>     My questions:
>     Could parquet file size be related to these new exceptions?
>     Would reducing the max file size help improve query robustness in Drill
>     (at the expense of having more files to scan)?
>
>     Thanks
>     Francois
>
>
>
