drill-dev mailing list archives

From François Méthot <fmetho...@gmail.com>
Subject Limit the number of output parquet files in CTAS
Date Mon, 31 Oct 2016 19:57:56 GMT

Is there a way to limit the number of files produced by a CTAS query?
I would like the speed benefits of having hundreds of scanner fragments, but
I don't want to deal with hundreds of output files.

Our use case right now uses 880 threads to scan the data and produce a report,
whose output is spread over... 880 Parquet files.
Each resulting file is ~7 MB.
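For scale, and treating each file as roughly 7 MB, the report totals about 6 GB. If a second pass spread the same data evenly over w writer fragments (w is a hypothetical symbol here, and the even spread is an idealization), each output file would be roughly 6160/w MB:

```latex
S_{\text{total}} \approx 880 \times 7\ \text{MB} = 6160\ \text{MB} \approx 6\ \text{GB},
\qquad
S_{\text{file}} \approx \frac{6160\ \text{MB}}{w},
\quad \text{e.g. } w = 8 \;\Rightarrow\; S_{\text{file}} \approx 770\ \text{MB}.
```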

The only way I have found to reduce those files to a smaller set is to perform
a second CTAS query on the aggregated files with planner.width.max_per_query
set to a smaller number.
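A minimal sketch of that two-pass workaround in Drill SQL. Only `planner.width.max_per_query` comes from the message above; `store.format` is Drill's session option for the CTAS output format. The workspace `dfs.tmp`, the table names `report_stage` and `report_final`, the source path, the column `col_a`, and the width value 8 are all hypothetical placeholders.

```sql
-- Pass 1: the wide report query. Drill may run hundreds of writer
-- fragments here, each producing its own Parquet file.
-- (dfs.tmp, report_stage, /data/input and col_a are hypothetical.)
ALTER SESSION SET `store.format` = 'parquet';
CREATE TABLE dfs.tmp.`report_stage` AS
SELECT col_a, COUNT(*) AS cnt
FROM dfs.`/data/input`
GROUP BY col_a;

-- Pass 2: cap the total parallelism for this session, then rewrite the
-- aggregated output so fewer writer fragments (and so fewer files) are used.
-- The value 8 is an arbitrary example.
ALTER SESSION SET `planner.width.max_per_query` = 8;
CREATE TABLE dfs.tmp.`report_final` AS
SELECT * FROM dfs.tmp.`report_stage`;
```

Because the option is set at session level, later queries in the same session keep the reduced width until it is set back.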

Is there any way to do this in a single query?

