drill-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Padma Penumarthy <ppenumar...@mapr.com>
Subject Re: Single Hdfs block per parquet file
Date Wed, 22 Mar 2017 20:29:00 GMT
I think we create one file for each parquet block.
If underlying HDFS block size is 128 MB and parquet block size  is  > 128MB, 
it will create more blocks on HDFS. 
Can you let me know what is the HDFS API that would allow you to
do otherwise ?

Thanks,
Padma


> On Mar 22, 2017, at 11:54 AM, François Méthot <fmethot78@gmail.com> wrote:
> 
> Hi,
> 
> Is there a way to force Drill to store CTAS generated parquet file as a
> single block when using HDFS? Java HDFS API allows to do that, files could
> be created with the Parquet block-size.
> 
> We are using Drill on hdfs configured with block size of 128MB. Changing
> this size is not an option at this point.
> 
> It would be ideal for us to have single parquet file per hdfs block, setting
> store.parquet.block-size to 128MB would fix our issue but we end up with a
> lot more files to deal with.
> 
> Thanks
> Francois

Mime
View raw message