spark-user mailing list archives

From Cheng Lian <lian.cs....@gmail.com>
Subject Re: Parquet file size
Date Wed, 07 Oct 2015 19:18:29 GMT
Why do you want larger files? Doesn't the resulting Parquet file contain
all the data in the original TSV file?

Cheng

On 10/7/15 11:07 AM, Younes Naguib wrote:
>
> Hi,
>
> I’m reading a large TSV file and creating Parquet files using Spark SQL:
>
> insert overwrite table tbl partition(year, month, day)....
> select .... from tbl_tsv;
>
> This works nicely, but it generates small Parquet files (15 MB).
>
> I would like to generate larger files. Any idea how to address this?
>
> *Thanks,*
>
> *Younes Naguib*
>
> Triton Digital | 1440 Ste-Catherine W., Suite 1200 | Montreal, QC  H3G 1R8
>
> Tel.: +1 514 448 4037 x2688 | Tel.: +1 866 448 4037 x2688 | 
> younes.naguib@tritondigital.com<mailto:younes.naguib@streamtheworld.com>
>
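
As a note on the question quoted above, here is a minimal sketch of the arithmetic for choosing how many files a partition should be written as. It is not from the thread: the 128 MB target, the helper name `files_needed`, and the 40-file example are all assumptions made for illustration. Small files in this kind of dynamic-partition insert often come from each write task emitting its own file into every partition it holds rows for, so fewer write tasks per partition generally means larger files.

```python
MB = 1024 * 1024

# Assumed target size per output file; 128 MB is a common choice,
# not a value taken from the thread.
DEFAULT_TARGET_BYTES = 128 * MB


def files_needed(partition_bytes, target_file_bytes=DEFAULT_TARGET_BYTES):
    """Return how many files keep each file at or below the target size.

    Hypothetical helper: rounds up with integer arithmetic and never
    returns fewer than one file.
    """
    if partition_bytes < 0:
        raise ValueError("partition_bytes must be non-negative")
    if target_file_bytes <= 0:
        raise ValueError("target_file_bytes must be positive")
    # Ceiling division without floating point.
    count = (partition_bytes + target_file_bytes - 1) // target_file_bytes
    return max(1, count)


# Hypothetical example: one day partition currently written as
# 40 files of 15 MB each (600 MB in total).
current_partition_bytes = 40 * 15 * MB
print(files_needed(current_partition_bytes))
```

The resulting count could then guide how rows are redistributed before the insert, for example through the number of shuffle partitions or a DISTRIBUTE BY on the partition columns. Which option applies depends on the Spark version in use.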

