Why do you want larger files? Don't the resulting Parquet files contain all the data from the original TSV file?


On 10/7/15 11:07 AM, Younes Naguib wrote:


I'm reading a large TSV file and creating Parquet files using Spark SQL:

INSERT OVERWRITE TABLE tbl PARTITION (year, month, day)....
SELECT .... FROM tbl_tsv;

This works nicely, but it generates small Parquet files (about 15 MB each).

I'd like to generate larger files. Any idea how to address this?
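
The number of output files per Hive partition tracks the number of Spark tasks writing into it, so one common approach (a sketch, not something stated in this thread) is to shrink the task count before the write, for example with DISTRIBUTE BY. The table names below reuse the ones from the query above, the column list is elided as in the original, and the partition count of 8 is an illustrative value to tune for your data volume:

-- Route all rows for a given (year, month, day) through fewer tasks,
-- so each task writes one larger Parquet file instead of many small ones.
SET spark.sql.shuffle.partitions = 8;  -- fewer shuffle tasks => fewer, larger files

INSERT OVERWRITE TABLE tbl PARTITION (year, month, day)
SELECT .... FROM tbl_tsv
DISTRIBUTE BY year, month, day;

The trade-off is an extra shuffle before the write; with heavily skewed partitions you may also want to salt the distribution key so one huge day does not serialize through a single task.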


Younes Naguib

Triton Digital | 1440 Ste-Catherine W., Suite 1200 | Montreal, QC H3G 1R8

Tel.: +1 514 448 4037 x2688 | Tel.: +1 866 448 4037 x2688 | younes.naguib@tritondigital.com