spark-user mailing list archives

From Joel D <games2013....@gmail.com>
Subject Schema Evolution Parquet vs Avro
Date Tue, 30 May 2017 02:04:13 GMT
Hi,

We are trying to come up with the best storage format for handling schema
changes in ingested data.

We noticed that both Avro and Parquet allow columns to be selected by
name rather than by their index/position in the data. However, we are
inclined towards Parquet for better read performance, since it is columnar
and we will be selecting a few columns rather than all of them. Data will be
processed and saved to partitions, on top of which we will have Hive
external tables.

Will Parquet be able to handle the following:
- Renaming a column in the middle of the schema
- Removing a column from the middle of the schema
- Changing the data type of an existing column (int to bigint should be
allowed, right?)
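
To make the rename and removal cases concrete, here is a toy sketch in
plain Python of what resolving columns by name (rather than by position)
would imply. It is only an illustration under assumptions: the dicts stand
in for files written under two schema versions, and read_by_name, the
column names, and the read schema are all hypothetical. It does not use or
describe the actual Parquet, Avro, Spark, or Hive reader behaviour, and it
does not model type widening, since Python integers have no fixed width.

```python
# Toy model only: plain dicts stand in for files written under different
# schema versions. This is NOT the Parquet or Avro reader API.

def read_by_name(rows, read_schema):
    """Project each row onto read_schema, matching columns by name.

    read_schema maps column name -> callable used to cast the value.
    A column that is absent from a row comes back as None.
    """
    out = []
    for row in rows:
        projected = {}
        for name, cast in read_schema.items():
            value = row.get(name)
            projected[name] = None if value is None else cast(value)
        out.append(projected)
    return out


# "Old file": written with columns id, city, amount.
old_rows = [{"id": 1, "city": "Pune", "amount": 10}]

# "New file": city removed from the middle, amount renamed to total.
new_rows = [{"id": 2, "total": 250}]

# Hypothetical read schema listing every column name seen so far.
read_schema = {"id": int, "city": str, "amount": int, "total": int}

result = read_by_name(old_rows + new_rows, read_schema)
for row in result:
    print(row)
```

In this toy model a removed column simply reads back as None for the newer
rows, while a rename looks like a removal plus an addition: old rows have
no value under the new name unless something maps the old name to it.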

Please advise.

Thanks,
Sam
