spark-user mailing list archives

From "Singh, Abhijeet" <>
Subject RE: parquet file doubts
Date Mon, 07 Dec 2015 13:21:24 GMT
Yes, Parquet has min/max.

From: Cheng Lian []
Sent: Monday, December 07, 2015 11:21 AM
To: Ted Yu
Cc: Shushant Arora;
Subject: Re: parquet file doubts

Oh sorry... At first I meant to cc the spark-user list, since Shushant and I had discussed
some Spark-related issues before. Then I realized that this is a pure Parquet issue, but forgot
to change the cc list. Thanks for pointing this out! Please ignore this thread.

On 12/7/15 12:43 PM, Ted Yu wrote:
I only see user@spark in the CC.


On Sun, Dec 6, 2015 at 8:01 PM, Cheng Lian wrote:
Cc'ing the parquet-dev list (it would be nice to always do so for these general questions).


On 12/6/15 3:10 PM, Shushant Arora wrote:

I have a few doubts about the Parquet file format.

1. Does Parquet keep min/max statistics like ORC does? How can I see the Parquet version (whether
it's 1.1, 1.2, or 1.3) of a Parquet file generated using Hive, custom MR, or AvroParquetOutputFormat?

Yes, Parquet also keeps row group statistics. You may check the Parquet file using the parquet-meta
CLI tool in parquet-tools (see for details),
then look for the "creator" field of the file. For programmatic access, check
o.a.p.hadoop.metadata.FileMetaData.createdBy.

2. How to sort Parquet records while generating a Parquet file using AvroParquetOutputFormat?

AvroParquetOutputFormat is not a format. It's just responsible for converting Avro records
to Parquet records. How are you using AvroParquetOutputFormat? Any example snippets?
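
As a rough illustration of the version check from question 1: the "creator" / createdBy value is a free-form string, and files written by parquet-mr typically carry something like "parquet-mr version 1.6.0 (build <hash>)". The Python sketch below only parses such a string once you have obtained it (for example from the parquet-meta output); it does not read a Parquet file. The helper name parse_created_by, the CreatedBy tuple, and the sample strings are hypothetical, and the string layout is an assumption: other writers may use a different one, in which case the function returns None.

```python
import re
from typing import NamedTuple, Optional


class CreatedBy(NamedTuple):
    """Hypothetical container for the parts of a createdBy string."""
    application: str
    version: str
    build: Optional[str]


# Assumed layout: "<application> version <version> (build <hash>)".
# The "(build ...)" suffix is treated as optional.
_CREATED_BY = re.compile(
    r"^(?P<app>.+?) version (?P<version>\S+)"
    r"(?: \(build ?(?P<build>[^)]*)\))?$"
)


def parse_created_by(created_by: str) -> Optional[CreatedBy]:
    """Split a createdBy string into application, version and build.

    Returns None when the string does not follow the assumed layout.
    """
    match = _CREATED_BY.match(created_by.strip())
    if match is None:
        return None
    # An empty build hash is reported as None rather than "".
    return CreatedBy(
        application=match.group("app"),
        version=match.group("version"),
        build=match.group("build") or None,
    )


if __name__ == "__main__":
    # Made-up sample value; the build hash is not from a real file.
    sample = "parquet-mr version 1.6.0 (build abc123)"
    print(parse_created_by(sample))
```

Returning None for unrecognized strings, instead of raising, keeps the check usable on files from writers whose createdBy value follows some other convention.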

