hive-user mailing list archives

From Abhishek Dubey <Abhishek.Du...@Xoriant.Com>
Subject RE: Difference between RC file format & Parquet file format
Date Thu, 18 Feb 2016 07:38:45 GMT
I think it's fair to say that one of the main differences is the representation of nesting.

Parquet uses Dremel's repetition and definition levels, an extremely efficient representation
of nested structure that has the added benefit of being easy to embed into the column data itself.
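
To make the idea concrete, here is a minimal sketch of repetition and definition levels for the simplest nested case: a single column holding a repeated integer field (one list of ints per record). The function names `shred` and `assemble` and the sample data are my own assumptions for illustration; this is not Parquet's actual writer or reader code, and real schemas with deeper nesting use levels greater than 1.

```python
from typing import List, Optional, Tuple

# One column entry: (repetition level, definition level, value).
# For a single repeated field both levels are either 0 or 1.
Triple = Tuple[int, int, Optional[int]]


def shred(records: List[List[int]]) -> List[Triple]:
    """Flatten records into one column of (r, d, value) triples."""
    column: List[Triple] = []
    for record in records:
        if not record:
            # Empty list: r=0 starts a new record, d=0 says "no value defined".
            column.append((0, 0, None))
            continue
        for i, value in enumerate(record):
            # r=0 marks the first value of a record, r=1 a continuation.
            column.append((0 if i == 0 else 1, 1, value))
    return column


def assemble(column: List[Triple]) -> List[List[int]]:
    """Rebuild the records using only the triples stored in the column."""
    records: List[List[int]] = []
    for r, d, value in column:
        if r == 0:
            records.append([])          # a new record begins
        if d == 1 and value is not None:
            records[-1].append(value)   # a real value is present
    return records


if __name__ == "__main__":
    data = [[1, 2, 3], [], [4]]
    column = shred(data)
    print(column)
    print(assemble(column))
```

The point of the sketch is that the nesting information travels with the values in the same column, so rebuilding the records needs no second column.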

Julien wrote an excellent blog post that explains the details:

ORCFile, on the other hand, uses separate "counter" columns, which means that for nested structures
you need to read those counter columns in addition to the data columns you care about in order
to recreate the nesting structure. This increases the required amount of random I/O.
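
For comparison, the "counter" column approach can be sketched roughly as below, again for one list of ints per record. This is only an illustration of the idea under my own assumptions (the function names and the in-memory lists are hypothetical); it does not reproduce ORC's real on-disk layout.

```python
from typing import List, Tuple


def encode_with_lengths(records: List[List[int]]) -> Tuple[List[int], List[int]]:
    """Split records into a counter column (list lengths) and a data column."""
    lengths = [len(record) for record in records]
    values = [value for record in records for value in record]
    return lengths, values


def decode_with_lengths(lengths: List[int], values: List[int]) -> List[List[int]]:
    """Rebuild the records; this needs BOTH columns, not just the data."""
    records: List[List[int]] = []
    pos = 0
    for n in lengths:
        records.append(values[pos:pos + n])
        pos += n
    return records


if __name__ == "__main__":
    lengths, values = encode_with_lengths([[1, 2, 3], [], [4]])
    print(lengths, values)
    print(decode_with_lengths(lengths, values))
```

Here the data column alone cannot tell you where one record ends and the next begins, which is why the counter column has to be read as well.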

Also, Parquet is natively supported in a number of popular Hadoop frameworks: Pig, Impala,
Hive, MR, Cascading.

Source :!topic/parquet-dev/0IdtSLdIINQ

Thanks & Regards,
Abhishek Dubey

From: Ravi Prasad []
Sent: Thursday, February 18, 2016 9:06 AM
Subject: Difference between RC file format & Parquet file format

Hi all,
  Can you please let me know
how the RC file format differs from the Parquet file format?
Both are column-oriented file formats, so what are the differences?
