hive-user mailing list archives

From Mich Talebzadeh <mich.talebza...@cloudtechnologypartners.co.uk>
Subject RE: Difference between RC file format & Parquet file format
Date Thu, 18 Feb 2016 16:39:01 GMT
 

ORC has what is called a storage index built in, which provides
statistics about the data.

It provides these statistics at the file, stripe and row group (batch of
rows) levels. In terms of efficiency, I believe it is the best format for
data warehouse applications.
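
To illustrate the idea, here is a toy Python model of how min/max statistics kept at the file, stripe and row group levels let a reader skip data for a range predicate. It is a sketch under assumptions: the classes `Stats`, `RowGroup` and `Stripe` and the functions `build_stripe` and `scan` are hypothetical, and real ORC files keep richer statistics in their own on-disk structures rather than in Python objects.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Stats:
    lo: int
    hi: int

    def may_contain(self, lo: int, hi: int) -> bool:
        # If [self.lo, self.hi] overlaps [lo, hi], some values might match.
        return self.hi >= lo and self.lo <= hi

def stats_of(values: List[int]) -> Stats:
    return Stats(min(values), max(values))

@dataclass
class RowGroup:
    values: List[int]
    stats: Stats

@dataclass
class Stripe:
    row_groups: List[RowGroup]
    stats: Stats

def build_stripe(values: List[int], rows_per_group: int) -> Stripe:
    """Hypothetical helper: split one stripe into row groups, each with min/max."""
    groups = [values[i:i + rows_per_group] for i in range(0, len(values), rows_per_group)]
    return Stripe([RowGroup(g, stats_of(g)) for g in groups], stats_of(values))

def scan(stripes: List[Stripe], file_stats: Stats, lo: int, hi: int) -> Tuple[List[int], int]:
    """Return matching values and the number of row groups actually read."""
    matches: List[int] = []
    groups_read = 0
    if not file_stats.may_contain(lo, hi):
        return matches, groups_read  # the whole file is skipped
    for stripe in stripes:
        if not stripe.stats.may_contain(lo, hi):
            continue  # the whole stripe is skipped
        for rg in stripe.row_groups:
            if not rg.stats.may_contain(lo, hi):
                continue  # this row group is skipped
            groups_read += 1
            matches.extend(v for v in rg.values if lo <= v <= hi)
    return matches, groups_read
```

With 100 sorted values split into two stripes of 50 rows and row groups of 10, a query for 42..47 reads a single row group; the second stripe is ruled out by its statistics alone. The benefit depends on the data being clustered, since unsorted data gives wide min/max ranges that rarely exclude anything.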

On 18/02/2016 07:38, Abhishek Dubey wrote: 

> I think it's fair to say that one of the main differences is the representation of nested
> structures.
> 
> Parquet uses Dremel's repetition and definition levels, an extremely efficient
> representation of nested structure that has the added benefit of being easy to
> embed into the column data itself.
> 
> Julien wrote an excellent blog post that explains the details: https://blog.twitter.com/2013/dremel-made-simple-with-parquet
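
A minimal Python sketch of repetition and definition levels, assuming a hypothetical schema `message Doc { optional group a { repeated int32 b; } }` chosen for illustration (it is not from this thread). For the column path a.b the maximum definition level is 2 and the maximum repetition level is 1. The helper names `shred` and `assemble` are hypothetical; real Parquet writers also encode these levels compactly next to the values rather than keeping Python tuples.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical schema, for illustration only:
#   message Doc { optional group a { repeated int32 b; } }
# Column path a.b: max definition level 2 (optional a + repeated b),
# max repetition level 1 (only b repeats).
Triple = Tuple[Optional[int], int, int]  # (value, repetition level, definition level)

def shred(records: List[Dict]) -> List[Triple]:
    """Flatten the a.b column of each record into (value, r, d) triples."""
    out: List[Triple] = []
    for rec in records:
        a = rec.get("a")
        if a is None:
            out.append((None, 0, 0))  # a is null: nothing on the path is defined
            continue
        values = a.get("b", [])
        if not values:
            out.append((None, 0, 1))  # a is defined, but list b is empty
            continue
        for i, v in enumerate(values):
            # r=0 starts a new record; r=1 means another b in the same record
            out.append((v, 0 if i == 0 else 1, 2))
    return out

def assemble(triples: List[Triple]) -> List[Dict]:
    """Rebuild records from this single column, using only r and d."""
    records: List[Dict] = []
    for value, r, d in triples:
        if r == 0:  # a new record begins
            records.append({"a": None} if d == 0 else {"a": {"b": []}})
        if d == 2:  # a real value is present at the leaf
            records[-1]["a"]["b"].append(value)
    return records
```

Note that `assemble` rebuilds every record, including the empty list and the missing group, from the levels stored with this one column; no other column has to be read to recover the nesting.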

> 
> ORC, on the other hand, uses separate "counter" columns, which means that for nested
> structures you need to read those counter columns in addition to the data columns you
> care about in order to recreate the nesting structure; this increases the required
> amount of random I/O.
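
For contrast, a simplified Python model of the length-based layout described above: a list column is stored as presence flags, a separate sequence of list lengths (the "counter" column) and a flat child column of elements. The function names are hypothetical and the model ignores ORC's actual stream encodings; it only shows that the lengths must be read alongside the elements to rebuild each row.

```python
from typing import List, Optional, Tuple

Row = Optional[List[int]]

def encode_lists(rows: List[Row]) -> Tuple[List[bool], List[int], List[int]]:
    """Split a list<int> column into present flags, lengths and flat elements."""
    present: List[bool] = []
    lengths: List[int] = []
    elements: List[int] = []
    for row in rows:
        present.append(row is not None)
        if row is not None:
            lengths.append(len(row))  # one count per non-null list
            elements.extend(row)      # child values stored contiguously
    return present, lengths, elements

def decode_lists(present: List[bool], lengths: List[int], elements: List[int]) -> List[Row]:
    """Rebuild rows; without `lengths` the flat elements cannot be split back up."""
    rows: List[Row] = []
    li = 0  # next index into lengths
    ei = 0  # next index into elements
    for is_present in present:
        if not is_present:
            rows.append(None)
            continue
        n = lengths[li]
        li += 1
        rows.append(elements[ei:ei + n])
        ei += n
    return rows
```

Here the nesting lives in a column of its own, so a reader that wants the element values grouped by row has to fetch the lengths as well, which is the extra I/O the message refers to.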
> 
> Also, Parquet is natively supported in a number of popular Hadoop frameworks: Pig,
> Impala, Hive, MapReduce and Cascading.
> 
> Source : https://groups.google.com/forum/#!topic/parquet-dev/0IdtSLdIINQ [1] 
> 
> Thanks & Regards,
> Abhishek Dubey
> 
> From: Ravi Prasad [mailto:raviprasad29@gmail.com]
> Sent: Thursday, February 18, 2016 9:06 AM
> To: user@hive.apache.org
> Subject: Difference between RC file format & Parquet file format
> 
> Hi all, 
> 
> Can you please let me know how the RC file format differs from the Parquet
> file format?
> 
> Both are column-oriented file formats, so what are the differences?
> 
> -- 
> 
> ----------------------------------------------
> Regards,
> RAVI PRASAD. T

-- 

Dr Mich Talebzadeh

LinkedIn
https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw

http://talebzadehmich.wordpress.com


Links:
------
[1] https://groups.google.com/forum/#!topic/parquet-dev/0IdtSLdIINQ
