hive-user mailing list archives

From Daniel Haviv <daniel.ha...@veracity-group.com>
Subject Re: Understanding ORC file format compression
Date Sun, 21 Jun 2015 08:03:25 GMT
Hi Sreejesh,
The data in an ORC file is divided into stripes, and within each stripe the columns are divided into
column groups.
Compression is applied at the column group level rather than to the file as a whole, so to answer
your question: ORC files are splittable no matter which codec is used.
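
To make the idea concrete, here is a toy sketch in Python. It is not ORC's actual on-disk layout: the names write_stripes and read_stripe and the in-memory index are made up for illustration, and zlib stands in for whichever codec is configured. It only shows why compressing independent units keeps a file splittable even when the codec itself is not.

```python
import zlib

def write_stripes(stripes):
    """Compress each stripe on its own and record where it lands.

    Hypothetical helper: returns the concatenated compressed bytes and an
    index of (offset, compressed_length) pairs, one per stripe.
    """
    blob = bytearray()
    index = []
    for raw in stripes:
        packed = zlib.compress(raw)
        index.append((len(blob), len(packed)))
        blob += packed
    return bytes(blob), index

def read_stripe(blob, index, i):
    """Decompress stripe i only, without touching the stripes before it."""
    offset, length = index[i]
    return zlib.decompress(blob[offset:offset + length])

# Four fake stripes of repetitive data.
stripes = [(b"row-%d\n" % n) * 1000 for n in range(4)]
blob, index = write_stripes(stripes)

# A reader assigned stripe 2 can jump straight to it.
third = read_stripe(blob, index, 2)
```

Each call to read_stripe could run in a different map task, because no task needs the compressed bytes that belong to another stripe.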

Daniel

> On 21 June 2015, at 10:56, sreejesh s <sreejesh356@yahoo.com> wrote:
> 
> Hi,
> 
> As per my understanding, the available codecs for ORC file format Hive table compression
> are either Zlib or Snappy.
> Both compression techniques are non-splittable. Does that mean that any query on a
> Hive table stored as ORC and compressed will not run multiple maps in parallel?
> 
> I know that is not correct; please help me understand what I am missing here.
> 
> Thanks
