hive-user mailing list archives

From Prasanth Jayachandran
Subject Re: ORC slipt
Date Fri, 31 Mar 2017 17:00:32 GMT
Please find answers inline.

From: Alberto Ramón
Sent: Friday, March 31, 2017 9:32 AM
Subject: ORC slipt

Some doubts about ORC:

1- Is hive.exec.orc.default.buffer.size used for reads or writes?

It is configurable only at write time. The writer records this buffer size in the file
footer, and readers use the recorded value during decompression.

2- Is orc.stripe.size compressed or uncompressed?

Both. Stripe size is essentially the sum of all buffers of all columns (plus the dictionary
size) held in memory.

3- Must orc.stripe.size be a multiple of the HDFS block size?

It is optimal for stripes to fit evenly into an HDFS block (with the defaults, four 64 MB
stripes per 256 MB block). Otherwise the writer will adjust the size of the last stripe in a
block so that it does not straddle the HDFS block boundary, or pad the remaining space if it
is less than 5% of the block size. Note that the HDFS block size for ORC files is configurable
via orc.block.size and is independent of the cluster-wide block size. The default stripe size
is 64 MB and the default block size is 256 MB.
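
The adjust-or-pad rule above can be sketched roughly as follows. This is a simplified model
for illustration, not the ORC writer's actual code: the function name place_stripe, the
action labels, and the reading of "adjust" as "shrink to the space left in the block" are
all assumptions. Only the 5% threshold and the 64 MB / 256 MB defaults come from the answer.

```python
MB = 1024 * 1024

def place_stripe(offset_in_block, stripe_size, block_size, padding_tolerance=0.05):
    """Decide what a writer might do with the next stripe (hypothetical model).

    Returns a tuple (action, num_bytes):
      ("write", n)  - the stripe fits in the current block, write n bytes
      ("pad", n)    - too little room is left, pad n bytes and start a new block
      ("shrink", n) - cap the stripe at the n bytes left in the current block
    """
    remaining = block_size - offset_in_block
    if stripe_size <= remaining:
        return ("write", stripe_size)
    if remaining < padding_tolerance * block_size:
        # Less than 5% of the block is left: pad it out rather than use it.
        return ("pad", remaining)
    # Enough room to be worth using: shrink the stripe so it ends at the boundary.
    return ("shrink", remaining)

if __name__ == "__main__":
    block, stripe = 256 * MB, 64 * MB
    for offset_mb in (0, 200, 250):
        action, size = place_stripe(offset_mb * MB, stripe, block)
        print(offset_mb, action, size // MB)
```

With the default sizes, four stripes fill a block exactly, so neither the pad nor the shrink
branch is taken unless stripes come out smaller or larger than configured.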

4- When reading an ORC file, does the number of mappers depend on the HDFS blocks or on the
number of stripes?

It depends. If predicate pushdown is enabled, each split can contain one or more stripes. If
predicate pushdown is disabled, adjacent stripes are grouped together, up to the HDFS block
size, to form a single split.

Let's say we have 3 stripes and the 2nd stripe does not satisfy the predicate. The 1st and
3rd stripes then become 2 separate splits and the 2nd stripe is ignored. If predicate pushdown
is disabled, all 3 stripes together form a single split, since their combined size is below
the block size.

The number of splits will also vary with the input format and the execution engine.
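
The three-stripe example can be reproduced with a small model. This is only a sketch under
stated assumptions, not Hive's split-generation code: group_stripes and its keep parameter
are hypothetical names, and the rule that an eliminated stripe breaks adjacency is inferred
from the example above.

```python
MB = 1024 * 1024

def group_stripes(stripe_sizes, block_size, keep=None):
    """Group adjacent stripes into splits (hypothetical model).

    stripe_sizes: stripe sizes in bytes, in file order.
    keep: optional list of booleans; False marks a stripe eliminated by
          predicate pushdown. None models pushdown being disabled.
    Returns a list of splits, each a list of stripe indexes.
    """
    if keep is None:
        keep = [True] * len(stripe_sizes)
    splits, current, current_bytes = [], [], 0
    for index, (size, wanted) in enumerate(zip(stripe_sizes, keep)):
        if not wanted:
            # An eliminated stripe is skipped and ends the current group.
            if current:
                splits.append(current)
            current, current_bytes = [], 0
            continue
        if current and current_bytes + size > block_size:
            # Adding this stripe would exceed the block size: start a new split.
            splits.append(current)
            current, current_bytes = [], 0
        current.append(index)
        current_bytes += size
    if current:
        splits.append(current)
    return splits

if __name__ == "__main__":
    stripes = [64 * MB] * 3
    print(group_stripes(stripes, 256 * MB))                       # pushdown disabled
    print(group_stripes(stripes, 256 * MB, [True, False, True]))  # 2nd stripe eliminated
```

In this model, three 64 MB stripes form one split when nothing is eliminated, and two
single-stripe splits when the middle stripe fails the predicate.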

5- Is hive.exec.orc.split.strategy used for reads?

