hive-user mailing list archives

From Gopal Vijayaraghavan <>
Subject Re: The build-in indexes in ORC file does not work.
Date Wed, 16 Mar 2016 14:18:34 GMT

> I have tried a bloom filter, but it makes no improvement. I know about
> Tez, but have never used it; I will try it later.
>    select count(*) from gprs where terminal_type=25080;
>   will not scan data
>      Time taken: 353.345 seconds

CombineInputFormat does not do any split-elimination, so MapReduce does
not get the speedup of launching fewer containers there.

Most of your ~300s looks to be the fixed overheads of setting up each task.
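To illustrate why split-elimination matters for a point lookup like
terminal_type=25080, here is a minimal Python sketch. It assumes a single
integer column with min/max statistics per split (ORC keeps min/max column
statistics at the file, stripe and row-group level), and a deliberately crude
cost model in which tasks run in waves and each wave pays a fixed setup
overhead. The names Split, eliminate_splits and estimated_overhead, and the
overhead model itself, are hypothetical and are not Hive or ORC APIs.

```python
from dataclasses import dataclass


@dataclass
class Split:
    """One input split with hypothetical min/max stats for a single column."""
    path: str
    col_min: int
    col_max: int


def eliminate_splits(splits, value):
    """Keep only splits whose [min, max] range could contain `value`.

    Illustrates min/max-based elimination for an equality predicate;
    this is not Hive's implementation.
    """
    return [s for s in splits if s.col_min <= value <= s.col_max]


def estimated_overhead(num_tasks, per_task_overhead_s, parallel_slots):
    """Hypothetical fixed cost: tasks run in waves, each wave pays setup once."""
    waves = -(-num_tasks // parallel_slots)  # ceiling division
    return waves * per_task_overhead_s


if __name__ == "__main__":
    # Hypothetical splits and stats, for illustration only.
    splits = [
        Split("part-0", 10000, 20000),
        Split("part-1", 20001, 30000),
        Split("part-2", 30001, 40000),
    ]
    kept = eliminate_splits(splits, 25080)
    print([s.path for s in kept])  # ['part-1']
    # Without elimination every split becomes a task; with it, only one does.
    print(estimated_overhead(len(splits), 10, 1),
          estimated_overhead(len(kept), 10, 1))  # 30 10
```

When no split can be skipped, every split still pays its setup cost, which is
consistent with a query that reads little data yet takes minutes.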

We could not fix this in MRv2 due to historical compatibility issues with
merge-joins & schema evolution (see

This is not recommended for regular use (other than in Tez), but you can
force split-elimination with

set hive.input.format=${hive.tez.input.format};

>>>> So, has anyone used ORC's built-in indexes before (especially in
>>>> Spark SQL)? What's my issue?

We work on SparkSQL perf issues as well - this has to do with OrcRelation


