hive-user mailing list archives

From Stephen Sprague <sprag...@gmail.com>
Subject Re: Predicate pushdown optimisation not working for ORC
Date Thu, 03 Apr 2014 13:53:03 GMT
Wow, good find. I hope these config settings are well documented and that
you didn't have to spend a lot of time searching for that. Interesting that
the default isn't true for this one.
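
For anyone who finds this thread in the archive, here is a minimal sketch
of the fix as a Hive session. The property name is the one Abhay reports
below. The table and column names come from his query, and the numeric
literal is taken from his pushdown log line, so treat the exact query as an
assumption about his setup and not a verified repro.

```sql
-- Not enabled by default in the setup from this thread (Hive 0.12);
-- turn it on for the current session.
SET hive.optimize.index.filter=true;

-- Assumed query shape: columns and table from the thread, literal from
-- the "leaf-0 = (EQUALS sourceipv4address 1809657989)" log line.
SELECT sourceipv4address, sessionid, url
FROM test
WHERE sourceipv4address = 1809657989;
```

With the setting on, the task syslog should contain an "ORC pushdown
predicate" line where it previously said "No ORC pushdown predicate".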


On Wed, Apr 2, 2014 at 11:00 PM, Abhay Bansal <abhaybansal.1988@gmail.com> wrote:

> I was able to resolve the issue by setting "hive.optimize.index.filter" to
> true.
>
> In the hadoop logs
> syslog:2014-04-03 05:44:51,204 INFO
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat: included column ids =
> 3,8,13
> syslog:2014-04-03 05:44:51,204 INFO
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat: included columns names =
> sourceipv4address,sessionid,url
> syslog:2014-04-03 05:44:51,216 INFO
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat: ORC pushdown predicate:
> leaf-0 = (EQUALS sourceipv4address 1809657989)
>
> I can now see the ORC pushdown predicate.
>
> Thanks,
> -Abhay
>
>
> On Thu, Apr 3, 2014 at 11:14 AM, Stephen Boesch <javadba@gmail.com> wrote:
>
>> Hi Abhay,
>>   What is the DDL for your "test" table?
>>
>>
>> 2014-04-02 22:36 GMT-07:00 Abhay Bansal <abhaybansal.1988@gmail.com>:
>>
>>> I am new to Hive; apologies for asking such a basic question.
>>>
>>> The following exercise was done with Hive 0.12 and Hadoop 0.20.203.
>>>
>>> I created an ORC file from Java and loaded it into a table with the same
>>> schema. I checked the conf property
>>> <property><name>hive.optimize.ppd</name><value>true</value></property>
>>> which should ideally enable the ppd optimisation.
>>>
>>> I ran a query "select sourceipv4address,sessionid,url from test where
>>> sourceipv4address="dummy";"
>>>
>>> To see whether the ppd optimisation was working, I checked the Hadoop
>>> logs, where I found:
>>>
>>> ./userlogs/job_201404010833_0036/attempt_201404010833_0036_m_000000_0/syslog:2014-04-03
>>> 05:01:39,913 INFO org.apache.hadoop.hive.ql.io.orc.OrcInputFormat: included
>>> column ids = 3,8,13
>>> ./userlogs/job_201404010833_0036/attempt_201404010833_0036_m_000000_0/syslog:2014-04-03
>>> 05:01:39,914 INFO org.apache.hadoop.hive.ql.io.orc.OrcInputFormat: included
>>> columns names = sourceipv4address,sessionid,url
>>> ./userlogs/job_201404010833_0036/attempt_201404010833_0036_m_000000_0/syslog:2014-04-03
>>> 05:01:39,914 INFO org.apache.hadoop.hive.ql.io.orc.OrcInputFormat: *No
>>> ORC pushdown predicate*
>>>
>>> I am not sure which part I missed. Any help would be appreciated.
>>>
>>> Thanks,
>>> -Abhay
>>>
>>
>>
>
