hive-user mailing list archives

From Abhay Bansal <abhaybansal.1...@gmail.com>
Subject Re: Predicate pushdown optimisation not working for ORC
Date Fri, 04 Apr 2014 06:00:20 GMT
I was able to find the property with some digging around and
experimentation. Never knew that ppd had something to do with this
property.


On Thu, Apr 3, 2014 at 7:23 PM, Stephen Sprague <spragues@gmail.com> wrote:

> wow. good find. i hope these config settings are well documented and that
> you didn't have to spend a lot of time searching for that.  Interesting that
> the default isn't true for this one.
>
>
> On Wed, Apr 2, 2014 at 11:00 PM, Abhay Bansal <abhaybansal.1988@gmail.com> wrote:
>
>> I was able to resolve the issue by setting "hive.optimize.index.filter"
>> to true.
>>
>> In the hadoop logs
>> syslog:2014-04-03 05:44:51,204 INFO
>> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat: included column ids =
>> 3,8,13
>> syslog:2014-04-03 05:44:51,204 INFO
>> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat: included columns names =
>> sourceipv4address,sessionid,url
>> syslog:2014-04-03 05:44:51,216 INFO
>> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat: ORC pushdown predicate:
>> leaf-0 = (EQUALS sourceipv4address 1809657989)
>>
>> I can now see the ORC pushdown predicate.
>>
>> Thanks,
>> -Abhay
>>
>>
>> On Thu, Apr 3, 2014 at 11:14 AM, Stephen Boesch <javadba@gmail.com> wrote:
>>
>>> Hi Abhay,
>>>   What is the DDL for your "test" table?
>>>
>>>
>>> 2014-04-02 22:36 GMT-07:00 Abhay Bansal <abhaybansal.1988@gmail.com>:
>>>
>>>> I am new to Hive, so apologies for asking such a basic question.
>>>>
>>>> The following exercise was done with Hive 0.12 and Hadoop 0.20.203.
>>>>
>>>> I created an ORC file from Java and pushed it into a table with the
>>>> same schema. I checked the conf
>>>> property <property><name>hive.optimize.ppd</name><value>true</value></property>
>>>> which should ideally enable the ppd optimisation.
>>>>
>>>> I ran a query "select sourceipv4address,sessionid,url from test where
>>>> sourceipv4address="dummy";"
>>>>
>>>> Just to see if the ppd optimization is working I checked the hadoop
>>>> logs where I found
>>>>
>>>> ./userlogs/job_201404010833_0036/attempt_201404010833_0036_m_000000_0/syslog:2014-04-03
>>>> 05:01:39,913 INFO org.apache.hadoop.hive.ql.io.orc.OrcInputFormat: included
>>>> column ids = 3,8,13
>>>> ./userlogs/job_201404010833_0036/attempt_201404010833_0036_m_000000_0/syslog:2014-04-03
>>>> 05:01:39,914 INFO org.apache.hadoop.hive.ql.io.orc.OrcInputFormat: included
>>>> columns names = sourceipv4address,sessionid,url
>>>> ./userlogs/job_201404010833_0036/attempt_201404010833_0036_m_000000_0/syslog:2014-04-03
>>>> 05:01:39,914 INFO org.apache.hadoop.hive.ql.io.orc.OrcInputFormat: *No
>>>> ORC pushdown predicate*
>>>>
>>>>  I am not sure which part of it I missed. Any help would be
>>>> appreciated.
>>>>
>>>> Thanks,
>>>> -Abhay
>>>>
>>>
>>>
>>
>
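
For anyone checking the same thing: the thread confirms pushdown by reading
the OrcInputFormat lines in the task syslog, before and after setting
hive.optimize.index.filter to true. Below is a minimal Python sketch of that
check, assuming the log messages look like the ones quoted above. The function
name orc_pushdown_status and its three return values are made up for
illustration and are not part of Hive.

```python
import re


def orc_pushdown_status(log_text: str) -> str:
    """Classify a task log as 'pushed', 'not pushed', or 'unknown'.

    Assumes the OrcInputFormat messages quoted in this thread; other Hive
    versions may word them differently.
    """
    # Collapse whitespace and drop '*' so wrapped or emphasized lines still match.
    flat = " ".join(log_text.replace("*", "").split())
    if "No ORC pushdown predicate" in flat:
        return "not pushed"
    if re.search(r"ORC pushdown predicate:\s*leaf-\d+", flat):
        return "pushed"
    return "unknown"


if __name__ == "__main__":
    after_fix = (
        "2014-04-03 05:44:51,216 INFO "
        "org.apache.hadoop.hive.ql.io.orc.OrcInputFormat: ORC pushdown predicate: "
        "leaf-0 = (EQUALS sourceipv4address 1809657989)"
    )
    print(orc_pushdown_status(after_fix))  # prints "pushed"
```

If a log contains output from several tasks, this sketch reports
"not pushed" as soon as any of them logged the "No ORC pushdown predicate"
message, so it is best run on one attempt's syslog at a time.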
