hive-user mailing list archives

From "Bogala, Chandra Reddy" <Chandra.Bog...@gs.com>
Subject RE: Predicate pushdown optimisation not working for ORC
Date Thu, 03 Apr 2014 14:10:29 GMT
I thought an ORC file could be generated only by running a Hive query on a staging table and
inserting into an ORC table. If there is an option to generate an ORC file on the client side
using Java code, could you share that code or links related to it?
Thanks,
Chandra

From: Abhay Bansal [mailto:abhaybansal.1988@gmail.com]
Sent: Thursday, April 03, 2014 11:06 AM
To: user@hive.apache.org
Subject: Predicate pushdown optimisation not working for ORC

I am new to Hive, so I apologise for asking such a basic question.

The following exercise was done with Hive 0.12 and Hadoop 0.20.203.

I created an ORC file from Java and loaded it into a table with the same schema. I checked
that the conf property <property><name>hive.optimize.ppd</name><value>true</value></property>
is set, which should ideally enable the ppd optimisation.

I ran the query: select sourceipv4address,sessionid,url from test where sourceipv4address="dummy";

Just to see whether the ppd optimisation was working, I checked the Hadoop logs, where I found:

./userlogs/job_201404010833_0036/attempt_201404010833_0036_m_000000_0/syslog:2014-04-03 05:01:39,913 INFO org.apache.hadoop.hive.ql.io.orc.OrcInputFormat: included column ids = 3,8,13
./userlogs/job_201404010833_0036/attempt_201404010833_0036_m_000000_0/syslog:2014-04-03 05:01:39,914 INFO org.apache.hadoop.hive.ql.io.orc.OrcInputFormat: included columns names = sourceipv4address,sessionid,url
./userlogs/job_201404010833_0036/attempt_201404010833_0036_m_000000_0/syslog:2014-04-03 05:01:39,914 INFO org.apache.hadoop.hive.ql.io.orc.OrcInputFormat: No ORC pushdown predicate
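
The log check described above can be sketched in plain Java. This is only an illustration, not part of Hive: the class name OrcPushdownLogCheck and the method reportsNoPushdown are hypothetical, and the sketch assumes the message text is exactly what appears in the log excerpt above. It looks for an OrcInputFormat line that says no predicate was pushed down.

```java
import java.util.Arrays;
import java.util.List;

public class OrcPushdownLogCheck {
    // Logger name and message text copied from the log excerpt above.
    static final String ORC_LOGGER = "org.apache.hadoop.hive.ql.io.orc.OrcInputFormat";
    static final String NO_PUSHDOWN = "No ORC pushdown predicate";

    /** Returns true if any OrcInputFormat line reports that no predicate was pushed down. */
    static boolean reportsNoPushdown(List<String> logLines) {
        for (String line : logLines) {
            if (line.contains(ORC_LOGGER) && line.contains(NO_PUSHDOWN)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Two of the lines from the task syslog, with the file path prefix dropped.
        List<String> sample = Arrays.asList(
            "2014-04-03 05:01:39,913 INFO " + ORC_LOGGER + ": included column ids = 3,8,13",
            "2014-04-03 05:01:39,914 INFO " + ORC_LOGGER + ": " + NO_PUSHDOWN);
        System.out.println(reportsNoPushdown(sample)
            ? "OrcInputFormat reported no pushdown predicate"
            : "no 'No ORC pushdown predicate' message found");
    }
}
```

In practice the lines would be read from the task attempt's syslog file rather than hard-coded.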

I am not sure which part I missed. Any help would be appreciated.

Thanks,
-Abhay
