hive-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Philip Tromans <>
Subject Re: nested UDFs on Partition column
Date Thu, 19 Apr 2012 15:35:34 GMT
I don't know what the state of Hive's partition pruning is, but I
would imagine that the problem is that the two example you're giving
are fundamentally different.

1) WHERE local_date = =date_add('2011-12-07',3) ,

the udf is a function of some constants, so the constant gets
evaluated at compile time.

2) WHERE local_date = date_sub(to_date(from_unixtime(unix_timestamp())),3),

unix_timestamp() is not explicitly a constant, and so the expression
won't be simplified.

I would imagine that the constant simplification code probably doesn't
know the difference between a partition column and a real column, and
so treats everything as a real column. If local_date wasn't a
partition column, then there's no reasonable way of simplifying that
predicate at compile time.



On 19 April 2012 11:50, Nitin Pawar <> wrote:
> as per my understanding,
> In this case hive needs to look for all the partitions because it does not
> have the value before hand on the partition check and note the udfs are
> executed on the mapred and not on hive client side.
> I would suggest you write a hive query in a file and replace the partition
> value with a variable
> something like
> for partitionValue in values
>           hive $HIVEPARAMS -hiveconf  partition=$partition -e hivequery.hql
> and then in hivequery.sql you can refer the variable with
> where column_name = '${hiveconf:partition}'
> I may be wrong in interpreting the execution pattern of hivequery but this
> approach solved my problem
> Thanks,
> nitin
> On Thu, Apr 19, 2012 at 3:27 PM, Ramkumar <>
> wrote:
>> Hi,
>> I have a table partitioned by local_date.  When I write a query with
>> WHERE local_date = =date_add('2011-12-07',3) ,
>> hive executes the UDF ahead and looks only into the specific partitions.
>> But when the udf becomes more complex like
>> WHERE local_date = date_sub(to_date(from_unixtime(unix_timestamp())),3),
>> hive looks through all the partitions even though the above function  can
>> very well be computed ahead of time and optimize the query.  Is this
>> behaviour intentional ? And is there a workaround other than hardcoding the
>> date or using a param?
>> Thanks,
>> Ramkumar
> --
> Nitin Pawar

View raw message