hive-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Matt Tucker <matthewt...@gmail.com>
Subject Re: Variable Substitution
Date Wed, 06 Mar 2013 21:50:38 GMT
I'm fine with the variable placeholder not being removed in cases where the variable is not
defined (until I change my mind). When I define var2 and var3, though, their placeholders
aren't swapped for their values.

My reasoning for this was that I'm moving from one execution script that defines var1, to
a newer script that defines var2 and var3.  The goal for this was that I could use the same
hive script with either execution script until we've fully migrated.

I don't disagree with the placeholders remaining if they're not defined, I'm just unsure that
having 1 variable reference that's not defined should prevent the others from working.


On Mar 6, 2013, at 4:41 PM, Edward Capriolo <edlinuxguru@gmail.com> wrote:

> It was done like this in hive because that is what hadoops variable substitution does,
namely if it does not understand the variable it does not replace it.
> 
> On Wed, Mar 6, 2013 at 4:30 PM, Dean Wampler <dean.wampler@thinkbiganalytics.com>
wrote:
>> Even newer versions of Hive do this. Any reason you don't want to provide a definition
for all of them? You could argue that an undefined variable is a bug and leaving the literal
text in place makes it easier to notice. Although, Unix shells would insert an empty string,
so never mind ;)
>> 
>> On Wed, Mar 6, 2013 at 3:13 PM, Matt Tucker <matthewtckr@gmail.com> wrote:
>>> Using CDH3u3 (Hive 0.7.1), it appears that variable substitution becomes disabled
when I use a variable in a query that hasn't been defined.
>>> 
>>> For instance, using the following script:
>>> 
>>> set var2=2013-02-01;
>>> set var3=2013-02-10;
>>> 
>>> SELECT clndr_dt FROM calendar WHERE clndr_dt LIKE "${hiveconf:var1}" OR (clndr_dt
>= "${hiveconf:var2}" AND clndr_dt <= "${hiveconf:var3}") LIMIT 1;
>>> 
>>> I would expect the results job configuration to list the hive.query.string as:
>>> 
>>> SELECT clndr_dt FROM calendar WHERE clndr_dt LIKE "${hiveconf:var1}" OR (clndr_dt
>= "2013-02-01" AND clndr_dt <= "2013-02-10") LIMIT 1;
>>> 
>>> (or with ${hiveconf:var1} removed, leaving an empty string).
>>> 
>>> Instead, it prints the query as-is (with the variable placeholders).  If I set
var1=2012-01-01, it properly substitutes the variables, but it only returns the '2012-01-01'
record (not the 2013 records).
>>> 
>>> SELECT clndr_dt FROM calendar WHERE clndr_dt LIKE "2012-01-01" OR ( clndr_dt
>= "2013-02-01" AND clndr_dt <= "2013-02-10" ) LIMIT 1
>>> 
>>> I was originally planning to use this for partition pruning, but it doesn't appear
to be the cause as the calendar table is not partitioned.
>>> 
>>> Is there something that I've overlooked?
>>> 
>>> Thanks!
>> 
>> 
>> 
>> -- 
>> Dean Wampler, Ph.D.
>> thinkbiganalytics.com
>> +1-312-339-1330
> 

Mime
View raw message