hadoop-hive-dev mailing list archives

From "Carl Steinbach (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HIVE-1096) Hive Variables
Date Thu, 18 Feb 2010 23:50:28 GMT

    [ https://issues.apache.org/jira/browse/HIVE-1096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12835504#action_12835504 ]

Carl Steinbach commented on HIVE-1096:

bq. Philosophically I agree. In actuality the Hive/Hadoop conf is easily manipulated by changing
your hadoop-site.xml or hive-site.xml. Users do have unprotected access to the namespace; that
is the nature of hadoop. Users of hive are setting variables all the time.

True, but I think we should try to improve the situation. As a start we can add code to throw
an error if hive-default.xml or hive-site.xml sets a hive.* configuration property that is
not defined in HiveConf. This would protect the hive.* namespace and at the same time make
it easy to track down cases where folks misspell a hive.* property name.
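
A rough sketch of what such a check could look like. The class name, the method names, and the hard-coded set of known names are hypothetical stand-ins for illustration; a real implementation would derive the known names from the properties declared in HiveConf and would run after hive-default.xml and hive-site.xml have been loaded.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not existing Hive code.
class HivePropertyCheck {
    // Stand-in for the set of property names defined in HiveConf (assumption:
    // only two names are listed here to keep the sketch short).
    static final Set<String> KNOWN = new HashSet<>(Arrays.asList(
        "hive.exec.scratchdir", "hive.metastore.warehouse.dir"));

    /** Returns every hive.* key in the loaded settings that is not a known property. */
    static List<String> unknownHiveProperties(Map<String, String> loaded, Set<String> known) {
        List<String> unknown = new ArrayList<>();
        for (String key : loaded.keySet()) {
            // Only the hive.* namespace is protected; other keys pass through.
            if (key.startsWith("hive.") && !known.contains(key)) {
                unknown.add(key);
            }
        }
        return unknown;
    }

    /** Throws if the loaded settings contain an undefined hive.* property. */
    static void validate(Map<String, String> loaded, Set<String> known) {
        List<String> unknown = unknownHiveProperties(loaded, known);
        if (!unknown.isEmpty()) {
            throw new IllegalArgumentException("Undefined hive.* properties: " + unknown);
        }
    }
}
```

With this in place a misspelling such as hive.exec.scratchdri would be reported by name instead of being silently ignored, while non-hive keys such as mapred.* are left alone.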

bq. The only true difference in implementation is that you're doing it with properties and I
am doing it with HiveConf Vars. If we support both I think we are both happy. Any ideas?

I agree that we should support access to both system properties and hiveconf properties, but
if we do, how will we resolve cases where the user references {{${foo.bar}}} and both the system
properties and the hiveconf define a property named foo.bar? Another problem I see with using the
hiveconf namespace for user variable definitions is that user variables cease to have any meaning
past the client-side query preprocessing step, yet since they're part of the hiveconf they will
get included in the jobconf and sent to datanodes.

Here's a proposal:

* Allow users to reference variables in QL statements using the syntax {{${namespace:variable_name}}}.
* Users can define variables on the command line using a new "{{-hivevar x=y}}" switch. Values
defined in this manner become part of the user namespace, which is the default namespace.
They can be referenced as either {{${default:variablename}}} or {{${variablename}}}.
* Hive configuration properties are part of the "hiveconf" namespace, and can be referenced
as {{${hiveconf:propertyname}}}.
* System properties are part of the "system" namespace, and can be referenced as {{${system:property_name}}}.
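
To make the proposal concrete, here is a minimal sketch of the client-side substitution step. The class name, the regular expression, and the choice to leave unresolved references untouched are my assumptions for illustration, not part of any attached patch; user variables and hiveconf properties are passed in as plain maps.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not existing Hive code.
class NamespacedVariables {
    // Matches ${name} or ${namespace:name}; group 1 is the optional namespace.
    private static final Pattern REF =
        Pattern.compile("\\$\\{(?:([A-Za-z_]+):)?([A-Za-z0-9_.]+)\\}");

    static String substitute(String query,
                             Map<String, String> userVars,
                             Map<String, String> hiveConf) {
        Matcher m = REF.matcher(query);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // A reference without a namespace falls into the default (user) namespace.
            String ns = m.group(1) == null ? "default" : m.group(1);
            String name = m.group(2);
            String value;
            switch (ns) {
                case "default":  value = userVars.get(name); break;
                case "hiveconf": value = hiveConf.get(name); break;
                case "system":   value = System.getProperty(name); break;
                default:         value = null; // unknown namespace
            }
            // Assumption: unresolved references are left as written rather than failing.
            String replacement = (value != null) ? value : m.group();
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

Because each namespace has its own lookup, {{${system:foo.bar}}} and {{${hiveconf:foo.bar}}} never collide, and values defined with -hivevar stay in a separate map that does not have to be copied into the jobconf.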

What do you think?

> Hive Variables
> --------------
>                 Key: HIVE-1096
>                 URL: https://issues.apache.org/jira/browse/HIVE-1096
>             Project: Hadoop Hive
>          Issue Type: New Feature
>            Reporter: Edward Capriolo
>            Assignee: Edward Capriolo
>         Attachments: 1096-9.diff, hive-1096-2.diff, hive-1096-7.diff, hive-1096-8.diff,
> From mailing list:
> --Amazon Elastic MapReduce version of Hive seems to have a nice feature called "Variables."
> Basically you can define a variable via command-line while invoking hive with -d DT=2009-12-09
> and then refer to the variable via ${DT} within the hive queries. This could be extremely
> useful. I can't seem to find this feature even on trunk. Is this feature currently anywhere
> in the roadmap?--
> This could be implemented in many places.
> A simple place to put this is in Driver.compile or Driver.run: we can do string substitutions
> at that level, and further downstream need not be affected.
> There could be some benefits to doing this further downstream (parser, plan), but based
> on the simple needs we may not need to overthink this.
> I will get started on implementing in compile unless someone wants to discuss this more.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
