hive-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Dudu Markovitz (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HIVE-16496) Enhance asterisk expression (as in "select *") with EXCLUDE clause
Date Sat, 22 Apr 2017 09:07:04 GMT

     [ https://issues.apache.org/jira/browse/HIVE-16496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Dudu Markovitz updated HIVE-16496:
----------------------------------
    Description: 
support the following syntax:

{code}
select * exclude (a,b,e) from t
{code}

which for a table t with columns a,b,c,d,e would be equal to:

{code}
select c,d from t
{code}

Please note that the EXCLUDE clause relates directly to its preceding asterisk.

Here are some useful use cases:

h3. use-case 1: join

{code}
select t1.* exclude (x), t2.* from t1 join t2 on t1.x=t2.x;
{code}

This supplies a very clean way to select all columns without getting "Ambiguous column reference"
and without the need to specify all the columns of at least one of the tables.
 
Currently, without this enhancement, the query would look something like this:

{code}
select a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,y,z,t2.* from t1 join t2 on t1.x=t2.x;
{code}

Considering a table may hold hundreds or even thousands of column, this can be come very ugly
and error prone.
Often this require some scripting work.

h3. use-case 2: view
Creating views with all the tables columns except for some technical columns
 
{code}
create myview as select * exclude (cre_ts,upd_ts) from t;
{code}

h3. use-case 3: row_number
Remove computational columns that are not needed in the final row-set, e.g. -
retrieve the last record for each customer

{code}
select  * exclude (rn)

from   (select  t.*
               ,row_number() over (partition by customer_id order by ts desc) as rn

        from    t
        ) t
        
where   rn = 1
{code}


  was:
support the following syntax:

{code}
select * exclude (a,b,e) from t
{code}

which for a table t with columns a,b,c,d,e would be equal to:

{code}
select c,d from t
{code}

Please note that the EXCLUDE clause relates directly to its preceding asterisk.

A common use case would be:

{code}
select t1.* exclude (x), t2.* from t1 join t2 on t1.x=t2.x;
{code}

This supplies a very clean way to select all columns without getting "Ambiguous column reference"
and without the need to specify all the columns of at least one of the tables.
 
Currently, without this enhancement, the query would look something like this:

{code}
select a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,y,z,t2.* from t1 join t2 on t1.x=t2.x;
{code}

Considering a table may hold hundreds or even thousands of column, this can be come very ugly
and error prone.
Often this require some scripting work.

h4. Extended syntax: positional notation support
The positional notation is similar to the one used but the *cut* unix command 

{code}
select * exclude ($(1,2,5))   from t   -- exclude columns 1, 2 and 5
select * exclude ($(1-3))     from t   -- exclude columns 1 to 3
select * exclude ($(1,3-5,7)) from t   -- exclude columns 1,3 to 5 and 7
select * exclude ($(7,3-5,1)) from t   -- exclude columns 1,3 to 5 and 7 (same as previous
example)
select * exclude ($(3,5-))    from t   -- exclude the 3rd column and all columns from the
5th column to the end
select * exclude (-$(1,2))    from t   -- exclude last 2 columns
select * exclude (-$(1-3,7))  from t   -- exclude last 3 columns and the 7th column from the
end
select * exclude (-$(4-))     from t   -- exclude all columns except for the last 3
{code}


A complex example would look like:

{code}
select * exclude ($(1-3,5,7),x,y,-$(1-2)) from t  
{code}

exclude:
- first 3 columns
- 5th and 7th columns
- x and y 
- last 2 columns

P.s. 1
There should be *no* error raised for the following scenarios:
 - Excluding a column that does not exists in the columns set
 - Excluding the same column more than once (e.g. by name and by position).
- Excluding all columns

P.s. 2
This enhancement answers a real need that is being raised again again in the Hive users community
as well in legacy RDBMS communities. 
As far as I know, no provider had yet implemented something similar and we have an opportunity
here to lead the SQL ecosystem.

  


> Enhance asterisk expression (as in "select *") with EXCLUDE clause
> ------------------------------------------------------------------
>
>                 Key: HIVE-16496
>                 URL: https://issues.apache.org/jira/browse/HIVE-16496
>             Project: Hive
>          Issue Type: Wish
>          Components: Parser
>            Reporter: Dudu Markovitz
>
> support the following syntax:
> {code}
> select * exclude (a,b,e) from t
> {code}
> which for a table t with columns a,b,c,d,e would be equal to:
> {code}
> select c,d from t
> {code}
> Please note that the EXCLUDE clause relates directly to its preceding asterisk.
> Here are some useful use cases:
> h3. use-case 1: join
> {code}
> select t1.* exclude (x), t2.* from t1 join t2 on t1.x=t2.x;
> {code}
> This supplies a very clean way to select all columns without getting "Ambiguous column
reference" and without the need to specify all the columns of at least one of the tables.
>  
> Currently, without this enhancement, the query would look something like this:
> {code}
> select a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,y,z,t2.* from t1 join t2 on t1.x=t2.x;
> {code}
> Considering a table may hold hundreds or even thousands of column, this can be come very
ugly and error prone.
> Often this require some scripting work.
> h3. use-case 2: view
> Creating views with all the tables columns except for some technical columns
>  
> {code}
> create myview as select * exclude (cre_ts,upd_ts) from t;
> {code}
> h3. use-case 3: row_number
> Remove computational columns that are not needed in the final row-set, e.g. -
> retrieve the last record for each customer
> {code}
> select  * exclude (rn)
> from   (select  t.*
>                ,row_number() over (partition by customer_id order by ts desc) as rn
>         from    t
>         ) t
>         
> where   rn = 1
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

Mime
View raw message