hive-dev mailing list archives

From "Jothy Babu (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HIVE-3906) URI_Escape and URI_UnEscape UDF
Date Fri, 01 Feb 2013 05:05:14 GMT

     [ https://issues.apache.org/jira/browse/HIVE-3906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jothy Babu updated HIVE-3906:
-----------------------------

    Description: 
Current releases of Hive lack a function that encodes URL or form parameters, that is, one
that escapes a URI.
The function URI_ESCAPE (uri) would return the encoded form of the URI, which would be useful
when working in HiveQL. It is always advisable to encode URL or form parameters; a plain form
parameter is vulnerable to cross-site attacks and SQL injection, and may drive a web application
into unpredictable output.

Functionality :-

Function Name: URI_ESCAPE (uri)

Returns the encoded form of the uri.
Example: hive> SELECT URI_ESCAPE('http://www.example.com?a=l&t');
-> 'http%3A%2F%2Fwww.example.com%3Fa%3Dl%26t'

Usage :-

Case 1 : To get encoded uri corresponding to a particular uri

hive> SELECT URI_ESCAPE('http://google.com/resource?key=value1 & value2');

-> 'http%3A%2F%2Fgoogle.com%2Fresource%3Fkey%3Dvalue1%20%26%20value2'
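The escaping behaviour shown in the examples above can be sketched outside Hive. The Python below is only an illustration, not the proposed UDF: the function name uri_escape mirrors the ticket, and the set of characters left unescaped (letters, digits, '-', '_' and '.') is an assumption inferred from the sample outputs, where '.' and '_' survive and '~' becomes %7E. The ticket does not say how '-' is treated.

```python
import string

# Characters left unescaped. This set is an assumption inferred from the
# ticket's examples; '-' in particular is a guess, not taken from the ticket.
SAFE_CHARS = frozenset(string.ascii_letters + string.digits + "-_.")


def uri_escape(uri: str) -> str:
    """Percent-encode every UTF-8 byte of `uri` that is not in SAFE_CHARS."""
    escaped = []
    for byte in uri.encode("utf-8"):
        char = chr(byte)
        if char in SAFE_CHARS:
            escaped.append(char)
        else:
            # Two upper-case hex digits, e.g. ':' -> %3A and ' ' -> %20.
            escaped.append("%{:02X}".format(byte))
    return "".join(escaped)


if __name__ == "__main__":
    print(uri_escape("http://www.example.com?a=l&t"))
    print(uri_escape("http://google.com/resource?key=value1 & value2"))
```

Working byte by byte on the UTF-8 encoding keeps the sketch well defined for non-ASCII input as well, although the ticket only shows ASCII examples.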

Case 2 : To query a table to get encoded form of the urls corresponding to users
Table :- USER_URLS
userid |url

USR00001|http://www.example.com?a=l&t   
USR00010|http://search.barnesandnoble.com/booksearch/first book.pdf                      
USR00100|http://abc.dev.domain.com/0007AC/ads/800x480 15sec h.264.mp4
USR01000|http://google.com/resource?key=value
USR10000|http://google.com/resource?key=value1 & value2
USR10001|ftp://eau.ww.eesd.gov.calgary/home/smith/budget.wk1
USR10010|gopher://gopher.voa.gov
USR10100|http://www.apple.com/index.html
USR11000|file:/data/letters/to_mom.txt
USR11001|http://www.cuug.ab.ca:8001/~branderr/csce.html 

Query : select userid,url,uri_escape(url) from USER_URLS;

Result :-
USR00001|http://www.example.com?a=l&t|http%3A%2F%2Fwww.example.com%3Fa%3Dl%26t
USR00010|http://search.barnesandnoble.com/booksearch/first book.pdf|http%3A%2F%2Fsearch.barnesandnoble.com%2Fbooksearch%2Ffirst%20book.pdf
USR00100|http://abc.dev.domain.com/0007AC/ads/800x480 15sec h.264.mp4|http%3A%2F%2Fabc.dev.domain.com%2F0007AC%2Fads%2F800x480%2015sec%20h.264.mp4
USR01000|http://google.com/resource?key=value|http%3A%2F%2Fgoogle.com%2Fresource%3Fkey%3Dvalue
USR10000|http://google.com/resource?key=value1 & value2|http%3A%2F%2Fgoogle.com%2Fresource%3Fkey%3Dvalue1%20%26%20value2
USR10001|ftp://eau.ww.eesd.gov.calgary/home/smith/budget.wk1|ftp%3A%2F%2Feau.ww.eesd.gov.calgary%2Fhome%2Fsmith%2Fbudget.wk1
USR10010|gopher://gopher.voa.gov|gopher%3A%2F%2Fgopher.voa.gov
USR10100|http://www.apple.com/index.html|http%3A%2F%2Fwww.apple.com%2Findex.html
USR11000|file:/data/letters/to_mom.txt|file%3A%2Fdata%2Fletters%2Fto_mom.txt
USR11001|http://www.cuug.ab.ca:8001/~branderr/csce.html|http%3A%2F%2Fwww.cuug.ab.ca%3A8001%2F%7Ebranderr%2Fcsce.html


Current releases of Hive also lack a function that decodes an encoded URI.
The function URI_UNESCAPE (uri) would return the decoded form of the encoded URI, which would
be useful when working in HiveQL. This function converts the specified string by replacing any
escape sequences with their unescaped representation.

Functionality :-

Function Name: URI_UNESCAPE (uri)

Returns the decoded form of the encoded uri.
Example: hive> SELECT URI_UNESCAPE('http%3A%2F%2Fwww.example.com%3Fa%3Dl%26t');
-> 'http://www.example.com?a=l&t'

Usage :-

Case 1 : To get decoded uri corresponding to a particular encoded uri

hive> SELECT URI_UNESCAPE('http%3A%2F%2Fgoogle.com%2Fresource%3Fkey%3Dvalue1%20%26%20value2');
-> 'http://google.com/resource?key=value1 & value2'
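For comparison, the decoding described above can be illustrated with Python's standard library. This is a sketch under assumptions rather than the proposed UDF: the name uri_unescape mirrors the ticket, and it is assumed that escape sequences decode as UTF-8 and that '+' is left as-is, neither of which the ticket specifies.

```python
from urllib.parse import unquote


def uri_unescape(encoded_uri: str) -> str:
    """Replace each %XX escape sequence with the character it encodes."""
    # unquote decodes the escaped bytes as UTF-8 by default and does not
    # turn '+' into a space (that is what unquote_plus would do).
    return unquote(encoded_uri)


if __name__ == "__main__":
    print(uri_unescape("http%3A%2F%2Fwww.example.com%3Fa%3Dl%26t"))
    print(uri_unescape(
        "http%3A%2F%2Fgoogle.com%2Fresource%3Fkey%3Dvalue1%20%26%20value2"))
```

Input that is only partly escaped is handled the same way: characters that are not part of an escape sequence pass through unchanged.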

Case 2 : To query a table to get decoded form of the encoded urls corresponding to users
Table :- USER_URLS
userid |encodedurl

USR00001|http%3A%2F%2Fwww.example.com%3Fa%3Dl%26t
USR00010|http://search.barnesandnoble.com/booksearch/first%20book.pdf
USR00100|http%3A%2F%2Fabc.dev.domain.com%2F0007AC%2Fads%2F800x480%2015sec%20h.264.mp4
USR01000|http%3A%2F%2Fgoogle.com%2Fresource%3Fkey%3Dvalue
USR10000|http%3A%2F%2Fgoogle.com%2Fresource%3Fkey%3Dvalue1%20%26%20value2
USR10001|ftp%3A%2F%2Feau.ww.eesd.gov.calgary%2Fhome%2Fsmith%2Fbudget.wk1
USR10010|gopher%3A%2F%2Fgopher.voa.gov
USR10100|http%3A%2F%2Fwww.apple.com%2Findex.html
USR11000|file%3A%2Fdata%2Fletters%2Fto_mom.txt
USR11001|http%3A%2F%2Fwww.cuug.ab.ca%3A8001%2F%7Ebranderr%2Fcsce.html

Query : select userid,encodedurl,uri_unescape(encodedurl) from USER_URLS;

Result :-
USR00001|http%3A%2F%2Fwww.example.com%3Fa%3Dl%26t|http://www.example.com?a=l&t
USR00010|http://search.barnesandnoble.com/booksearch/first%20book.pdf|http://search.barnesandnoble.com/booksearch/first book.pdf
USR00100|http%3A%2F%2Fabc.dev.domain.com%2F0007AC%2Fads%2F800x480%2015sec%20h.264.mp4|http://abc.dev.domain.com/0007AC/ads/800x480 15sec h.264.mp4
USR01000|http%3A%2F%2Fgoogle.com%2Fresource%3Fkey%3Dvalue|http://google.com/resource?key=value
USR10000|http%3A%2F%2Fgoogle.com%2Fresource%3Fkey%3Dvalue1%20%26%20value2|http://google.com/resource?key=value1 & value2
USR10001|ftp%3A%2F%2Feau.ww.eesd.gov.calgary%2Fhome%2Fsmith%2Fbudget.wk1|ftp://eau.ww.eesd.gov.calgary/home/smith/budget.wk1
USR10010|gopher%3A%2F%2Fgopher.voa.gov|gopher://gopher.voa.gov
USR10100|http%3A%2F%2Fwww.apple.com%2Findex.html|http://www.apple.com/index.html
USR11000|file%3A%2Fdata%2Fletters%2Fto_mom.txt|file:/data/letters/to_mom.txt
USR11001|http%3A%2F%2Fwww.cuug.ab.ca%3A8001%2F%7Ebranderr%2Fcsce.html|http://www.cuug.ab.ca:8001/~branderr/csce.html
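Since the two functions are meant to be inverses, one possible sanity check is a round trip over the sample URLs from the tables above. The sketch below uses Python's urllib.parse.quote and unquote as stand-ins for URI_ESCAPE and URI_UNESCAPE; that substitution is an assumption, and the helper name round_trips is hypothetical. In recent Python versions quote leaves '~' unescaped, whereas the ticket shows %7E, so the escaped strings may differ slightly from the ticket even though the round trip still holds.

```python
from urllib.parse import quote, unquote

# Sample URLs copied from the USER_URLS table in the ticket.
SAMPLE_URLS = [
    "http://www.example.com?a=l&t",
    "http://search.barnesandnoble.com/booksearch/first book.pdf",
    "http://abc.dev.domain.com/0007AC/ads/800x480 15sec h.264.mp4",
    "http://google.com/resource?key=value",
    "http://google.com/resource?key=value1 & value2",
    "ftp://eau.ww.eesd.gov.calgary/home/smith/budget.wk1",
    "gopher://gopher.voa.gov",
    "http://www.apple.com/index.html",
    "file:/data/letters/to_mom.txt",
    "http://www.cuug.ab.ca:8001/~branderr/csce.html",
]


def round_trips(url: str) -> bool:
    """True when unescaping the escaped form gives back the original URL."""
    # safe="" makes quote escape '/' too, as the ticket's examples do.
    return unquote(quote(url, safe="")) == url


if __name__ == "__main__":
    for url in SAMPLE_URLS:
        print(round_trips(url), url)
```

A check of this kind would fit naturally into unit tests for the UDF pair, with the Hive functions substituted for the stand-ins.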




    
> URI_Escape and URI_UnEscape UDF
> -------------------------------
>
>                 Key: HIVE-3906
>                 URL: https://issues.apache.org/jira/browse/HIVE-3906
>             Project: Hive
>          Issue Type: New Feature
>          Components: UDF
>    Affects Versions: 0.8.1
>         Environment: Hadoop 0.20.1
> Java 1.6.0
>            Reporter: Liu Zongquan
>              Labels: patch
>             Fix For: 0.8.1
>
>   Original Estimate: 96h
>  Remaining Estimate: 96h


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
