hadoop-hive-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Ashish Thusoo (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HIVE-179) SUBSTR function should work like other databases
Date Sat, 10 Jan 2009 01:23:59 GMT

    [ https://issues.apache.org/jira/browse/HIVE-179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12662598#action_12662598
] 

Ashish Thusoo commented on HIVE-179:
------------------------------------

This seems to be exposing a latent bug in plan generation. Still investigating this, but here
is an explanation. If you do 

diff -y ./ql/src/test/results/clientpositive/groupby2_map.q.out ./ql/src/test/results/clientpositive/.svn/text-base/groupby2_map.q.out.svn-base
| less 

you will see the plan changes that this patch makes. An excerpt is as follows: 
-------------------------------- 
STAGE PLANS: STAGE PLANS: 
  Stage: Stage-1 Stage: Stage-1 
    Map Reduce Map Reduce 
      Alias -> Map Operator Tree: Alias -> Map Operator Tree: 
        src src 
            Group By Operator Group By Operator 
              aggregations: aggregations: 
                    expr: sum(UDFToDouble(substr(value, 5))) | expr: count(DISTINCT substr(value,
4)) 
                    expr: count(DISTINCT substr(value, 5)) | expr: sum(UDFToDouble(substr(value,
4))) 
              keys: keys: 
                    expr: substr(key, 1, 1) | expr: substr(key, 0, 1) 
                    type: string type: string 
                    expr: substr(value, 5) | expr: substr(value, 4) 
                    type: string type: string 
              mode: hash mode: hash 
              Reduce Output Operator Reduce Output Operator 
                key expressions: key expressions: 
                      expr: 0 expr: 0 
                      type: string type: string 
                      expr: 1 expr: 1 
                      type: string type: string 
                sort order: ++ sort order: ++ 
                Map-reduce partition columns: Map-reduce partition columns: 
                      expr: 0 expr: 0 
                      type: string type: string 
                      expr: 1 expr: 1 
                      type: string type: string 
                tag: -1 tag: -1 
                value expressions: value expressions: 
                      expr: 2 expr: 2 
                      type: double < 
                      expr: 3 < 
                      type: bigint type: bigint 
                                                              > expr: 3 
                                                              > type: double 
      Reduce Operator Tree: Reduce Operator Tree: 
        Group By Operator Group By Operator 
          aggregations: aggregations: 
                expr: count(DISTINCT KEY.1) expr: count(DISTINCT KEY.1) 
                expr: sum(VALUE.0) | expr: sum(VALUE.1) 
          keys: keys: 
                expr: KEY.0 expr: KEY.0 
---------------------------- 
So the point to note here is that the count and the sum expressions are the top most group
by operator so 
in the last group by operator in the excerpt I think expr: count(DISTINCT KEY.1) should have
been expr: count(DISTINCT KEY0) 

Still debugging this one..

> SUBSTR function should work like other databases
> ------------------------------------------------
>
>                 Key: HIVE-179
>                 URL: https://issues.apache.org/jira/browse/HIVE-179
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Query Processor
>            Reporter: David Phillips
>            Assignee: David Phillips
>            Priority: Critical
>         Attachments: hive-substr.patch
>
>
> Positions start at 1, not 0.  Negative positions start at the end of the string and count
backwards.
> Oracle returns null for lengths less than 1 or non-existent substrings (any empty strings
are null).  MySQL and PostgreSQL return empty strings, never null.  PostgreSQL errors for
negative lengths.  I suggest we follow the MySQL behavior.
> Oracle treats position 0 the same as 1.  Perhaps we should too?
> {noformat}
>  SUBSTR('ABCDEFG',3,4): CDEF
> SUBSTR('ABCDEFG',-5,4): CDEF
>    SUBSTR('ABCDEFG',3): CDEFG
>   SUBSTR('ABCDEFG',-5): CDEFG
>      SUBSTR('ABC',1,1): A
> MySQL:
>      SUBSTR('ABC',0,1): <empty>
>      SUBSTR('ABC',0,2): <empty>
>      SUBSTR('ABC',1,0): <empty>
>     SUBSTR('ABC',1,-1): <empty>
> Oracle:
>      SUBSTR('ABC',0,1): A
>      SUBSTR('ABC',0,2): AB
>      SUBSTR('ABC',1,0): <null>
>     SUBSTR('ABC',1,-1): <null>
> {noformat}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


Mime
View raw message