hadoop-hive-dev mailing list archives

From "Zheng Shao (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HIVE-857) Transform should produce objects in the same type as specified at runtime
Date Fri, 25 Sep 2009 21:05:16 GMT

    [ https://issues.apache.org/jira/browse/HIVE-857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12759744#action_12759744 ]

Zheng Shao commented on HIVE-857:
---------------------------------

EXPLAIN output and Hadoop stack trace for the failing query. Thanks to Andrew Hitchcock for reporting this:

{code}
EXPLAIN INSERT OVERWRITE TABLE feature_index
SELECT
 temp.feature,
 temp.ad_id,
 sum(if(temp.clicked, 1, 0)) / cast(count(temp.clicked) as DOUBLE) as clicked_percent
FROM (
 SELECT concat('ua:', trim(lower(ua.feature))) as feature, ua.ad_id, ua.clicked
 FROM (
  MAP raw_logs.user_agent, raw_logs.ad_id, raw_logs.clicked
  USING 's3://anhi-test-programs/hive/split_user_agent.py' as (feature STRING, ad_id STRING, clicked BOOLEAN)
  FROM raw_logs
 ) ua
) temp
GROUP BY temp.feature, temp.ad_id;

OK
ABSTRACT SYNTAX TREE:
  (TOK_QUERY (TOK_FROM (TOK_SUBQUERY (TOK_QUERY (TOK_FROM (TOK_SUBQUERY (TOK_QUERY (TOK_FROM
(TOK_TABREF raw_logs)) (TOK_INSERT (TOK_DESTINATION (TOK_DIR TOK_TMP_FILE)) (TOK_SELECT (TOK_SELEXPR
(TOK_TRANSFORM (TOK_EXPLIST (. (TOK_TABLE_OR_COL raw_logs) user_agent) (. (TOK_TABLE_OR_COL
raw_logs) ad_id) (. (TOK_TABLE_OR_COL raw_logs) clicked)) 's3://anhi-test-programs/hive/split_user_agent.py'
(TOK_TABCOLLIST (TOK_TABCOL feature TOK_STRING) (TOK_TABCOL ad_id TOK_STRING) (TOK_TABCOL
clicked TOK_BOOLEAN))))))) ua)) (TOK_INSERT (TOK_DESTINATION (TOK_DIR TOK_TMP_FILE)) (TOK_SELECT
(TOK_SELEXPR (TOK_FUNCTION concat 'ua:' (TOK_FUNCTION trim (TOK_FUNCTION lower (. (TOK_TABLE_OR_COL
ua) feature)))) feature) (TOK_SELEXPR (. (TOK_TABLE_OR_COL ua) ad_id)) (TOK_SELEXPR (. (TOK_TABLE_OR_COL
ua) clicked))))) temp)) (TOK_INSERT (TOK_DESTINATION (TOK_TAB feature_index)) (TOK_SELECT
(TOK_SELEXPR (. (TOK_TABLE_OR_COL temp) feature)) (TOK_SELEXPR (. (TOK_TABLE_OR_COL temp)
ad_id)) (TOK_SELEXPR (/ (TOK_FUNCTION sum (TOK_FUNCTION if (. (TOK_TABLE_OR_COL temp) clicked)
1 0)) (TOK_FUNCTION TOK_DOUBLE (TOK_FUNCTION count (. (TOK_TABLE_OR_COL temp) clicked))))
clicked_percent)) (TOK_GROUPBY (. (TOK_TABLE_OR_COL temp) feature) (. (TOK_TABLE_OR_COL temp)
ad_id))))

STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 depends on stages: Stage-1

STAGE PLANS:
  Stage: Stage-1
    Map Reduce
      Alias -> Map Operator Tree:
        temp:ua:raw_logs 
          TableScan
            alias: raw_logs
            Select Operator
              expressions:
                    expr: user_agent
                    type: string
                    expr: ad_id
                    type: string
                    expr: clicked
                    type: boolean
              outputColumnNames: _col0, _col1, _col2
              Transform Operator
                command: s3://anhi-test-programs/hive/split_user_agent.py
                output info:
                    input format: org.apache.hadoop.mapred.TextInputFormat
                    output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                Select Operator
                  expressions:
                        expr: concat('ua:', trim(lower(feature)))
                        type: string
                        expr: ad_id
                        type: string
                        expr: clicked
                        type: boolean
                  outputColumnNames: _col0, _col1, _col2
                  Select Operator
                    expressions:
                          expr: _col0
                          type: string
                          expr: _col1
                          type: string
                          expr: _col2
                          type: boolean
                    outputColumnNames: _col0, _col1, _col2
                    Group By Operator
                      aggregations:
                            expr: sum(if(_col2, 1, 0))
                            expr: count(_col2)
                      keys:
                            expr: _col0
                            type: string
                            expr: _col1
                            type: string
                      mode: hash
                      outputColumnNames: _col0, _col1, _col2, _col3
                      Reduce Output Operator
                        key expressions:
                              expr: _col0
                              type: string
                              expr: _col1
                              type: string
                        sort order: ++
                        Map-reduce partition columns:
                              expr: _col0
                              type: string
                              expr: _col1
                              type: string
                        tag: -1
                        value expressions:
                              expr: _col2
                              type: bigint
                              expr: _col3
                              type: bigint
      Reduce Operator Tree:
        Group By Operator
          aggregations:
                expr: sum(VALUE._col0)
                expr: count(VALUE._col1)
          keys:
                expr: KEY._col0
                type: string
                expr: KEY._col1
                type: string
          mode: mergepartial
          outputColumnNames: _col0, _col1, _col2, _col3
          Select Operator
            expressions:
                  expr: _col0
                  type: string
                  expr: _col1
                  type: string
                  expr: (UDFToDouble(_col2) / UDFToDouble(_col3))
                  type: double
            outputColumnNames: _col0, _col1, _col2
            File Output Operator
              compressed: false
              GlobalTableId: 1
              table:
                  input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                  output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
                  serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
                  name: feature_index

  Stage: Stage-0
    Move Operator
      tables:
          replace: true
          table:
              input format: org.apache.hadoop.mapred.SequenceFileInputFormat
              output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
              serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
              name: feature_index


And the Hadoop stack trace:

java.lang.RuntimeException: Map operator initialization failed
        at org.apache.hadoop.hive.ql.exec.ExecMapper.configure(ExecMapper.java:110)
        at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:58)
        at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:82)
        at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:33)
        at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:58)
        at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:82)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:224)
        at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2204)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Cannot initialize ScriptOperator
        at org.apache.hadoop.hive.ql.exec.ScriptOperator.initializeOp(ScriptOperator.java:266)
        at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:295)
        at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:332)
        at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:317)
        at org.apache.hadoop.hive.ql.exec.SelectOperator.initializeOp(SelectOperator.java:58)
        at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:295)
        at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:332)
        at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:317)
        at org.apache.hadoop.hive.ql.exec.Operator.initializeOp(Operator.java:303)
        at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:295)
        at org.apache.hadoop.hive.ql.exec.MapOperator.initializeOp(MapOperator.java:301)
        at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:295)
        at org.apache.hadoop.hive.ql.exec.ExecMapper.configure(ExecMapper.java:82)
        ... 7 more
Caused by: org.apache.hadoop.hive.ql.exec.UDFArgumentTypeException: The first argument of function IF should be "boolean", but "string" is found
        at org.apache.hadoop.hive.ql.udf.generic.GenericUDFIf.initialize(GenericUDFIf.java:58)
        at org.apache.hadoop.hive.ql.exec.ExprNodeGenericFuncEvaluator.initialize(ExprNodeGenericFuncEvaluator.java:77)
        at org.apache.hadoop.hive.ql.exec.GroupByOperator.initializeOp(GroupByOperator.java:179)
        at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:295)
        at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:332)
        at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:317)
        at org.apache.hadoop.hive.ql.exec.SelectOperator.initializeOp(SelectOperator.java:58)
        at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:295)
        at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:332)
        at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:317)
        at org.apache.hadoop.hive.ql.exec.SelectOperator.initializeOp(SelectOperator.java:58)
        at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:295)
        at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:332)
        at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:317)
        at org.apache.hadoop.hive.ql.exec.ScriptOperator.initializeOp(ScriptOperator.java:262)
        ... 19 more
{code}
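
For context on why the field arrives as a string: a MAP/TRANSFORM script exchanges rows with Hive as plain text, by default one tab-separated line per row on stdin and stdout. The contents of split_user_agent.py are not shown in this report, so the following is only a hypothetical stand-in (the function names and the whitespace-splitting rule are assumptions) that illustrates how the "clicked" value leaves the script as the text "true" or "false" rather than as a typed boolean:

```python
import sys


def transform_line(line):
    """Turn one tab-separated input row into a list of tab-separated output rows.

    Hypothetical stand-in for split_user_agent.py: the real script is not part
    of this report. Every field, including "clicked", is handled as text.
    """
    user_agent, ad_id, clicked = line.rstrip("\n").split("\t")
    # Assumed feature extraction: one output row per whitespace-separated token.
    return ["\t".join([token, ad_id, clicked]) for token in user_agent.split()]


def main(stdin=sys.stdin, stdout=sys.stdout):
    """Read rows from stdin and write transformed rows to stdout."""
    for line in stdin:
        for row in transform_line(line):
            stdout.write(row + "\n")
```

To use a script like this with MAP ... USING, it would call main() at the bottom of the file. Nothing in its output carries type information, so the "clicked BOOLEAN" declaration in the AS clause has to be enforced on the Hive side when the script output is read back.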


> Transform should produce objects in the same type as specified at runtime
> -------------------------------------------------------------------------
>
>                 Key: HIVE-857
>                 URL: https://issues.apache.org/jira/browse/HIVE-857
>             Project: Hadoop Hive
>          Issue Type: Bug
>    Affects Versions: 0.4.1, 0.5.0
>            Reporter: Zheng Shao
>
> The following code fails at runtime, because the "clicked" field produced by "transform" is actually of type String at runtime, instead of boolean.
> {code}
> INSERT OVERWRITE TABLE feature_index
> SELECT
>  temp.feature,
>  temp.ad_id,
>  sum(if(temp.clicked, 1, 0)) / cast(count(temp.clicked) as DOUBLE) as clicked_percent
> FROM (
>  SELECT concat('ua:', trim(lower(ua.feature))) as feature, ua.ad_id, ua.clicked
>  FROM (
>   MAP raw_logs.user_agent, raw_logs.ad_id, raw_logs.clicked
>   USING 'my.py' as (feature STRING, ad_id STRING, clicked BOOLEAN)
>   FROM raw_logs
>  ) ua
> ) temp
> GROUP BY temp.feature, temp.ad_id;
> {code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

