hive-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Josh Ferguson <j...@besquared.net>
Subject Re: Index Expressions
Date Wed, 03 Dec 2008 03:44:25 GMT
These tables were already defined with those

CREATE TABLE activities
(actor_id STRING, actee_id STRING, properties MAP<STRING, STRING>)
PARTITIONED BY (account STRING, application STRING, dataset STRING,  
hour INT)
CLUSTERED BY (actor_id, actee_id) INTO 32 BUCKETS
ROW FORMAT DELIMITED
COLLECTION ITEMS TERMINATED BY '44'
MAP KEYS TERMINATED BY '58'
STORED AS TEXTFILE;

Detailed Table Information:
Table(tableName:activities,dbName:default,owner:Josh,createTime: 
1228208598,lastAccessTime:0,retention:0,sd:StorageDescriptor(cols: 
[FieldSchema(name:actor_id,type:string,comment:null),  
FieldSchema(name:actee_id,type:string,comment:null),  
FieldSchema 
(name:properties,type:map<string,string>,comment:null)],location:/user/ 
hive/warehouse/ 
activities 
,inputFormat:org 
.apache 
.hadoop 
.mapred 
.TextInputFormat 
,outputFormat:org 
.apache 
.hadoop 
.hive.ql.io.IgnoreKeyTextOutputFormat,compressed:false,numBuckets: 
32 
,serdeInfo:SerDeInfo 
(name:null 
,serializationLib:org 
.apache.hadoop.hive.serde2.dynamic_type.DynamicSerDe,parameters: 
{colelction 
.delim 
= 
44 
,mapkey 
.delim 
= 
58 
,serialization 
.format 
= 
org 
.apache.hadoop.hive.serde2.thrift.TCTLSeparatedProtocol}),bucketCols: 
[actor_id, actee_id],sortCols:[],parameters:{}),partitionKeys: 
[FieldSchema(name:account,type:string,comment:null),  
FieldSchema(name:application,type:string,comment:null),  
FieldSchema(name:dataset,type:string,comment:null),  
FieldSchema(name:hour,type:int,comment:null)],parameters:{})


CREATE TABLE users
(id STRING, properties MAP<STRING, STRING>)
PARTITIONED BY (account STRING, application STRING, dataset STRING,  
hour INT)
CLUSTERED BY (id) INTO 32 BUCKETS
ROW FORMAT DELIMITED
COLLECTION ITEMS TERMINATED BY '44'
MAP KEYS TERMINATED BY '58'
STORED AS TEXTFILE;

Detailed Table Information:
Table(tableName:users,dbName:default,owner:Josh,createTime: 
1228208633,lastAccessTime:0,retention:0,sd:StorageDescriptor(cols: 
[FieldSchema(name:id,type:string,comment:null),  
FieldSchema 
(name:properties,type:map<string,string>,comment:null)],location:/user/ 
hive/warehouse/ 
users 
,inputFormat:org 
.apache 
.hadoop 
.mapred 
.TextInputFormat 
,outputFormat:org 
.apache 
.hadoop 
.hive.ql.io.IgnoreKeyTextOutputFormat,compressed:false,numBuckets: 
32 
,serdeInfo:SerDeInfo 
(name:null 
,serializationLib:org 
.apache.hadoop.hive.serde2.dynamic_type.DynamicSerDe,parameters: 
{colelction 
.delim 
= 
44 
,mapkey 
.delim 
= 
58 
,serialization 
.format 
= 
org 
.apache.hadoop.hive.serde2.thrift.TCTLSeparatedProtocol}),bucketCols: 
[id],sortCols:[],parameters:{}),partitionKeys: 
[FieldSchema(name:account,type:string,comment:null),  
FieldSchema(name:application,type:string,comment:null),  
FieldSchema(name:dataset,type:string,comment:null),  
FieldSchema(name:hour,type:int,comment:null)],parameters:{})

The Query Was:

> SELECT activities.*, users.* FROM activities LEFT OUTER JOIN users  
> ON activities.actor_id = users.id WHERE activities.dataset='poke'  
> AND activities.properties['verb'] = 'Dance';


Josh Ferguson

On Dec 2, 2008, at 1:44 AM, Zheng Shao wrote:

> Hi Josh,
>
> This is a known problem.
>
> If any of the columns are map<xxx,xxx> or list<xxx,xxx>, you have to  
> specify the ITEM terminator and KEY terminator. Please see the  
> complete CREATE table command syntax.
>
> Let me know if it works or not.
>
> Zheng
> From: Josh Ferguson [mailto:josh@besquared.net]
> Sent: Tuesday, December 02, 2008 1:36 AM
> To: hive-user@hadoop.apache.org
> Subject: Index Expressions
>
> I'm getting a bunch of errors during my reduce step while doing a  
> join between two string fields for some reason.
>
> java.lang.RuntimeException: Hive 2 Internal error: cannot evaluate  
> index expression on string
>           at  
> org 
> .apache 
> .hadoop 
> .hive 
> .ql.exec.ExprNodeIndexEvaluator.evaluate(ExprNodeIndexEvaluator.java: 
> 64)
>           at  
> org 
> .apache 
> .hadoop 
> .hive 
> .ql.exec.ExprNodeFuncEvaluator.evaluate(ExprNodeFuncEvaluator.java:72)
>           at  
> org 
> .apache 
> .hadoop 
> .hive 
> .ql.exec.ExprNodeFuncEvaluator.evaluate(ExprNodeFuncEvaluator.java:72)
>           at  
> org 
> .apache 
> .hadoop.hive.ql.exec.FilterOperator.process(FilterOperator.java:67)
>           at  
> org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:262)
>           at  
> org 
> .apache 
> .hadoop 
> .hive.ql.exec.JoinOperator.createForwardJoinObject(JoinOperator.java: 
> 257)
>           at  
> org 
> .apache.hadoop.hive.ql.exec.JoinOperator.genObject(JoinOperator.java: 
> 477)
>           at  
> org 
> .apache.hadoop.hive.ql.exec.JoinOperator.genObject(JoinOperator.java: 
> 467)
>           at  
> org 
> .apache.hadoop.hive.ql.exec.JoinOperator.genObject(JoinOperator.java: 
> 467)
>           at  
> org 
> .apache 
> .hadoop 
> .hive.ql.exec.JoinOperator.checkAndGenObject(JoinOperator.java:507)
>           at  
> org 
> .apache.hadoop.hive.ql.exec.JoinOperator.endGroup(JoinOperator.java: 
> 489)
>           at  
> org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java: 
> 140)
>           at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java: 
> 430)
>           at org.apache.hadoop.mapred.Child.main(Child.java:155)
>
> The Query is
> SELECT activities.*, users.* FROM activities LEFT OUTER JOIN users  
> ON activities.actor_id = users.id WHERE activities.dataset='poke'  
> AND activities.properties['verb'] = 'Dance';
> hive> DESCRIBE activities;
> OK
> actor_id        string
> actee_id        string
> properties      map<string,string>
> account string
> application     string
> dataset string
> hour    int
>
> hive> DESCRIBE users;
> OK
> id      string
> properties      map<string,string>
> account string
> application     string
> dataset string
> hour    int
>
> Thanks
>
> Josh
>


Mime
View raw message