hive-user mailing list archives

From Hao Ren <h....@claravista.fr>
Subject Re: Hive-Hbase integration: join issues
Date Wed, 07 Aug 2013 08:36:53 GMT
Update:

When I remove *client_list Array<string>* from both tables, it works fine.
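
For what it's worth, I would expect a projection that avoids the Array
column to work as well, e.g. (a sketch using the column names from the
DDL quoted below):

    SELECT n.idvisite, o.nb_client
    FROM hive_dict n JOIN hbase_dict o ON (o.idvisite = n.idvisite);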

So the question is: how do I join a Shark table with an HBase table
that has Array, Struct, or Map columns?

Is there any workaround?
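
One workaround I am considering (an untested sketch; the '|' delimiter
and the hbase_dict_str table name are my own choices): store the ids as
a delimited string in HBase, map the column as a plain string in Hive,
and rebuild the array at query time with split():

    CREATE TABLE hbase_dict_str (
      idvisite string,
      client_list string,   -- '|'-delimited ids instead of Array<string>
      nb_client int)
    STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
    WITH SERDEPROPERTIES ("hbase.columns.mapping" =
      ":key,clients:id_list,clients:nb")
    TBLPROPERTIES ("hbase.table.name" = "cookie_clients_dict");

    SELECT n.idvisite, split(o.client_list, '\\|') AS client_list,
           o.nb_client
    FROM hive_dict n JOIN hbase_dict_str o ON (o.idvisite = n.idvisite);

If there is a way to keep the Array<string> mapping itself, that would
be even better.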

Thank you.

Hao

On 07/08/2013 10:09, Hao Ren wrote:
> Hi,
>
> I have integrated HBase with Hive.
>
> When I join a Shark table with an HBase table, it throws the
> following exception:
>
> java.lang.ClassCastException: 
> org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryArray cannot be 
> cast to [Ljava.lang.Object;
>     at 
> org.apache.hadoop.hive.serde2.objectinspector.StandardListObjectInspector.getList(StandardListObjectInspector.java:98)
>     at 
> org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serialize(LazySimpleSerDe.java:434)
>     at 
> org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serializeField(LazySimpleSerDe.java:381)
>     at 
> org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serialize(LazySimpleSerDe.java:365)
>     at 
> org.apache.hadoop.hive.ql.exec.FileSinkOperator.processOp(FileSinkOperator.java:568)
>     at 
> shark.execution.FileSinkOperator$$anonfun$processPartition$1.apply(FileSinkOperator.scala:73)
>     at 
> shark.execution.FileSinkOperator$$anonfun$processPartition$1.apply(FileSinkOperator.scala:72)
>     at scala.collection.Iterator$class.foreach(Iterator.scala:772)
>     at scala.collection.Iterator$$anon$19.foreach(Iterator.scala:399)
>     at 
> shark.execution.FileSinkOperator.processPartition(FileSinkOperator.scala:72)
>     at 
> shark.execution.FileSinkOperator$.writeFiles$1(FileSinkOperator.scala:133)
>     at 
> shark.execution.FileSinkOperator$$anonfun$executeProcessFileSinkPartition$1.apply(FileSinkOperator.scala:138)
>     at 
> shark.execution.FileSinkOperator$$anonfun$executeProcessFileSinkPartition$1.apply(FileSinkOperator.scala:138)
>     at spark.scheduler.ResultTask.run(ResultTask.scala:77)
>     at spark.executor.Executor$TaskRunner.run(Executor.scala:98)
>     at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>     at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>     at java.lang.Thread.run(Thread.java:724)
>
> Here is my HBase table definition:
>
> CREATE TABLE hbase_dict (
> idvisite string,
> client_list Array<string>,
> nb_client int)
> STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
> WITH SERDEPROPERTIES ("hbase.columns.mapping" = 
> ":key,clients:id_list,clients:nb")
> TBLPROPERTIES(
> "hbase.table.name" = "cookie_clients_dict",
> "hbase.table.default.storage.type" = "binary")
> ;
>
> It seems to be a SerDe problem. I have tried both the binary and the
> string storage types; neither works.
>
> The join query is as follows:
>
>     SELECT * FROM hive_dict n JOIN hbase_dict o
>     ON (o.idvisite = n.idvisite);
>
> where hive_dict is a native Hive table.
>
> I am new to Hive and HBase. I have googled a lot but found nothing.
>
> Any thoughts are highly appreciated.
>
> Thank you in advance.
>
> Hao.


-- 
Hao Ren
ClaraVista
www.claravista.fr

