hadoop-hive-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "John Sichi (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HIVE-1257) joins between HBase tables and other tables (whether HBase or not) are broken
Date Thu, 18 Mar 2010 22:58:27 GMT

    [ https://issues.apache.org/jira/browse/HIVE-1257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12847155#action_12847155
] 

John Sichi commented on HIVE-1257:
----------------------------------

Copying details from mailing list thread:

>From  	"Daniel Pruckler" <Daniel.Pruck...@t8webware.com>
Subject 	Problem when joining hbase tables in hive
Date 	Thu, 18 Mar 2010 16:50:22 GMT

We're just starting with the HBaseIntegration, and are really excited about it, great work
on this.  Thanks

 

We are running into a problem when joining two hbase tables.  It seems to usually work when
joining one hbase table with one hive table, but not always.  Let me know if I'm doing something
wrong.

 

My workaround for right now is to create new tables in hive and overwrite that table with
the external hbase table, and then join from the new tables.  In our situation this is working
as the one large table we've got seems to work just fine in hbase, we just need to overwrite
the smaller join tables before doing a join.  I have one join example at the end of this email
that fails when joining an hbase and hive table.

 

 

 

 

Here's what I did:

 

Create the HBase tables:

 

create 'users', 'info'

put 'users', 'user1', 'info:state', 'IA'

put 'users', 'user1', 'info:country', 'USA'

put 'users', 'user1', 'info:country_id', '0'

 

create 'states', 'state'

put 'states', 'IA', 'state:name', 'Iowa'

 

create 'countries', 'info'

put 'countries', 'USA', 'info:name', 'United States'

put 'countries', 'USA', 'info:country', 'USA'

put 'countries', 'USA', 'info:country_id', '1'

 

 

 

Create the hive tables:

 

CREATE EXTERNAL TABLE users(key string, state string, country string, country_id int) 

STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'

WITH SERDEPROPERTIES (

"hbase.key.type" = "string",

"hbase.columns.mapping" = "info:state,info:country,info:country_id",

"hbase.table.name" = "users"

);

 

CREATE EXTERNAL TABLE states(key string, name string) 

STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'

WITH SERDEPROPERTIES (

"hbase.key.type" = "string",

"hbase.columns.mapping" = "state:name",

"hbase.table.name" = "states"

);

 

CREATE EXTERNAL TABLE countries(key string, name string, country string, country_id int) 

STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'

WITH SERDEPROPERTIES (

"hbase.key.type" = "string",

"hbase.columns.mapping" = "info:name,info:country,info:country_id",

"hbase.table.name" = "countries"

);

 

 

Then try to join, this returns no results:

SELECT u.key, u.country, c.name, c.key FROM users u JOIN countries c ON (u.country = c.key);

OK

Time taken: 26.75 seconds

 

 

This returns results, but not what's expected, it seems like it joins the user table with
itself

SELECT u.key, u.country, c.name, c.key FROM users u JOIN countries c ON (u.country = c.country);

OK

user1     USA        NULL      user1

Time taken: 24.867 seconds

 

 

This also returns results (it should not as the country_ids do not match)

SELECT u.key, u.country, c.name, c.key FROM users u JOIN countries c ON (u.country_id = c.country_id);

OK

user1     USA        NULL      user1

Time taken: 22.868 seconds

 

 

 

 

I would guess this is part of the same issue, trying to join anything in state with user results
in an error (state does not have an 'info' column family)

hive> SELECT u.key, u.state, s.name FROM users u JOIN states s ON (u.state = s.key);

Total MapReduce jobs = 1

Launching Job 1 out of 1

Number of reduce tasks not specified. Estimated from input data size: 1

In order to change the average load for a reducer (in bytes):

  set hive.exec.reducers.bytes.per.reducer=<number>

In order to limit the maximum number of reducers:

  set hive.exec.reducers.max=<number>

In order to set a constant number of reducers:

  set mapred.reduce.tasks=<number>

Starting Job = job_201003150937_0096, Tracking URL = http://localhost:50030/jobdetails.jsp?jobid=job_201003150937_0096

Kill Command = /usr/lib/hadoop/bin/../bin/hadoop job  -Dmapred.job.tracker=localhost:9001
-kill job_201003150937_0096

2010-03-18 10:43:28,730 Stage-1 map = 0%,  reduce = 0%

2010-03-18 10:43:34,908 Stage-1 map = 50%,  reduce = 0%

2010-03-18 10:43:44,020 Stage-1 map = 50%,  reduce = 17%

2010-03-18 10:44:02,146 Stage-1 map = 100%,  reduce = 100%

Ended Job = job_201003150937_0096 with errors

 

Task with the most failures(4): 

-----

Task ID:

  task_201003150937_0096_m_000001

 

URL:

  http://localhost:50030/taskdetails.jsp?jobid=job_201003150937_0096&tipid=task_201003150937_0096_m_000001

-----

 

FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.ExecDriver

 

org.apache.hadoop.hbase.regionserver.NoSuchColumnFamilyException: org.apache.hadoop.hbase.regionserver.NoSuchColumnFamilyException:
Column family state does not exist in region users,,1268926579188 in table {NAME => 'users',
FAMILIES => [{NAME => 'info', COMPRESSION => 'NONE', VERSIONS => '3', TTL =>
'2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}

                at org.apache.hadoop.hbase.regionserver.HRegion.checkFamily(HRegion.java:2476)

                at org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:1107)

                at org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:1095)

                at org.apache.hadoop.hbase.regionserver.HRegionServer.openScanner(HRegionServer.java:1846)

                at sun.reflect.GeneratedMethodAccessor8.invoke(Unknown Source)

                at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)

                at java.lang.reflect.Method.invoke(Method.java:597)

                at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:657)

                at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:915)

 

                at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)

                at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)

                at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)

                at java.lang.reflect.Constructor.newInstance(Constructor.java:513)

                at org.apache.hadoop.hbase.RemoteExceptionHandler.decodeRemoteException(RemoteExceptionHandler.java:94)

                at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.getRegionServerWithRetries(HConnectionManager.java:1041)

                at org.apache.hadoop.hbase.client.HTable$ClientScanner.nextScanner(HTable.java:1935)

                at org.apache.hadoop.hbase.client.HTable$ClientScanner.initialize(HTable.java:1855)

                at org.apache.hadoop.hbase.client.HTable.getScanner(HTable.java:376)

                at org.apache.hadoop.hbase.mapred.TableInputFormatBase$TableRecordReader.restart(TableInputFormatBase.java:129)

                at org.apache.hadoop.hbase.mapred.TableInputFormatBase$TableRecordReader.init(TableInputFormatBase.java:139)

                at org.apache.hadoop.hbase.mapred.TableInputFormatBase.getRecordReader(TableInputFormatBase.java:260)

                at org.apache.hadoop.hive.hbase.HiveHBaseTableInputFormat.getRecordReader(HiveHBaseTableInputFormat.java:119)

                at org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:229)

                at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:338)

                at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)

                at org.apache.hadoop.mapred.Child.main(Child.java:170)

 

 

 

Here's my workaround that allows me to be able to join.

 

CREATE TABLE hive_users(key string, state string, country string, country_id int);

INSERT OVERWRITE TABLE hive_users SELECT key, state, country, country_id FROM users;

 

CREATE TABLE hive_states(key string, name string);

INSERT OVERWRITE TABLE hive_states SELECT key, name FROM states;

 

CREATE TABLE hive_countries(key string, name string, country string, country_id int);

INSERT OVERWRITE TABLE hive_countries SELECT key, name, country, country_id FROM countries;

 

 

Any of my selects work just fine:

SELECT u.key, u.state, s.name FROM hive_users u JOIN hive_states s ON (u.state = s.key);

SELECT u.key, u.country, c.name FROM hive_users u JOIN hive_countries c ON (u.country = c.key);

 

 

 

When joining one hbase table it will work with some tables, and give errors with others, I'm
not seeing anything I'm doing wrong with the select.

 

This works:

SELECT u.key, u.country, c.name FROM users u JOIN hive_countries c ON (u.country = c.key);

 

However while trying other things, I found that this query does not work, and cannot figure
out why it'd be any different, maybe I'm doing something wrong.

SELECT u.key, u.country, c.name FROM hive_users u JOIN countries c ON (u.country = c.key);

 

Ended Job = job_201003150937_0118 with errors

 

Task with the most failures(4): 

-----

Task ID:

  task_201003150937_0118_m_000001

 

URL:

  http://localhost:50030/taskdetails.jsp?jobid=job_201003150937_0118&tipid=task_201003150937_0118_m_000001

-----

 

FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.ExecDriver

 

 

java.lang.IllegalArgumentException: string cannot be null

                at org.apache.hadoop.hbase.util.Bytes.toBytes(Bytes.java:390)

                at org.apache.hadoop.hive.hbase.HiveHBaseTableInputFormat.getRecordReader(HiveHBaseTableInputFormat.java:68)

                at org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:229)

                at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:338)

                at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)

                at org.apache.hadoop.mapred.Child.main(Child.java:170)

 

 

 

 

 

Thanks,

Daniel

 

 




> joins between HBase tables and other tables (whether HBase or not) are broken
> -----------------------------------------------------------------------------
>
>                 Key: HIVE-1257
>                 URL: https://issues.apache.org/jira/browse/HIVE-1257
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: HBase Handler
>    Affects Versions: 0.6.0
>            Reporter: John Sichi
>            Assignee: John Sichi
>             Fix For: 0.6.0
>
>
> Details in
> http://mail-archives.apache.org/mod_mbox/hadoop-hive-user/201003.mbox/%3C9A53DDE1FE082F4D952FDF20AC87E21F021F3EBC@exchange2.t8design.com%3E

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


Mime
View raw message