drill-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Andy Pernsteiner <apernstei...@maprtech.com>
Subject Using Convert_from when JSON lives inside HBASE
Date Fri, 15 Aug 2014 19:52:39 GMT
Figured this might be useful.  I haven't tested much beyond what you see
below, but its a start..thanks to jinfeng for giving me the syntax


Here's a snip of the JSON data that I shoved into HBASE:


{"trans_id":20397,"date":"2014-04-30","time":"08:49:42","user_info":{"cust_id":5324,"device":"AOS4.3","state":"ca"},"trans_info":{"prod_id":[],"purch_flag":"false"}}
{"trans_id":29615,"date":"2014-04-30","time":"13:33:16","user_info":{"cust_id":7790,"device":"IOS5","state":"ok"},"trans_info":{"prod_id":[24,3,0,1],"purch_flag":"true"}}
{"trans_id":21448,"date":"2014-04-30","time":"14:39:20","user_info":{"cust_id":8811,"device":"IOS5","state":"pa"},"trans_info":{"prod_id":[110,87],"purch_flag":"false"}}
{"trans_id":29972,"date":"2014-04-30","time":"12:42:54","user_info":{"cust_id":5229,"device":"IOS6","state":"il"},"trans_info":{"prod_id":[92,26],"purch_flag":"false"}}



I then used 'hbase shell' to insert an entire JSON record into a single
column in HBASE, eg:

put clicks, 29972, "blob:json",
"{"trans_id":29972,"date":"2014-04-30","time":"12:42:54","user_info":{"cust_id":5229,"device":"IOS6","state":"il"},"trans_info":{"prod_id":[92,26],"purch_flag":"false"}}"

(29972 was the rowkey in this case, 'blob' was the CF, and 'json' was the
column name)


Doing a simple 'select * from hbase.`clicks` limit 3'  yields:

+------------+------------+
|  row_key   |    blob    |
+------------+------------+
| [B@7735a933 |
{"json":"eyJ0cmFuc19pZCI6MzAwMDcsImRhdGUiOiIyMDE0LTA0LTExIiwidGltZSI6IjE3OjUyOjQ3IiwidXNlcl9pbmZvIjp7ImN1c3RfaWQiOjE3NzU5LCJkZXZpY2UiOiJBT1M0LjMiLCJzdGF0ZSI6ImdhIn0sInRyYW5zX2luZm8iOnsicH
|
| [B@63f79069 |
{"json":"eyJ0cmFuc19pZCI6MzAwNTgsImRhdGUiOiIyMDE0LTA0LTIxIiwidGltZSI6IjEwOjU3OjMzIiwidXNlcl9pbmZvIjp7ImN1c3RfaWQiOjE2NzY4LCJkZXZpY2UiOiJBT1M0LjIiLCJzdGF0ZSI6ImZsIn0sInRyYW5zX2luZm8iOnsicH
|
| [B@122a10c5 |
{"json":"eyJ0cmFuc19pZCI6MzAwOTksImRhdGUiOiIyMDE0LTA0LTE5IiwidGltZSI6IjE5OjA0OjUzIiwidXNlcl9pbmZvIjp7ImN1c3RfaWQiOjE1MTgzLCJkZXZpY2UiOiJBT1M0LjMiLCJzdGF0ZSI6Im1uIn0sInRyYW5zX2luZm8iOnsicH
|
+------------+------------+



Now, to get inside, its a little tricky at first, but makes sense once you
do it.

First, yank the JSON out by casting:


0: jdbc:drill:> select cast (row_key as int) as mykey, cast (t.`blob`.json
as varchar(600)) as jblob from clicks t limit 3;
+------------+------------+
|   mykey    |   jblob    |
+------------+------------+
| 30007      |
{"trans_id":30007,"date":"2014-04-11","time":"17:52:47","user_info":{"cust_id":17759,"device":"AOS4.3","state":"ga"},"trans_info":{"prod_id":[116,240,1,377],"purch_flag":"false"}}
|
| 30058      |
{"trans_id":30058,"date":"2014-04-21","time":"10:57:33","user_info":{"cust_id":16768,"device":"AOS4.2","state":"fl"},"trans_info":{"prod_id":[507,0,352,546,5,292,35],"purch_flag":"false"}}
|
| 30099      |
{"trans_id":30099,"date":"2014-04-19","time":"19:04:53","user_info":{"cust_id":15183,"device":"AOS4.3","state":"mn"},"trans_info":{"prod_id":[],"purch_flag":"false"}}
|
+------------+------------+
3 rows selected (0.196 seconds)



Note that there may be another method to cast (other than blindly guessing
at the length), but this worked for me.

Now that you can see the JSON, you need to use convert_from along with some
subscripting to get it out:



0: jdbc:drill:> select foo.mycol.trans_id, foo.mycol.user_info.cust_id from
(select convert_from(cast (t.`blob`.json as varchar(600)),'JSON') as mycol
from clicks t) as foo limit 3;
+------------+------------+
|   EXPR$0   |   EXPR$1   |
+------------+------------+
| 30007      | 17759      |
| 30058      | 16768      |
| 30099      | 15183      |
+------------+------------+
3 rows selected (0.281 seconds)


'foo' and 'mycol' are temporary table/column names I used, needed at each
stage to give a name to what we're pulling out.


Hope this helps, if you have issues, send me the JSON objects as well as
your column information and I can try to reproduce.



-- 
 Andy Pernsteiner
 Manager, Field Enablement
ph: 206.228.0737

www.mapr.com

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message