hbase-user mailing list archives

From Mich Talebzadeh <mich.talebza...@gmail.com>
Subject Re: How to insert Json records from Flume into Hbase with Kafka source
Date Sun, 16 Feb 2020 10:47:07 GMT
BTW

When I turn on headers in the conf file

JsonAgent.sinks.Hbase-sink.serializer.depositHeaders=true

I get

 {"rowkey":"f8a6e006-35bb-4470-9a7b-9273b8aa83f1" column=PRICE_INFO:key, timestamp=1581849565330, value=f8a6e006-35bb-4470-9a7b-9273b8aa83f1
 {"rowkey":"f8a6e006-35bb-4470-9a7b-9273b8aa83f1" column=PRICE_INFO:partition, timestamp=1581849565330, value=5
 {"rowkey":"f8a6e006-35bb-4470-9a7b-9273b8aa83f1" column=PRICE_INFO:price, timestamp=1581849565330, value= "price":202.74}
 {"rowkey":"f8a6e006-35bb-4470-9a7b-9273b8aa83f1" column=PRICE_INFO:ticker, timestamp=1581849565330, value="ticker":"IBM"
 {"rowkey":"f8a6e006-35bb-4470-9a7b-9273b8aa83f1" column=PRICE_INFO:timeissued, timestamp=1581849565330, value= "timeissued":"2020-02-16T10:50:05"
 {"rowkey":"f8a6e006-35bb-4470-9a7b-9273b8aa83f1" column=PRICE_INFO:timestamp, timestamp=1581849565330, value=1581849561330
 {"rowkey":"f8a6e006-35bb-4470-9a7b-9273b8aa83f1" column=PRICE_INFO:topic, timestamp=1581849565330, value=md

So the key header is stored correctly in its own column
(value=f8a6e006-35bb-4470-9a7b-9273b8aa83f1), but I still cannot look up
the row by that key!

hbase(main):333:0> get 'trading:MARKETDATAHBASEBATCH', 'f8a6e006-35bb-4470-9a7b-9273b8aa83f1'
COLUMN                                                         CELL
0 row(s) in 0.0540 seconds
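
A possible explanation for the empty result, sketched in Python rather than in the Java regex engine Flume uses (the two behave the same for this pattern, as far as I know). With the pattern (.+),(.+),(.+),(.+) from the quoted configuration and rowKeyIndex = 0, the first group captures everything up to the first comma, so the stored row key would be the text {"rowkey":"f8a6..." including the prefix and quotes, not the bare UUID. The event body below is an assumption, reconstructed from the scan output above.

```python
import re

# Pattern and settings copied from the Flume config quoted in this thread.
FLUME_REGEX = re.compile(r"(.+),(.+),(.+),(.+)")
COL_NAMES = ["ROW_KEY", "ticker", "timeissued", "price"]
ROW_KEY_INDEX = 0

# Assumed event body, reconstructed from the scan output (not captured from Kafka).
body = ('{"rowkey":"f8a6e006-35bb-4470-9a7b-9273b8aa83f1","ticker":"IBM",'
        ' "timeissued":"2020-02-16T10:50:05", "price":202.74}')

# fullmatch() stands in for a whole-body match, like Java's Matcher.matches().
match = FLUME_REGEX.fullmatch(body)
groups = match.groups()

# The group at ROW_KEY_INDEX is what would end up as the HBase row key.
row_key = groups[ROW_KEY_INDEX]
print("row key as stored:", row_key)
print("equals bare UUID:", row_key == "f8a6e006-35bb-4470-9a7b-9273b8aa83f1")

# The remaining groups show why the cell values carry JSON fragments too.
for name, value in zip(COL_NAMES[1:], groups[1:]):
    print(name, "=", repr(value))
```

If this reading is right, a get on the bare UUID finds nothing because no row has that key; the row would only be found under the full captured string.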






Dr Mich Talebzadeh



LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com


*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.




On Sat, 15 Feb 2020 at 15:12, Mich Talebzadeh <mich.talebzadeh@gmail.com>
wrote:

> Hi,
>
> I have a Kafka stream that sends data to Flume in the following JSON
> format.
>
> This is the record sent via Kafka:
>
> 7d645a0f-0386-4405-8af1-7fca908fe928
> {"rowkey":"7d645a0f-0386-4405-8af1-7fca908fe928","ticker":"IBM",
> "timeissued":"2020-02-14T20:32:29", "price":140.11}
>
> Note that "7d645a0f-0386-4405-8af1-7fca908fe928" is the key and there are
> 4 columns in value including the key itself as another column.
>
> The Flume configuration file is as follows
>
> # Describing/Configuring the sink
> JsonAgent.channels.hdfs-channel-1.type = memory
> JsonAgent.channels.hdfs-channel-1.capacity = 300
> JsonAgent.channels.hdfs-channel-1.transactionCapacity = 100
> JsonAgent.sinks.Hbase-sink.type = org.apache.flume.sink.hbase.HBaseSink
> JsonAgent.sinks.Hbase-sink.channel =hdfs-channel-1
> JsonAgent.sinks.Hbase-sink.table =trading:MARKETDATAHBASEBATCH
> JsonAgent.sinks.Hbase-sink.columnFamily=PRICE_INFO
>
> JsonAgent.sinks.Hbase-sink.serializer=org.apache.flume.sink.hbase.RegexHbaseEventSerializer
> JsonAgent.sinks.Hbase-sink.serializer.regex =(.+),(.+),(.+),(.+)
> JsonAgent.sinks.Hbase-sink.serializer.rowKeyIndex = 0
> JsonAgent.sinks.Hbase-sink.serializer.colNames =ROW_KEY,ticker,timeissued,price
> JsonAgent.sinks.Hbase-sink.serializer.regexIgnoreCase = true
> JsonAgent.sinks.Hbase-sink.batchSize =100
>
> This works and posts records to Hbase as follows:
>
> ROW                                                            COLUMN+CELL
>  {"rowkey":"7d645a0f-0386-4405-8af1-7fca908fe928" column=PRICE_INFO:price, timestamp=1581711715292, value= "price":140.11}
>  {"rowkey":"7d645a0f-0386-4405-8af1-7fca908fe928" column=PRICE_INFO:ticker, timestamp=1581711715292, value="ticker":"IBM"
>  {"rowkey":"7d645a0f-0386-4405-8af1-7fca908fe928" column=PRICE_INFO:timeissued, timestamp=1581711715292, value= "timeissued":"2020-02-14T20:32:29"
> 1 row(s) in 0.0050 seconds
>
> However, there is a problem. The rowkey value includes the redundant
> characters {"rowkey": which prevent records from being looked up in Hbase
> by rowkey value! When I try to skip the redundant characters by
> tweaking the regex, no rows are added to the Hbase table. Example as
> follows:
>
> JsonAgent.sinks.Hbase-sink.serializer.regex = (?<=^.{9}).+,(.+),(.+),(.+)
>
> Appreciate any advice.
>
> Thanks,
>
> Mich
>
>
>
>
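
A hedged sketch of why the tweaked regex quoted above may add no rows, again in Python as a stand-in for Flume's Java regex. Two things stand out: the pattern has only three capturing groups against four colNames, and a lookbehind that needs nine preceding characters cannot succeed at the start of the body when the whole body must match. My understanding is that RegexHbaseEventSerializer skips events whose body does not match or whose group count differs from colNames, but that is an assumption to check against the Flume source for your version. The candidate pattern at the end is hypothetical and untested against Flume; it avoids backslashes so that .properties escaping does not come into play.

```python
import re

COL_NAMES = ["ROW_KEY", "ticker", "timeissued", "price"]

# Record from the original message, assumed to be the Flume event body.
body = ('{"rowkey":"7d645a0f-0386-4405-8af1-7fca908fe928","ticker":"IBM",'
        ' "timeissued":"2020-02-14T20:32:29", "price":140.11}')

# The tweaked pattern from the thread.
tweaked = re.compile(r"(?<=^.{9}).+,(.+),(.+),(.+)")
print("capturing groups:", tweaked.groups, "col names:", len(COL_NAMES))
print("whole-body match:", tweaked.fullmatch(body))

# Hypothetical alternative: one capturing group per column, bare values only.
candidate = re.compile(
    '[{]"rowkey":"([^"]+)", *"ticker":"([^"]+)",'
    ' *"timeissued":"([^"]+)", *"price":([^}]+)[}]'
)
m = candidate.fullmatch(body)
record = dict(zip(COL_NAMES, m.groups()))
print(record)
```

With a pattern of this shape the group at rowKeyIndex 0 is the bare UUID, which is what a get on the UUID would need. It is tied to this exact field order, so a record with reordered or missing fields would not match.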
