hbase-user mailing list archives

From Mich Talebzadeh <mich.talebza...@gmail.com>
Subject Re: How to insert Json records from Flume into Hbase with Kafka source
Date Sun, 16 Feb 2020 20:26:06 GMT
Hi,

This regex seems to work:

JsonAgent.sinks.Hbase-sink.serializer.regex = [^_]*"(.+).{1},(.+),(.+),(.+).{1}

Remember that beforehand we were (incorrectly) getting the following as the ROW:

{"rowkey":"eff0bdc7-d6b1-40b5-ad0a-b8181173b806"

The first positional column is the ROW_KEY. *We need to strip all except
the UUID itself*

[^_]*"(.+).{1} means

Get rid of everything from the start up to and including the quote that opens
the UUID value ({"rowkey":"), and also get rid of the closing quote (matched
by .{1}), capturing just the ROW_KEY itself

eff0bdc7-d6b1-40b5-ad0a-b8181173b806

We also wanted to get rid of the '}' at the end of the last column, in this
case the price column:

(.+).{1}

means capture everything except the last character.
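To check what each capture group picks up before restarting the agent, the regex can be exercised outside Flume. The Java sketch below is an illustration under assumptions: it applies the pattern with `Matcher.matches()` (a whole-string match, which is how I understand `RegexHbaseEventSerializer` to apply it), and the class name `RegexCheck` and the sample payload (modelled on the record in the original question) are made up for the test.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexCheck {
    // The regex from this message, without the e-mail bold markers
    static final Pattern FIXED = Pattern.compile("[^_]*\"(.+).{1},(.+),(.+),(.+).{1}");

    // Hypothetical sample Kafka value, shaped like the record in the original question
    static final String SAMPLE = "{\"rowkey\":\"7d645a0f-0386-4405-8af1-7fca908fe928\","
            + "\"ticker\":\"IBM\", \"timeissued\":\"2020-02-14T20:32:29\", \"price\":140.11}";

    // Returns the capture groups, or an empty list if the whole payload does not match
    static List<String> groups(String payload) {
        List<String> out = new ArrayList<>();
        Matcher m = FIXED.matcher(payload);
        if (m.matches()) {                       // whole-string match, not find()
            for (int i = 1; i <= m.groupCount(); i++) {
                out.add(m.group(i));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> g = groups(SAMPLE);
        for (int i = 0; i < g.size(); i++) {
            System.out.println("group " + (i + 1) + ": [" + g.get(i) + "]");
        }
    }
}
```

Run as is, group 1 should be the bare UUID and group 4 ` "price":140.11` without the closing brace, while the ticker and timeissued groups still carry their JSON field names. Note that `[^_]*` only behaves like `.*` here because the payload contains no underscores, and the split relies on the record containing exactly three commas.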

Now a lookup by ROW_KEY works:

hbase(main):483:0> get 'trading:MARKETDATAHBASEBATCH', '19735b2e-91b6-4cc8-afcb-f02c00bd52a3'
COLUMN                   CELL
 PRICE_INFO:key          timestamp=1581883743642, value=19735b2e-91b6-4cc8-afcb-f02c00bd52a3
 PRICE_INFO:partition    timestamp=1581883743642, value=6
 PRICE_INFO:price        timestamp=1581883743642, value= "price":108.7
 PRICE_INFO:ticker       timestamp=1581883743642, value="ticker":"IBM"
 PRICE_INFO:timeissued   timestamp=1581883743642, value= "timeissued":"2020-02-16T20:19:43"
 PRICE_INFO:timestamp    timestamp=1581883743642, value=1581883739646
 PRICE_INFO:topic        timestamp=1581883743642, value=md
7 row(s) in 0.0040 seconds


Hope this helps

Dr Mich Talebzadeh



LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com


*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.




On Sun, 16 Feb 2020 at 10:47, Mich Talebzadeh <mich.talebzadeh@gmail.com>
wrote:

> BTW
>
> When I turn on headers in the conf file
>
> JsonAgent.sinks.Hbase-sink.serializer.depositHeaders=true
>
> I get
>
>  {"rowkey":"f8a6e006-35bb-4470-9a7b-9273b8aa83f1"
> column=PRICE_INFO:key, timestamp=1581849565330,
> value=f8a6e006-35bb-4470-9a7b-9273b8aa83f1
>  {"rowkey":"f8a6e006-35bb-4470-9a7b-9273b8aa83f1"
> column=PRICE_INFO:partition, timestamp=1581849565330, value=5
>  {"rowkey":"f8a6e006-35bb-4470-9a7b-9273b8aa83f1"
> column=PRICE_INFO:price, timestamp=1581849565330, value= "price":202.74}
>  {"rowkey":"f8a6e006-35bb-4470-9a7b-9273b8aa83f1"
> column=PRICE_INFO:ticker, timestamp=1581849565330, value="ticker":"IBM"
>  {"rowkey":"f8a6e006-35bb-4470-9a7b-9273b8aa83f1"
> column=PRICE_INFO:timeissued, timestamp=1581849565330, value=
> "timeissued":"2020-02-16T10:50:05"
>  {"rowkey":"f8a6e006-35bb-4470-9a7b-9273b8aa83f1"
> column=PRICE_INFO:timestamp, timestamp=1581849565330, value=1581849561330
>  {"rowkey":"f8a6e006-35bb-4470-9a7b-9273b8aa83f1"
> column=PRICE_INFO:topic, timestamp=1581849565330, value=md
>
> So it displays the key correctly, value=f8a6e006-35bb-4470-9a7b-9273b8aa83f1,
> but I cannot search on that key!
>
> hbase(main):333:0> get 'trading:MARKETDATAHBASEBATCH',
> 'f8a6e006-35bb-4470-9a7b-9273b8aa83f1'
> COLUMN                                                         CELL
> 0 row(s) in 0.0540 seconds
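The `get` returns nothing because an HBase row key is matched byte for byte, and with the original regex `(.+),(.+),(.+),(.+)` (assumed to be still in place at this point) group 1, which becomes the row key, is `{"rowkey":"f8a6e006-35bb-4470-9a7b-9273b8aa83f1"`, quotes included. The bare UUID under `PRICE_INFO:key` is not the row key at all: with `depositHeaders=true` each Flume event header becomes an extra column, and the Kafka source appears to supply `topic`, `partition`, `key` and `timestamp` headers, which matches the columns listed. The Java sketch below is a simplified, hypothetical model of that mapping; `DepositHeadersModel`, `toCells` and the `<row>` pseudo-column are invented for illustration and are not Flume's API.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical model of what the sink writes for one event (not Flume code)
public class DepositHeadersModel {
    static final Pattern ORIGINAL = Pattern.compile("(.+),(.+),(.+),(.+)");
    static final String[] COL_NAMES = {"ROW_KEY", "ticker", "timeissued", "price"};
    static final int ROW_KEY_INDEX = 0;

    // Returns column -> value; the row key is stored under the pseudo-column "<row>"
    static Map<String, String> toCells(String payload, Map<String, String> headers) {
        Map<String, String> cells = new LinkedHashMap<>();
        Matcher m = ORIGINAL.matcher(payload);
        // Assumed serializer behaviour: whole-string match and one group per column name
        if (!m.matches() || m.groupCount() != COL_NAMES.length) {
            return cells; // nothing is written for this event
        }
        cells.put("<row>", m.group(ROW_KEY_INDEX + 1));
        for (int i = 0; i < COL_NAMES.length; i++) {
            if (i != ROW_KEY_INDEX) {
                cells.put(COL_NAMES[i], m.group(i + 1));
            }
        }
        // depositHeaders=true: every event header becomes an extra column
        cells.putAll(headers);
        return cells;
    }

    public static void main(String[] args) {
        String payload = "{\"rowkey\":\"f8a6e006-35bb-4470-9a7b-9273b8aa83f1\","
                + "\"ticker\":\"IBM\", \"timeissued\":\"2020-02-16T10:50:05\", \"price\":202.74}";
        Map<String, String> headers = new LinkedHashMap<>();
        headers.put("topic", "md");
        headers.put("partition", "5");
        headers.put("key", "f8a6e006-35bb-4470-9a7b-9273b8aa83f1");
        headers.put("timestamp", "1581849561330");
        toCells(payload, headers).forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```

Printing the map puts `<row> = {"rowkey":"f8a6e006-35bb-4470-9a7b-9273b8aa83f1"` next to `key = f8a6e006-35bb-4470-9a7b-9273b8aa83f1`, which is the mismatch seen in the scan above.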
>
> On Sat, 15 Feb 2020 at 15:12, Mich Talebzadeh <mich.talebzadeh@gmail.com>
> wrote:
>
>> Hi,
>>
>> I have a Kafka stream that sends data to Flume in the following JSON
>> format.
>>
>> This is the record as sent via Kafka:
>>
>> 7d645a0f-0386-4405-8af1-7fca908fe928
>> {"rowkey":"7d645a0f-0386-4405-8af1-7fca908fe928","ticker":"IBM",
>> "timeissued":"2020-02-14T20:32:29", "price":140.11}
>>
>> Note that "7d645a0f-0386-4405-8af1-7fca908fe928" is the key, and the value
>> has 4 fields, including the key itself repeated as the "rowkey" field.
>>
>> The Flume configuration file is as follows
>>
>> # Describing/Configuring the sink
>> JsonAgent.channels.hdfs-channel-1.type = memory
>> JsonAgent.channels.hdfs-channel-1.capacity = 300
>> JsonAgent.channels.hdfs-channel-1.transactionCapacity = 100
>> JsonAgent.sinks.Hbase-sink.type = org.apache.flume.sink.hbase.HBaseSink
>> JsonAgent.sinks.Hbase-sink.channel =hdfs-channel-1
>> JsonAgent.sinks.Hbase-sink.table =trading:MARKETDATAHBASEBATCH
>> JsonAgent.sinks.Hbase-sink.columnFamily=PRICE_INFO
>>
>> JsonAgent.sinks.Hbase-sink.serializer=org.apache.flume.sink.hbase.RegexHbaseEventSerializer
>> JsonAgent.sinks.Hbase-sink.serializer.regex =(.+),(.+),(.+),(.+)
>>
>> JsonAgent.sinks.Hbase-sink.serializer.rowKeyIndex  = 0
>> JsonAgent.sinks.Hbase-sink.serializer.colNames =ROW_KEY,ticker,timeissued,price
>> JsonAgent.sinks.Hbase-sink.serializer.regexIgnoreCase = true
>> JsonAgent.sinks.Hbase-sink.batchSize =100
>>
>> This works and posts records to Hbase as follows:
>>
>> ROW                                               COLUMN+CELL
>>  {"rowkey":"7d645a0f-0386-4405-8af1-7fca908fe928"
>> column=PRICE_INFO:price, timestamp=1581711715292, value= "price":140.11}
>>  {"rowkey":"7d645a0f-0386-4405-8af1-7fca908fe928"
>> column=PRICE_INFO:ticker, timestamp=1581711715292, value="ticker":"IBM"
>>  {"rowkey":"7d645a0f-0386-4405-8af1-7fca908fe928"
>> column=PRICE_INFO:timeissued, timestamp=1581711715292, value=
>> "timeissued":"2020-02-14T20:32:29"
>> 1 row(s) in 0.0050 seconds
>>
>> However, there is a problem: the row key value includes the redundant
>> characters {"rowkey": (plus the surrounding quotes), so records cannot be
>> looked up in HBase by their rowkey value. When I try to drop the redundant
>> characters by tweaking the regex, unfortunately no rows are added to the
>> HBase table. For example:
>>
>> JsonAgent.sinks.Hbase-sink.serializer.regex = (?<=^.{9}).+,(.+),(.+),(.+)
>>
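Why does the lookbehind version write nothing? Assuming `RegexHbaseEventSerializer` behaves as I read its source (it applies `Matcher.matches()` and skips the event when the number of capture groups differs from the number of `colNames`), this pattern fails on both counts: `matches()` starts at index 0, where there are no 9 preceding characters for `(?<=^.{9})` to see, and the pattern has only three capture groups against four column names, since the UUID part `.+` is not captured. Incidentally, `{"rowkey":"` is 11 characters, not 9. The small Java check below (class name and sample record are hypothetical) shows all three points.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LookbehindCheck {
    // The regex tried in the original question
    static final Pattern LOOKBEHIND = Pattern.compile("(?<=^.{9}).+,(.+),(.+),(.+)");

    // Hypothetical record modelled on the Kafka value above
    static final String PAYLOAD = "{\"rowkey\":\"7d645a0f-0386-4405-8af1-7fca908fe928\","
            + "\"ticker\":\"IBM\", \"timeissued\":\"2020-02-14T20:32:29\", \"price\":140.11}";

    public static void main(String[] args) {
        Matcher m = LOOKBEHIND.matcher(PAYLOAD);
        // Whole-string match: the lookbehind is evaluated at index 0, where nothing precedes it
        System.out.println("matches():     " + m.matches());
        // Lookbehind is non-capturing, so only three groups exist (colNames lists four)
        System.out.println("groupCount():  " + m.groupCount());
        // Length of the prefix that was meant to be skipped
        System.out.println("prefix length: " + "{\"rowkey\":\"".length());
    }
}
```

This should print `false`, `3` and `11`, so no `Put` is produced for the event under the assumed serializer behaviour.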
>> Appreciate any advice.
>>
>> Thanks,
>>
>> Mich
>>
>>
>>
>>
>
