hbase-user mailing list archives

From Mich Talebzadeh <mich.talebza...@gmail.com>
Subject Re: How to insert Json records from Flume into Hbase table
Date Fri, 14 Feb 2020 20:24:45 GMT
Thanks Pedro. The mention of
*org.apache.flume.sink.hbase.RegexHbaseEventSerializer* was very useful.

This works

# Describing/Configuring the sink
JsonAgent.channels.hdfs-channel-1.type = memory
JsonAgent.channels.hdfs-channel-1.capacity = 300
JsonAgent.channels.hdfs-channel-1.transactionCapacity = 100
JsonAgent.sinks.Hbase-sink.type = org.apache.flume.sink.hbase.HBaseSink
JsonAgent.sinks.Hbase-sink.channel = hdfs-channel-1
JsonAgent.sinks.Hbase-sink.table = trading:MARKETDATAHBASEBATCH
JsonAgent.sinks.Hbase-sink.columnFamily = PRICE_INFO
JsonAgent.sinks.Hbase-sink.serializer = org.apache.flume.sink.hbase.RegexHbaseEventSerializer
JsonAgent.sinks.Hbase-sink.serializer.regex = (.+),(.+),(.+),(.+)
JsonAgent.sinks.Hbase-sink.serializer.rowKeyIndex = 0
JsonAgent.sinks.Hbase-sink.serializer.colNames = ROW_KEY,ticker,timeissued,price
JsonAgent.sinks.Hbase-sink.serializer.regexIgnoreCase = true
JsonAgent.sinks.Hbase-sink.batchSize = 100
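To make the mapping in this configuration concrete, here is a minimal Python sketch of what RegexHbaseEventSerializer appears to do with each event body. It assumes the serializer behaves as its source code suggests: the whole body must match regex, the number of groups must equal the number of colNames, the group at rowKeyIndex becomes the row key, and every other group is written as a column in the column family. The function regex_serialize is hypothetical and is not part of Flume's API.

```python
import re

def regex_serialize(body, regex, col_names, row_key_index, ignore_case=True):
    """Hypothetical model of the regex serializer: (row_key, {column: value}) or None."""
    # Assumed flags: DOTALL always, case-insensitive when regexIgnoreCase = true.
    flags = re.DOTALL | (re.IGNORECASE if ignore_case else 0)
    m = re.fullmatch(regex, body, flags)
    # No full match, or a group/column count mismatch: nothing is written.
    if m is None or len(m.groups()) != len(col_names):
        return None
    # The group at rowKeyIndex becomes the row key...
    row_key = m.group(row_key_index + 1)
    # ...and every other group becomes a column in the column family.
    columns = {
        name: m.group(i + 1)
        for i, name in enumerate(col_names)
        if i != row_key_index
    }
    return row_key, columns


body = ('{"rowkey":"7d645a0f-0386-4405-8af1-7fca908fe928","ticker":"IBM", '
        '"timeissued":"2020-02-14T20:32:29", "price":140.11}')
row_key, columns = regex_serialize(
    body, "(.+),(.+),(.+),(.+)",
    ["ROW_KEY", "ticker", "timeissued", "price"], row_key_index=0)
print(row_key)            # {"rowkey":"7d645a0f-0386-4405-8af1-7fca908fe928"
print(columns["ticker"])  # "ticker":"IBM"
```

The group-count check only guards against a structural mismatch. If a record ever carried a fourth comma (for example inside a string value), the pattern would still match with four groups, and the first greedy group would silently absorb the extra text rather than the event being rejected.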

This is the record as sent via Kafka:

7d645a0f-0386-4405-8af1-7fca908fe928
{"rowkey":"7d645a0f-0386-4405-8af1-7fca908fe928","ticker":"IBM", "timeissued":"2020-02-14T20:32:29", "price":140.11}

And the same record in HBase:

 ROW                                                 COLUMN+CELL
 {"rowkey":"7d645a0f-0386-4405-8af1-7fca908fe928"    column=PRICE_INFO:price, timestamp=1581711715292, value= "price":140.11}
 {"rowkey":"7d645a0f-0386-4405-8af1-7fca908fe928"    column=PRICE_INFO:ticker, timestamp=1581711715292, value="ticker":"IBM"
 {"rowkey":"7d645a0f-0386-4405-8af1-7fca908fe928"    column=PRICE_INFO:timeissued, timestamp=1581711715292, value= "timeissued":"2020-02-14T20:32:29"
1 row(s) in 0.0050 seconds
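Because each (.+) group captures everything between commas, the row key above is {"rowkey":"7d645a0f-..." rather than the bare UUID, and every value still carries its JSON key and quotes. If the producer always emits the fields in this order and format (an assumption), a tighter pattern could capture only the values. The pattern below is a hypothetical example, checked here with Python's re module. It uses [{] and [}] instead of backslash escapes because Flume configuration files are read as Java properties files, where a backslash acts as an escape character, and because an unescaped { can be rejected by Java's regex engine.

```python
import re

# Hypothetical tighter pattern: one capture group per field value, in the
# order ROW_KEY,ticker,timeissued,price. It assumes a fixed field order.
TIGHT_REGEX = ('[{]"rowkey":"([^"]+)", *"ticker":"([^"]+)", *'
               '"timeissued":"([^"]+)", *"price":([^}]+)[}]')

body = ('{"rowkey":"7d645a0f-0386-4405-8af1-7fca908fe928","ticker":"IBM", '
        '"timeissued":"2020-02-14T20:32:29", "price":140.11}')

m = re.fullmatch(TIGHT_REGEX, body, re.DOTALL | re.IGNORECASE)
row_key, ticker, timeissued, price = m.groups()
print(row_key)                    # 7d645a0f-0386-4405-8af1-7fca908fe928
print(ticker, timeissued, price)  # IBM 2020-02-14T20:32:29 140.11
```

With such a pattern only the serializer.regex line would change; rowKeyIndex = 0 and colNames = ROW_KEY,ticker,timeissued,price stay as they are. An event whose body does not match appears to be skipped rather than written, so the pattern has to follow any change in the producer's JSON layout.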

Regards,

Mich






On Fri, 14 Feb 2020 at 13:41, Pedro Boado <pedro.boado@gmail.com> wrote:

> Probably Flume's mailing list would be a better resource to get help about
> this.
>
> SimpleHbaseEventSerializer doesn't do regex, so you can't extract your own
> row key.
>
> https://github.com/slmnhq/flume/blob/master/flume-ng-sinks/flume-ng-hbase-sink/src/main/java/org/apache/flume/sink/hbase/SimpleHbaseEventSerializer.java#L40
>
> I'd say you should go for RegexHbaseEventSerializer.
>
>
>
> On Fri, 14 Feb 2020 at 13:27, Mich Talebzadeh <mich.talebzadeh@gmail.com>
> wrote:
>
> > Thanks Pedro,
> >
> > As I understand it, it generates a default row key made of "default"
> > followed by a UUID-like string:
> >
> >  defaultfb7cb953-8598-466e-a1c0-277e2863b249
> >
> > But I send a rowkey value as well:
> >
> > f2d7174e-6299-49a7-9e87-0d66c248e66b
> > {"rowkey":"f2d7174e-6299-49a7-9e87-0d66c248e66b","ticker":"BP", "timeissued":"2020-02-14T08:54:13", "price":573.25}
> >
> > But it still generates its own row key:
> > defaultfb7cb953-8598-466e-a1c0-277e2863b249
> >
> > How can I make HBase use the row key that Flume sends WITHOUT generating
> > its own?
> >
> > Regards,
> >
> > Mich
> >
> >
> > *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> > loss, damage or destruction of data or any other property which may arise
> > from relying on this email's technical content is explicitly disclaimed.
> > The author will in no case be liable for any monetary damages arising from
> > such loss, damage or destruction.
> >
> >
> >
> >
> > On Fri, 14 Feb 2020 at 12:27, Pedro Boado <pedro.boado@gmail.com> wrote:
> >
> > > If what you're looking for is not achievable by extracting fields through
> > > regex (it looks like it should be) and you are after full control over
> > > what's written to HBase, you're probably looking at writing your own
> > > serializer.
> > >
> > > On Fri, 14 Feb 2020 at 11:05, Mich Talebzadeh <mich.talebzadeh@gmail.com>
> > > wrote:
> > >
> > > > Hi,
> > > >
> > > > I have an HBase table 'trading:MARKETDATAHBASEBATCH'.
> > > >
> > > > Kafka delivers topic rows into Flume.
> > > >
> > > > This is a typical JSON row:
> > > >
> > > > f2d7174e-6299-49a7-9e87-0d66c248e66b
> > > > {"rowkey":"f2d7174e-6299-49a7-9e87-0d66c248e66b","ticker":"BP", "timeissued":"2020-02-14T08:54:13", "price":573.25}
> > > >
> > > > The rowkey is a UUID.
> > > >
> > > > The json.conf for Flume is as follows:
> > > >
> > > > # Describing/Configuring the sink
> > > > JsonAgent.channels.hdfs-channel-1.type = memory
> > > > JsonAgent.channels.hdfs-channel-1.capacity = 300
> > > > JsonAgent.channels.hdfs-channel-1.transactionCapacity = 100
> > > > JsonAgent.sinks.Hbase-sink.type = org.apache.flume.sink.hbase.HBaseSink
> > > > JsonAgent.sinks.Hbase-sink.channel =hdfs-channel-1
> > > > JsonAgent.sinks.Hbase-sink.table =trading:MARKETDATAHBASEBATCH
> > > > JsonAgent.sinks.Hbase-sink.columnFamily=PRICE_INFO
> > > > JsonAgent.sinks.Hbase-sink.serializer = org.apache.flume.sink.hbase.SimpleHbaseEventSerializer
> > > > ##JsonAgent.sinks.Hbase-sink.serializer.regex =(.+),(.+),(.+),(.+)
> > > > agent1.sinks.sink1.serializer.regex = [a-zA-Z0-9]*^C[a-zA-Z0-9]*^C[a-zA-Z0-9]*
> > > > JsonAgent.sinks.Hbase-sink.serializer.rowKeyIndex  = ROW_KEY
> > > > JsonAgent.sinks.Hbase-sink.serializer.colNames = ROW_KEY,ticker,timeissued,price
> > > > JsonAgent.sinks.Hbase-sink.serializer.regexIgnoreCase = true
> > > > JsonAgent.sinks.Hbase-sink.batchSize =100
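Apart from the serializer class itself, this configuration has a second reason why no regex could apply. A Flume agent reads only properties prefixed with its own name, and the HBase sink hands its serializer the settings under its own serializer. prefix. The Python sketch below is a simplified, hypothetical model of that prefix filtering (serializer_context is not Flume code), applied to the serializer lines above.

```python
def serializer_context(props, agent, sink):
    """Hypothetical model: collect the serializer.* settings one sink would see."""
    prefix = f"{agent}.sinks.{sink}.serializer."
    return {
        key[len(prefix):]: value
        for key, value in props.items()
        if key.startswith(prefix)
    }


# Serializer lines from the question's json.conf. The "##" line is left out
# because a properties parser treats lines starting with "#" as comments.
props = {
    "JsonAgent.sinks.Hbase-sink.serializer":
        "org.apache.flume.sink.hbase.SimpleHbaseEventSerializer",
    "agent1.sinks.sink1.serializer.regex":
        "[a-zA-Z0-9]*^C[a-zA-Z0-9]*^C[a-zA-Z0-9]*",
    "JsonAgent.sinks.Hbase-sink.serializer.rowKeyIndex": "ROW_KEY",
    "JsonAgent.sinks.Hbase-sink.serializer.colNames": "ROW_KEY,ticker,timeissued,price",
    "JsonAgent.sinks.Hbase-sink.serializer.regexIgnoreCase": "true",
}

ctx = serializer_context(props, "JsonAgent", "Hbase-sink")
print(sorted(ctx))  # ['colNames', 'regexIgnoreCase', 'rowKeyIndex']
```

The regex line belongs to a different agent and sink (agent1, sink1), so it never reaches the JsonAgent sink. In addition, rowKeyIndex arrives as the string ROW_KEY. The working configuration at the top of this thread sets it to 0, the position of ROW_KEY in colNames.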
> > > >
> > > > The problem is that the rows are inserted as follows:
> > > >
> > > >  defaultff90d8d3-d8c5-40ff-bc37-6ee1544988c1    column=PRICE_INFO:pCol, timestamp=1581670394182, value={"rowkey":"a7464cf4-42a1-40b8-a597-a41fbc3b847f","ticker":"MRW", "timeissued":"2020-02-14T09:03:46", "price":317.13}
> > > >
> > > > So it creates a default rowkey value,
> > > > "defaultff90d8d3-d8c5-40ff-bc37-6ee1544988c1", and puts the whole JSON
> > > > record in a single value column.
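The row above is consistent with SimpleHbaseEventSerializer's default behaviour. The serializer does not parse the body. It builds the row key from a fixed prefix plus a generated suffix and writes the whole body into one payload column. Note that the generated key (ff90d8d3-...) differs from the record's own rowkey field (a7464cf4-...). Below is a minimal Python sketch of that behaviour. It assumes the defaults visible in the output: the prefix "default", a random UUID suffix and the column pCol. The function simple_serialize is hypothetical.

```python
import uuid

def simple_serialize(body, row_prefix="default", payload_column="pCol"):
    """Hypothetical model: prefix + random UUID as row key, whole body in one column."""
    row_key = row_prefix + str(uuid.uuid4())
    # The body is stored as-is; its own "rowkey" field is never looked at.
    return row_key, {payload_column: body}


body = ('{"rowkey":"a7464cf4-42a1-40b8-a597-a41fbc3b847f","ticker":"MRW", '
        '"timeissued":"2020-02-14T09:03:46", "price":317.13}')
row_key, columns = simple_serialize(body)
print(row_key)  # "default" followed by a new UUID on every call
```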
> > > >
> > > > Ideally I would like something similar to the following:
> > > >
> > > > hbase(main):085:0> put 'trading:MARKETDATAHBASEBATCH', "8b97d3b9-e87b-4f21-9879-b43c4dcccb37", 'PRICE_INFO:rowkey', "8b97d3b9-e87b-4f21-9879-b43c4dcccb37"
> > > > hbase(main):086:0> put 'trading:MARKETDATAHBASEBATCH', "8b97d3b9-e87b-4f21-9879-b43c4dcccb37", 'PRICE_INFO:ticker', "ORCL"
> > > > hbase(main):087:0> put 'trading:MARKETDATAHBASEBATCH', "8b97d3b9-e87b-4f21-9879-b43c4dcccb37", 'PRICE_INFO:timeissued', "2020-02-14T09:57:32"
> > > > hbase(main):001:0> put 'trading:MARKETDATAHBASEBATCH', "8b97d3b9-e87b-4f21-9879-b43c4dcccb37", 'PRICE_INFO:price', 22.12
> > > > hbase(main):002:0> get 'trading:MARKETDATAHBASEBATCH', "8b97d3b9-e87b-4f21-9879-b43c4dcccb37"
> > > > COLUMN                     CELL
> > > >  PRICE_INFO:price          timestamp=1581676221846, value=22.12
> > > >  PRICE_INFO:rowkey         timestamp=1581675986932, value=8b97d3b9-e87b-4f21-9879-b43c4dcccb37
> > > >  PRICE_INFO:ticker         timestamp=1581676103443, value=ORCL
> > > >  PRICE_INFO:timeissued     timestamp=1581676168656, value=2020-02-14T09:57:32
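The layout above has the UUID as row key, one column per JSON field, and values without JSON quoting. A custom serializer, as suggested earlier in the thread, could produce this layout by parsing the body as JSON instead of matching it with a regex. In Flume such a serializer would be a Java class implementing org.apache.flume.sink.hbase.HbaseEventSerializer. The Python sketch below only illustrates the mapping such a class would perform. The function json_to_cells is hypothetical, and the input is an illustrative record assembled from the values in the put commands above.

```python
import json

def json_to_cells(body, row_key_field="rowkey", family="PRICE_INFO"):
    """Hypothetical mapping: one row per JSON record, one family:qualifier cell per field."""
    record = json.loads(body)
    row_key = record[row_key_field]
    # Keep the row key as a column too, as in the desired 'get' output.
    cells = {f"{family}:{name}": str(value) for name, value in record.items()}
    return row_key, cells


body = ('{"rowkey":"8b97d3b9-e87b-4f21-9879-b43c4dcccb37","ticker":"ORCL", '
        '"timeissued":"2020-02-14T09:57:32", "price":22.12}')
row_key, cells = json_to_cells(body)
for qualifier in sorted(cells):
    print(qualifier, cells[qualifier])
# PRICE_INFO:price 22.12
# PRICE_INFO:rowkey 8b97d3b9-e87b-4f21-9879-b43c4dcccb37
# PRICE_INFO:ticker ORCL
# PRICE_INFO:timeissued 2020-02-14T09:57:32
```

Converting parsed numbers back with str() can change their text (a price of 140.10 would be stored as 140.1). A real serializer might prefer to keep the raw JSON text of each value, or to store typed bytes, depending on how the table is read.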
> > > >
> > > > Any advice would be appreciated.
> > > >
> > > > Thanks,
> > > >
> > > > Mich
> > > >
> > >
> > >
> > > --
> > > Regards.
> > > Pedro Boado.
> > >
> >
>
>
> --
> Regards.
> Pedro Boado.
>
