flume-user mailing list archives

From Jonathan Natkins <na...@streamsets.com>
Subject Re: Flume to Hbase columns with regexp
Date Mon, 28 Jul 2014 23:15:06 GMT
Alright, a couple things:

1) It looks like my intuition was correct. Renaming the serializer property
from columns to colNames seems to get things working.

2) Based on the description of what you're trying to do, it looks like your
regex might be slightly off. For example, if I had a row:


Your regex will result in column1 containing 'familyName', and column2
containing 'col1val,col2val', which I don't think is what you're trying to
do. Probably you want to use this regex, or something like it:


This regex will result in column1 containing 'col1val', column2 containing
'col2val', and the first value (which appears to be the family name) being
thrown away. Is this what you were trying to do?

As an aside, the mechanics of the RegexHbaseEventSerializer are to take the
matching groups and map those to the list of column names defined by the
colNames config parameter. If you want to toss any data away, just make
sure it's not within a set of parentheses.
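
To make the group-to-column mapping concrete, here is a minimal Java sketch
of the idea. It is not the serializer's actual code: the class
RegexColumnSketch, the mapGroups helper, the sample row, and the second
regex are all hypothetical, chosen only to match the behavior described
above. Returning nothing when the group count and the column count differ
is likewise an assumption of this sketch.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexColumnSketch {

    // Pairs capture group i (1-based) with colNames.get(i - 1).
    // Returns an empty map when the line does not match, or when the number
    // of capture groups differs from the number of column names
    // (an assumption of this sketch, not a documented guarantee).
    static Map<String, String> mapGroups(String regex, List<String> colNames, String line) {
        Map<String, String> cells = new LinkedHashMap<>();
        Matcher m = Pattern.compile(regex).matcher(line);
        if (!m.matches() || m.groupCount() != colNames.size()) {
            return cells;
        }
        for (int i = 0; i < colNames.size(); i++) {
            cells.put(colNames.get(i), m.group(i + 1));
        }
        return cells;
    }

    public static void main(String[] args) {
        List<String> colNames = Arrays.asList("column1", "column2");
        String row = "familyName,col1val,col2val"; // hypothetical input row

        // Regex from the original config: the first field is captured,
        // so it lands in column1 and the rest of the line in column2.
        System.out.println(mapGroups("^([^,]+),(.+)$", colNames, row));

        // Hypothetical alternative: the first field is matched outside any
        // parentheses, so it is dropped and the next two fields are kept.
        System.out.println(mapGroups("^[^,]+,([^,]+),(.+)$", colNames, row));
    }
}
```

With the sample row, the first call puts 'familyName' in column1 and
'col1val,col2val' in column2, while the second puts 'col1val' in column1
and 'col2val' in column2.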

Let me know if you have any more questions, or if you have trouble getting
this to work.


On Mon, Jul 28, 2014 at 3:48 PM, Jonathan Natkins <natty@streamsets.com> wrote:

> I haven't tested this myself, but a quick look at the code suggests that
> your column name specification may be configured incorrectly. It looks like
> it should be:
>
> agent.sinks.hbaseSink.serializer.colNames = column1,column2
>
> I'm trying this out myself, though, so if I find something definitive,
> I'll let you know.
>
> On Mon, Jul 28, 2014 at 4:19 AM, Tinte garcia, Miguel Angel <
> miguel.tinte@atos.net> wrote:
>> Hi,
>>
>> I am sending a Flume event to insert some information into a specific
>> HBase table. My Flume conf.properties looks like this:
>>
>> agent.sinks.hbaseSink.table=table_name
>> agent.sinks.hbaseSink.columnFamily=idColumn
>> agent.sinks.hbaseSink.serializer=org.apache.flume.sink.hbase.RegexHbaseEventSerializer
>> agent.sinks.hbaseSink.serializer.regex=^([^,]+),(.+)$
>> agent.sinks.hbaseSink.serializer.columns = column1,column2
>>
>> Basically, what I am trying to do is split the input values into
>> three different columns: idColumn,column1,column2
>>
>> With this configuration, no error is returned, but no input is recorded
>> in the table. Any idea what I am doing wrong?
>>
>> Thanks in advance
