chukwa-dev mailing list archives

From "Bill Graham (JIRA)" <j...@apache.org>
Subject [jira] Commented: (CHUKWA-564) HBase output collector uses incorrect column family
Date Fri, 10 Dec 2010 23:59:00 GMT

    [ https://issues.apache.org/jira/browse/CHUKWA-564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12970345#action_12970345 ]

Bill Graham commented on CHUKWA-564:
------------------------------------

I agree that there are limitations in using annotations on the processors. I think that where
the data is written should be decoupled from the processors. A processor knows how to process
data, but it shouldn't also state where the data should be written. Generic processors like
TsProcessors could be used repeatedly for different data types, all of which should be written
to different table/column-families. Coupling the two with annotations makes this difficult.
You end up with empty subclasses used only to configure different data types to table/cfs
via overridden annotations.
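
To make the coupling concrete, here is a minimal, self-contained sketch. The Table annotation, the JvmTsProcessor subclass, and the table/cf names below are hypothetical stand-ins written for illustration; they are not copied from the Chukwa source.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

public class AnnotationCoupling {

    // Hypothetical stand-in for the annotation discussed in this issue.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.TYPE)
    public @interface Table {
        String name();
        String columnFamily();
    }

    // A generic processor: the parsing logic and the destination live together.
    @Table(name = "tableA", columnFamily = "cf1")
    public static class TsProcessor {
        public String process(String line) {
            return line.trim();
        }
    }

    // Empty subclass: no new behavior, it exists only to change the destination.
    @Table(name = "tableB", columnFamily = "cf2")
    public static class JvmTsProcessor extends TsProcessor {
    }

    // Reads the destination from the annotation declared on the processor class.
    public static String destinationOf(Class<?> processorClass) {
        Table t = processorClass.getAnnotation(Table.class);
        return t.name() + "/" + t.columnFamily();
    }

    public static void main(String[] args) {
        System.out.println(destinationOf(TsProcessor.class));
        System.out.println(destinationOf(JvmTsProcessor.class));
    }
}
```

Every new data type that reuses the same parsing logic needs another subclass like JvmTsProcessor, which is the overhead described above.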

I suggest we externalize the table/cf mappings from the processors. Instead, we could have
something like an HBaseRouterFactory (or something better named) that the OutputCollector
and the HBaseWriter interact with. HBaseRouterFactory would have a method that takes a dataType
and probably also a ChukwaRecord, and returns the Table and ColumnFamily that the
data should be written to.

We could then configure that dataType 'foo' should use BarProcessor and write to table 'bat',
column family 'biz'.
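
A rough sketch of what such an externalized mapping might look like follows. Everything in it is an assumption made for illustration: the class name HBaseRouterSketch, the Route holder, and the "dataType = processor, table, columnFamily" properties format do not exist in Chukwa. The ChukwaRecord argument is left out to keep the example self-contained.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

public class HBaseRouterSketch {

    // Hypothetical holder: which processor handles a data type and where its output goes.
    public static final class Route {
        public final String processor;
        public final String table;
        public final String columnFamily;

        Route(String processor, String table, String columnFamily) {
            this.processor = processor;
            this.table = table;
            this.columnFamily = columnFamily;
        }
    }

    private final Map<String, Route> routes = new HashMap<>();

    // Builds the router from entries of the assumed form:
    //   dataType = processor, table, columnFamily
    public static HBaseRouterSketch fromProperties(Properties props) {
        HBaseRouterSketch router = new HBaseRouterSketch();
        for (String dataType : props.stringPropertyNames()) {
            String[] parts = props.getProperty(dataType).split(",");
            if (parts.length != 3) {
                throw new IllegalArgumentException("Bad mapping for data type: " + dataType);
            }
            router.routes.put(dataType,
                    new Route(parts[0].trim(), parts[1].trim(), parts[2].trim()));
        }
        return router;
    }

    // Looks up the destination for a data type; unknown types are rejected.
    public Route route(String dataType) {
        Route r = routes.get(dataType);
        if (r == null) {
            throw new IllegalArgumentException("No route configured for data type: " + dataType);
        }
        return r;
    }

    public static void main(String[] args) throws IOException {
        Properties props = new Properties();
        props.load(new StringReader("foo = BarProcessor, bat, biz\n"));
        Route r = fromProperties(props).route("foo");
        System.out.println(r.processor + " -> " + r.table + ":" + r.columnFamily);
    }
}
```

With a lookup like this, the processor class carries no destination information, so the same processor can be registered for several data types without subclassing.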

I don't know how we'd configure 'foo's payload to be written to multiple cfs though. What's
the use case for why we'd want to write the same data to two locations?

There's still a separate, unresolved problem of how to handle ORM-ish functionality as well,
since reducing the many parameters in the record body back to a single 'body' field can be
sub-optimal.

> HBase output collector uses incorrect column family
> ---------------------------------------------------
>
>                 Key: CHUKWA-564
>                 URL: https://issues.apache.org/jira/browse/CHUKWA-564
>             Project: Chukwa
>          Issue Type: Bug
>            Reporter: Bill Graham
>             Fix For: 0.5.0
>
>
> The HBase {{OutputCollector}} does this to obtain the column family from the data type:
> {noformat}
> cf = key.getReduceType().getBytes();
> {noformat}
> The column family should instead be taken from the {{@Table.columnFamily}} annotation on
> the processor.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

