chukwa-dev mailing list archives

From "Eric Yang (JIRA)" <j...@apache.org>
Subject [jira] Commented: (CHUKWA-462) Store the cluster in the key for performance and easier customization on mappers
Date Thu, 11 Mar 2010 00:35:27 GMT

    [ https://issues.apache.org/jira/browse/CHUKWA-462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12843849#action_12843849
] 

Eric Yang commented on CHUKWA-462:
----------------------------------

+1 Looks good, and it'll speed up demux.  The original record design aimed for generality
rather than speed.  In real use cases, it's better to have the concept of grouping data by cluster,
and the cluster concept is already set in stone in Chukwa.  Hence, making "clusterName" a reserved
keyword for Chukwa is a reasonable trade-off for this performance improvement.

> Store the cluster in the key for performance and easier customization on mappers
> --------------------------------------------------------------------------------
>
>                 Key: CHUKWA-462
>                 URL: https://issues.apache.org/jira/browse/CHUKWA-462
>             Project: Hadoop Chukwa
>          Issue Type: Improvement
>          Components: Data Processors
>            Reporter: Guille -bisho-
>         Attachments: cluster_in_ChukwaRecordKey.v3.diff
>
>
> Right now the Chukwa framework stores the destination cluster as a tag in the Chunk.
> The tags are then copied to the ChukwaRecord, and before the record is stored, the cluster
> is parsed out of each record with a regular expression.
> - It's slow to apply a regular expression to each record.
> - It's harder to modify the destination cluster from the mapper; you have to tweak the
>   tags field.
> - It takes unneeded space, because the cluster is stored on every record.
> The proposed patch:
> - Extracts the cluster from the chunk tags just once per chunk, which is much faster.
> - Stores the cluster in the key, so it's easy to recover.
> - Makes it easy to tweak from the mapper. Just alter it with key.setClusterName(String clusterName).
> - Strips the cluster from the tags field of the resulting Chukwa records. If the tags
>   field is empty, it skips setting the tags field in the record entirely.
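
The steps described in the issue can be sketched roughly as follows. This is a minimal,
self-contained illustration and not the attached diff: SketchKey, SketchRecord, Entry,
processChunk and the "undefined" fallback are hypothetical stand-ins rather than Chukwa
classes, and the cluster="name" tag syntax is an assumption. Only setClusterName(String
clusterName) is taken from the description above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative stand-ins only; these are not the real Chukwa classes.
class ClusterInKeySketch {

    // Assumed tag syntax: cluster="name" somewhere in the chunk's tag string.
    static final Pattern CLUSTER_TAG = Pattern.compile("\\s*\\bcluster=\"(.*?)\"");

    // Hypothetical placeholder used when a chunk carries no cluster tag.
    static final String NO_CLUSTER = "undefined";

    static class SketchKey {
        private String clusterName;
        void setClusterName(String clusterName) { this.clusterName = clusterName; }
        String getClusterName() { return clusterName; }
    }

    static class SketchRecord {
        String tags;   // stays null when no tags remain after stripping
        String body;
    }

    static class Entry {
        final SketchKey key;
        final SketchRecord record;
        Entry(SketchKey key, SketchRecord record) { this.key = key; this.record = record; }
    }

    // Pull the cluster name out of the chunk-level tags.
    static String extractCluster(String chunkTags) {
        Matcher m = CLUSTER_TAG.matcher(chunkTags);
        return m.find() ? m.group(1) : NO_CLUSTER;
    }

    // Remove the cluster tag so it is not repeated on every record.
    static String stripCluster(String chunkTags) {
        return CLUSTER_TAG.matcher(chunkTags).replaceAll("").trim();
    }

    static List<Entry> processChunk(String chunkTags, List<String> lines) {
        String cluster = extractCluster(chunkTags);   // regex runs once per chunk
        String remaining = stripCluster(chunkTags);   // also once per chunk
        List<Entry> out = new ArrayList<>();
        for (String line : lines) {
            SketchKey key = new SketchKey();
            key.setClusterName(cluster);              // cluster travels in the key
            SketchRecord record = new SketchRecord();
            record.body = line;
            if (!remaining.isEmpty()) {
                record.tags = remaining;              // skip the field when empty
            }
            out.add(new Entry(key, record));
        }
        return out;
    }

    public static void main(String[] args) {
        List<Entry> entries = processChunk(
            "cluster=\"demo\" source=\"web\"", Arrays.asList("line one", "line two"));
        for (Entry e : entries) {
            System.out.println(e.key.getClusterName() + " | " + e.record.tags + " | " + e.record.body);
        }
    }
}
```

In this sketch a mapper that wants to redirect a record would call setClusterName on the key
it emits and leave the tags string alone, which is the customization the issue asks for.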

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

