chukwa-dev mailing list archives

From "Guille -bisho- (JIRA)" <j...@apache.org>
Subject [jira] Updated: (CHUKWA-462) Store the cluster in the key for performance and easier customization on mappers
Date Thu, 11 Mar 2010 16:30:27 GMT

     [ https://issues.apache.org/jira/browse/CHUKWA-462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Guille -bisho- updated CHUKWA-462:
----------------------------------

    Attachment: cluster_in_ChukwaRecordKey.v4.diff

Adds setClusterName() to mappers that don't use the AbstractMapper helper, plus a fix to the regexp.

> Store the cluster in the key for performance and easier customization on mappers
> --------------------------------------------------------------------------------
>
>                 Key: CHUKWA-462
>                 URL: https://issues.apache.org/jira/browse/CHUKWA-462
>             Project: Hadoop Chukwa
>          Issue Type: Improvement
>          Components: Data Processors
>            Reporter: Guille -bisho-
>         Attachments: cluster_in_ChukwaRecordKey.v3.diff, cluster_in_ChukwaRecordKey.v4.diff
>
>
> Right now the Chukwa framework stores the destination cluster as a tag in the Chunk. The tags are then copied to the ChukwaRecord, and before each record is stored, the cluster is parsed out of its tags with a regular expression.
> - It's slow to apply a regular expression to each record.
> - It's harder to modify the destination cluster from the mapper, because you have to tweak the tags field.
> - It wastes space, because the cluster is stored in every record.
> The proposed patch:
> - Extracts the cluster from the chunk tags just once per chunk, which is much faster.
> - Stores the cluster in the key, so it's easy to recover.
> - Makes it easy to tweak from the mapper: just call key.setClusterName(String clusterName).
> - Strips the cluster from the tags field of the resulting Chukwa records. If the tags field is then empty, the record's tags field is not set at all.
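
The flow described in the issue can be sketched roughly as follows. This is only an illustration, not the code in the attached diff: the RecordKey class is a hypothetical stand-in for ChukwaRecordKey, the helper names extractCluster and stripCluster are made up, and the tag format cluster="name" is an assumption.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ClusterKeySketch {
    // Assumed tag format: cluster="name", possibly mixed with other key="value" tags.
    private static final Pattern CLUSTER_TAG =
            Pattern.compile("\\bcluster=\"([^\"]*)\"\\s*");

    /** Hypothetical stand-in for ChukwaRecordKey; only the cluster field is modelled. */
    public static final class RecordKey {
        private String clusterName;
        public void setClusterName(String clusterName) { this.clusterName = clusterName; }
        public String getClusterName() { return clusterName; }
    }

    /** Returns the cluster named in the tags, or null if no cluster tag is present. */
    public static String extractCluster(String tags) {
        if (tags == null) {
            return null;
        }
        Matcher m = CLUSTER_TAG.matcher(tags);
        return m.find() ? m.group(1) : null;
    }

    /** Returns the tags with the cluster tag removed; empty if nothing else remains. */
    public static String stripCluster(String tags) {
        if (tags == null) {
            return "";
        }
        return CLUSTER_TAG.matcher(tags).replaceAll("").trim();
    }

    public static void main(String[] args) {
        String chunkTags = "cluster=\"demo\" rack=\"r1\"";

        // Run the regular expression once per chunk, not once per record.
        String cluster = extractCluster(chunkTags);
        String remainingTags = stripCluster(chunkTags);

        // Every record produced from this chunk reuses the parsed values.
        for (int i = 0; i < 3; i++) {
            RecordKey key = new RecordKey();
            key.setClusterName(cluster); // a mapper could override this here
            // Only set tags on the record when something is left after stripping.
            String recordTags = remainingTags.isEmpty() ? null : remainingTags;
            System.out.println(key.getClusterName() + " | " + recordTags);
        }
    }
}
```

Parsing once per chunk keeps the regular expression out of the per-record loop, and keeping the cluster in the key means a mapper can redirect a record with a single setter call instead of rewriting the tags string.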

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

