chukwa-dev mailing list archives

From "Eric Yang (JIRA)" <>
Subject [jira] [Updated] (CHUKWA-744) Refactor ETL process for HBaseWriter
Date Sat, 18 Apr 2015 18:58:59 GMT


Eric Yang updated CHUKWA-744:
    Attachment: CHUKWA-744.patch

Resubmit patch for review.

> Refactor ETL process for HBaseWriter
> ------------------------------------
>                 Key: CHUKWA-744
>                 URL:
>             Project: Chukwa
>          Issue Type: Task
>          Components: Data Processors
>    Affects Versions: 0.6.0
>            Reporter: Eric Yang
>            Assignee: Eric Yang
>         Attachments: CHUKWA-744.patch
> The current ETL classes are based on the Demux MapProcessor and ReduceProcessor. The processors
> were designed to pass in an archive key embedded in the processor, as well as a ChunkSaver to
> preserve chunks that cannot be parsed. This is fine when running a MapReduce-based demux job to
> process data: the short-lived task spills the ChunkSaver contents into a separate file for later
> examination. However, the processors can leak memory when they run for long periods in the Chukwa
> agent, because chunks are saved in ChunkSaver without cleanup.
> It would be better to redesign the parser classes with well-defined behavior. If a chunk
> cannot be parsed, the parser should throw a ParseException to the upper layer, which can retry
> or log to the agent log.
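
For illustration only, the proposed behavior might look roughly like the sketch below. It is not taken from the attached patch. The names ChunkParser, KeyValueParser and ChunkPipeline are hypothetical, the key=value format is made up, and java.text.ParseException is assumed because the issue does not say which ParseException class is meant. The point is that the parser keeps no copy of a chunk it fails to parse; the caller decides whether to retry or log.

```java
import java.nio.charset.StandardCharsets;
import java.text.ParseException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical parser contract: return records or throw, never store failed chunks.
interface ChunkParser {
    List<String> parse(byte[] chunk) throws ParseException;
}

// Toy parser for an assumed "key=value" line format (not a real Chukwa format).
class KeyValueParser implements ChunkParser {
    @Override
    public List<String> parse(byte[] chunk) throws ParseException {
        String text = new String(chunk, StandardCharsets.UTF_8);
        List<String> records = new ArrayList<>();
        int offset = 0; // character offset of the current line, reported on failure
        for (String line : text.split("\n")) {
            if (!line.contains("=")) {
                throw new ParseException("Expected key=value but got: " + line, offset);
            }
            records.add(line);
            offset += line.length() + 1;
        }
        return records;
    }
}

// Hypothetical upper layer: retries a bounded number of times, then only logs.
class ChunkPipeline {
    private final ChunkParser parser;
    private final int maxAttempts;
    // Stand-in for the agent log; it holds short messages, not chunk payloads.
    final List<String> errorLog = new ArrayList<>();

    ChunkPipeline(ChunkParser parser, int maxAttempts) {
        this.parser = parser;
        this.maxAttempts = maxAttempts;
    }

    List<String> process(byte[] chunk) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return parser.parse(chunk);
            } catch (ParseException e) {
                errorLog.add("attempt " + attempt + " failed at offset "
                        + e.getErrorOffset() + ": " + e.getMessage());
            }
        }
        // Give up; the chunk reference is dropped here, so nothing accumulates.
        return new ArrayList<>();
    }
}
```

In a long-running agent the stand-in errorLog would be a real logger rather than an in-memory list, and a retry only makes sense where the failure can be transient.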

This message was sent by Atlassian JIRA
