chukwa-dev mailing list archives

From "Jerome Boulon (JIRA)" <>
Subject [jira] Commented: (CHUKWA-369) proposed reliability mechanism
Date Wed, 05 Aug 2009 22:33:14 GMT


Jerome Boulon commented on CHUKWA-369:

Regarding the issue with .chukwa files, the new LocalWriter takes care of this. Any file
older than the rotation period + 1 min will be renamed and sent over to HDFS.
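
A minimal sketch of that age check, assuming a file's age is measured from its last-modified timestamp. The class and method names are hypothetical and are not taken from the actual LocalWriter code:

```java
import java.time.Duration;

// Hypothetical helper for illustration only; not Chukwa's LocalWriter.
final class StaleFileCheck {
    // One-minute grace period added on top of the rotation period.
    static final Duration GRACE = Duration.ofMinutes(1);

    // Returns true when the file is older than rotationPeriod + 1 min,
    // i.e. when it would be eligible to be renamed and shipped to HDFS.
    static boolean isStale(long lastModifiedMillis, long nowMillis, Duration rotationPeriod) {
        long ageMillis = nowMillis - lastModifiedMillis;
        return ageMillis > rotationPeriod.plus(GRACE).toMillis();
    }
}
```

With a 5-minute rotation period, a file last modified 7 minutes ago would qualify, while one modified exactly 6 minutes ago would not yet.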

@Ari: there's one thing I don't understand. Since more than one client is writing to
the same SeqFile, how do you know that the 2 additional MB you are seeing in the file
are coming from Client1 and not Client2? Also keep in mind that, to improve performance,
most of the time you will have to buffer data in memory first and then write it to disk in big chunks.

This is what HDFS is doing, and from what I know there's no easy way to figure out whether the data
is still in memory or has been written to disk (at least for now).

So unless you are able to keep track of the last SeqID per RecordType/Agent on the collector
side and then figure out what has been pushed to disk and what is still in memory, I don't see
a way to send the right information back to the Agent.
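
The bookkeeping described above could look roughly like the following sketch. It assumes the collector receives some reliable signal that buffered data has reached disk, which is exactly the part that is hard to get from HDFS today. All names here (SeqIdTracker, onChunkBuffered, onFlushConfirmed) are hypothetical and are not Chukwa APIs:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical collector-side tracker; illustration only, not Chukwa code.
final class SeqIdTracker {
    // Highest SeqID received per agent/record type, possibly still in memory.
    private final Map<String, Long> buffered = new HashMap<>();
    // Highest SeqID per agent/record type known to have been pushed to disk.
    private final Map<String, Long> durable = new HashMap<>();

    private static String key(String agent, String recordType) {
        return agent + "/" + recordType;
    }

    // Called when a chunk is accepted into the in-memory write buffer.
    void onChunkBuffered(String agent, String recordType, long seqId) {
        buffered.merge(key(agent, recordType), seqId, Long::max);
    }

    // ASSUMPTION: called only once everything buffered so far is on disk.
    void onFlushConfirmed() {
        buffered.forEach((k, v) -> durable.merge(k, v, Long::max));
        buffered.clear();
    }

    // The value that would be safe to report back to the Agent; -1 if none.
    long lastDurableSeqId(String agent, String recordType) {
        return durable.getOrDefault(key(agent, recordType), -1L);
    }
}
```

Because the tracker is keyed per Agent and RecordType, data from Client1 and Client2 interleaved in the same SeqFile would still be acknowledged separately. The sketch is not thread-safe and leaves open how a flush confirmation would actually be obtained.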


> proposed reliability mechanism
> ------------------------------
>                 Key: CHUKWA-369
>                 URL:
>             Project: Hadoop Chukwa
>          Issue Type: New Feature
>          Components: data collection
>    Affects Versions: 0.3.0
>            Reporter: Ari Rabkin
>             Fix For: 0.3.0
> We like to say that Chukwa is a system for reliable log collection. It isn't, quite,
> since we don't handle collector crashes.  Here's a proposed reliability mechanism.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
