chukwa-dev mailing list archives

From Jiaqi Tan <>
Subject Re: units in MDL and HICC
Date Fri, 22 May 2009 05:27:28 GMT
Hi Ari,

I think the real problem here is that sar metrics are being picked up
by an Exec adaptor that calls sar, and there's no control over which
sar gets called (or at least not right now). sar is ultimately an
external dependency, which is currently just assumed to be sitting
there on the monitored machine.

Also, sar emits unstructured plain text, so there's no self-describing
data format (à la some XML schema) that says what the units are. If
sar changes its output units, the parser in the Demux needs to take
care of that too. More generally, any change at all to sar's output
would require an update of the Demux parser.
I think the fundamental problem is that having an Exec adaptor pull
the unstructured output of an external program, and a Demux processor
that makes assumptions about what that output looks like and what it
means, makes the whole workflow dependent on something that isn't
under Chukwa's control.

One workaround I can imagine would be to drop sar and write custom
parsers for /proc, so that Chukwa itself knows what the /proc data
means without having to make assumptions about the output of an
external tool. It's reinventing the wheel somewhat, but it gives a
cleaner end-to-end solution.
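A /proc-based collector could look roughly like the following Java sketch, which parses text in `/proc/meminfo` format into bytes. The class `MemInfoReader` is hypothetical and not part of Chukwa. The one unit fact it relies on is that the kernel's `kB` in this file means 1024 bytes; fields with no unit (such as `HugePages_Total`) are counts and are left unscaled.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.Map;

public class MemInfoReader {
    /**
     * Parses "Name:   value [kB]" lines into field name -> value, converting
     * kB fields to bytes and leaving unit-less fields as plain counts.
     */
    public static Map<String, Long> parse(String text) {
        Map<String, Long> out = new LinkedHashMap<>();
        for (String line : text.split("\n")) {
            int colon = line.indexOf(':');
            if (colon < 0) continue; // skip lines without a "Name:" prefix
            String name = line.substring(0, colon).trim();
            String[] parts = line.substring(colon + 1).trim().split("\\s+");
            long value = Long.parseLong(parts[0]);
            if (parts.length > 1 && parts[1].equals("kB")) {
                value *= 1024L; // the kernel's kB is 1024 bytes
            }
            out.put(name, value);
        }
        return out;
    }

    /** Reads and parses the live file; only meaningful on Linux. */
    public static Map<String, Long> readLocal() throws IOException {
        byte[] raw = Files.readAllBytes(Paths.get("/proc/meminfo"));
        return parse(new String(raw, StandardCharsets.US_ASCII));
    }
}
```

With the raw fields in hand, Chukwa rather than sar would decide what "used" means, for example `MemTotal - MemFree`, and the unit would be known by construction.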

The other answer would perhaps be the "web services" answer: a whole
standardized way of passing data around in a structured form. But
then that starts to look like a generalized pub/sub system.

In the meantime, though, maybe the sar version on the monitored system
could be captured somehow (as metadata in the Chunk?), and the various
Demux processors that depend on such external programs (e.g. IoStat,
Df) could be parameterized to handle output from different versions
or variants of the source program. Or, more generally still, the Exec
adaptor could send along an MD5 hash of the program it's calling, and
then you'd have a separate processor for every variant of the program
you want to support. That sounds terribly hackish to me, but at least
the external dependency would then be identifiable.
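The fingerprinting idea might be sketched like this in Java: hash the executable the Exec adaptor is about to run, then look up which Demux processor variant understands that build. Everything here is hypothetical (the class `ProgramFingerprint`, the registry, the processor names); it only relies on the standard `java.security.MessageDigest` API, and MD5 serves as an identifier for a binary, not as a security measure.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

public class ProgramFingerprint {
    /** Lowercase hex MD5 of the given bytes. */
    public static String md5Hex(byte[] data) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            StringBuilder sb = new StringBuilder();
            for (byte b : md.digest(data)) sb.append(String.format("%02x", b));
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java SE platform is required to provide MD5.
            throw new IllegalStateException(e);
        }
    }

    /** Fingerprint of an executable on disk, e.g. the sar binary being exec'd. */
    public static String fingerprint(Path program) throws IOException {
        return md5Hex(Files.readAllBytes(program));
    }

    // Hypothetical registry: fingerprint -> name of the processor variant
    // that knows how to parse that build's output.
    private final Map<String, String> processors = new HashMap<>();

    public void register(String fingerprint, String processorName) {
        processors.put(fingerprint, processorName);
    }

    /** Returns the processor for this fingerprint, or null for an unknown build. */
    public String processorFor(String fingerprint) {
        return processors.get(fingerprint);
    }
}
```

An unknown fingerprint returns `null`, which is where a real pipeline could flag the data instead of silently misparsing it. Sending the sar version string as Chunk metadata would use the same lookup, keyed by version rather than by hash.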


On Thu, May 21, 2009 at 10:06 PM, Ariel Rabkin <> wrote:
> Howdy all.
> So I've noticed something. The default mdl.xml has entries for
> memused and kbcached.
> My version of sar outputs kbcached and *kb*memused, so memused
> doesn't display right.
> In general though, I've gotten worried about units.
> If I stick 1000 * kbmemused in mdl.xml, will that get pasted into a
> SQL command, and will the right thing happen?
> Is there a better way to do unit conversion, other than hacking the Java?
> Is there any way to know what the right units are, actually?
> --Ari
> --
> Ari Rabkin
> UC Berkeley Computer Science Department
