deltacloud-dev mailing list archives

From David Lutterkort <>
Subject RE: normalised metric names?
Date Wed, 22 Aug 2012 00:23:27 GMT
On Sat, 2012-08-18 at 18:03 +1000, Koper, Dies wrote:
> Here is my comparison:
> EC2             FGCP                   Mock
> CPUUtilization  cpuUtilization         cpuUtilization
> NetworkIn       nicInputByte           nicInputByte
> NetworkOut      nicOutputByte          nicOutputByte
> -               nicInputPacket         nicInputPacket
> -               nicOutputPacket        nicOutputPacket
> DiskReadOps     diskReadRequestCount   diskReadRequestCount
> DiskWriteOps    diskWriteRequestCount  diskWriteRequestCount
> DiskReadBytes   -                      -
> DiskWriteBytes  -                      -
> -               diskReadSector         diskReadSector
> -               diskWriteSector        diskWriteSector
> (I've taken the EC2 metric names from output Michal sent to me at the
> time).
> Five look like a match.
> I'm thinking two options:
> 1. Come up with 5 new metric names that capture those and won't clash
> with any other providers' current or future metric names, and add them
> to the collection returned by DC. Future proof, but would mean a lot of
> double data being returned.
> 2. Take those 5 names from EC2 (as they were already supported in DC
> v1.0) and replace the equivalents in FGCP and mock. The disadvantage of
> that is that if EC2 adds e.g. a DiskReadSector metric, FGCP and mock
> would need to have two to keep backwards compatibility. Also, EC2 and
> FGCP metrics (and names) are obtained dynamically using an API so we may
> not know when new metrics are added. (Well, I would know for FGCP but
> not for EC2.)
> If you don't mind breaking compatibility for EC2, there may be an option
> 1b: Come up with 5 new metric names that capture those and won't clash
> with any other providers' current or future metric names, and use those
> instead of the provider-specific metric names. But that only postpones the
> decision: if EC2 adds, e.g., a DiskReadSector metric in a later version
> and we introduce a new common metric for it, what would we do with the
> current diskReadSector metric in FGCP? Return both?
> Clearly this is something that should have been considered and decided
> when the ec2 metrics were introduced.

I agree; the current XML has a few problems. I'd prefer we fix them,
even if it means breaking backwards compatibility - I doubt that there
are any users of the metrics collection right now.

The current XML looks like

<metric href="http://localhost:3001/api/metrics/ami-5e837b37" id="ami-5e837b37">
           <property name="average" value="42.0"/>
           <property name="maximum" value="42.0"/>

The reason I think nobody is using them is that it's not possible to
correlate them back to the underlying entity - there's no link to the
instance whose network traffic is being monitored.

The following would make more sense:

<metric href="http://localhost:3001/api/metrics/ami-5e837b37" id="ami-5e837b37">
  <image href="/api/images/ami-5e837b37"/>
  <samples probe="network_in" provider_probe="NetworkIn">
    <property name="average" value="42.0"/>
    <property name="maximum" value="42.0"/>

To get back to your initial question: I think we should include both a
normalized name and the provider's name when we list statistics. The
normalized name should not contain a unit ("nicInputByte"), and I'd
prefer if it was underscored, but whether we call it network_in or
nic_input doesn't make a difference to me.

Each driver will need a table to map known probe names to normalized
names; if the backend provider surprises us with a new probe, we'll only
have the provider's name, as in <samples provider_probe="StarsInTheSky"/>.
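
To make the mapping idea concrete, here is a minimal sketch of such a per-driver table with the fallback for unknown probes. It is written in Python for illustration only and is not Deltacloud code. The EC2 names come from the comparison quoted above; the table name, the helper name, and every normalized name other than network_in are assumptions.

```python
# Hypothetical per-driver lookup table: provider probe name -> normalized name.
# Only "network_in" appears in this thread; the other normalized names are
# illustrative guesses that follow the same underscored, unit-free style.
EC2_PROBES = {
    "CPUUtilization": "cpu_utilization",
    "NetworkIn": "network_in",
    "NetworkOut": "network_out",
    "DiskReadOps": "disk_read_ops",
    "DiskWriteOps": "disk_write_ops",
}


def samples_attributes(provider_probe, table):
    """Return the attributes for a <samples/> element.

    Known probes carry both the normalized 'probe' and the 'provider_probe';
    probes missing from the table carry only 'provider_probe'.
    """
    attrs = {}
    normalized = table.get(provider_probe)
    if normalized is not None:
        attrs["probe"] = normalized
    attrs["provider_probe"] = provider_probe
    return attrs
```

With this table, "NetworkIn" yields both attributes, while an unexpected probe such as "StarsInTheSky" yields only provider_probe, so clients can still see it without the driver having to invent a normalized name.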

We also need to clamp down on what goes into <sample/> elements - the
EC2 driver just dumps out whatever EC2 sends back. To me something like

    <sample timestamp="..." count="..." unit="...">
      <entry name="average" value="42.0"/>
      <entry name="minimum" value="42.0"/>
      <entry name="maximum" value="42.0"/>

seems clearer; this would indicate that we took a sample at @timestamp,
consisting of @count measurements, and the values are reported in @unit.
The body of <sample/> only contains the statistics for that one sample.
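
As a rough illustration of clamping down on the output, the sketch below builds one <sample/> element and emits only a fixed set of statistics rather than everything the backend returns. It uses Python's standard xml.etree.ElementTree. The function name, the allowed-statistics list, and the timestamp and unit values are placeholders, not part of any agreed format.

```python
import xml.etree.ElementTree as ET

# Assumed whitelist of statistics allowed inside a <sample/> element.
ALLOWED_STATS = ("average", "minimum", "maximum")


def build_sample(timestamp, count, unit, stats):
    """Build a <sample/> element from a dict of statistic name -> value."""
    sample = ET.Element(
        "sample",
        {"timestamp": timestamp, "count": str(count), "unit": unit},
    )
    # Emit only whitelisted statistics; anything else the backend sent is dropped.
    for name in ALLOWED_STATS:
        if name in stats:
            ET.SubElement(sample, "entry", {"name": name, "value": str(stats[name])})
    return sample


# Placeholder values; "sum" stands in for an extra field a backend might return.
sample = build_sample(
    "2012-08-22T00:00:00Z",
    5,
    "Percent",
    {"average": 42.0, "minimum": 42.0, "maximum": 42.0, "sum": 210.0},
)
xml_text = ET.tostring(sample, encoding="unicode")
```

The extra "sum" field never reaches the XML, which is the point: the driver decides what a sample contains, not the provider.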

What isn't clear to me is how we can talk about taking a sample at a
certain time when that sample consists of multiple measurements.
Shouldn't that sample then have an underlying time period (from time X
to time Y, we looked N times, and here are aggregate stats of what we
saw)?
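
One way to picture the period-based reading is a sample that records its start and end along with the aggregates. The Python sketch below is only an illustration of that idea; the class name, field names, and function name are hypothetical and nothing like them has been agreed for the API.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class PeriodSample:
    """Aggregate statistics for measurements taken between start and end."""
    start: str
    end: str
    count: int
    average: float
    minimum: float
    maximum: float


def aggregate(start, end, measurements):
    """Summarize the measurements observed from time 'start' to time 'end'."""
    if not measurements:
        raise ValueError("a sample needs at least one measurement")
    return PeriodSample(
        start=start,
        end=end,
        count=len(measurements),
        average=mean(measurements),
        minimum=min(measurements),
        maximum=max(measurements),
    )
```

Under this reading, @count is simply the number of measurements in the period, and a single @timestamp would have to be defined as either the start or the end of that period.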

Probably not what you were looking for ... but I think we should make a
breaking change to the XML for this.

