hadoop-yarn-issues mailing list archives

From "Joep Rottinghuis (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-5105) entire time series is returned for YARN container system metrics (CPU and memory)
Date Thu, 26 May 2016 00:22:12 GMT

    [ https://issues.apache.org/jira/browse/YARN-5105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15301193#comment-15301193 ]

Joep Rottinghuis commented on YARN-5105:
----------------------------------------

While I agree that we can postpone the decision on whether to add more complexity, such as
a time range and/or a count range, I feel we need to leave the door open to do so.
That leads me to a slightly different opinion on adding a single boolean isTimeSeries
(yes/no) argument.

Thinking ahead, how would a time range work with isTimeSeries? Would we then have both a time
range and a requirement to set "isTimeSeries" in order to get multiple values?
In addition, the boolean alone doesn't immediately convey that false means you get 1 value
(the latest one) back, as opposed to skipping metrics altogether. I think we can already
skip metrics by specifying the fields to retrieve.

Read for example the javadoc on TimelineDataToRetrieve:
{code}
 * <li><b>isTimeSeries</b> - If fieldsToRetrieve contains METRICS/ALL or
 * metricsToRetrieve is specified, this boolean flag indicates whether a time
 * series needs to be returned for these metrics. The flag is ignored if METRICS
 * are not to be fetched.</li>
{code}
It isn't quite clear that 1 row is returned if isTimeSeries is false.
Admittedly, TimelineReaderWebServices is a bit more explicit:
{code}
   * @param timeSeries If specified, defines whether a metric time series needs
   *     to be returned if fields contains METRICS/ALL or metricsToRetrieve is
   *     specified. Ignored otherwise. If value is true, means time series will
   *     be returned. All other values will be treated as false, including when
   *     this parameter is unspecified. In such cases, latest single value of
   *     metric(s) will be returned (Optional query param).
{code}
It is still a little confusing.
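
To make the documented semantics concrete, here is a minimal sketch of how a "timeSeries" query parameter with that contract would behave. It assumes the value is interpreted the way java.lang.Boolean.parseBoolean interprets strings; the class and method names are hypothetical, and this is not the actual parsing code in TimelineReaderWebServices.

```java
// Hypothetical helper, for illustration only.
final class TimeSeriesParam {
  private TimeSeriesParam() {
  }

  // Returns true only for "true" (case-insensitive). Everything else,
  // including null (parameter unspecified), is treated as false.
  static boolean wantsTimeSeries(String timeSeries) {
    return Boolean.parseBoolean(timeSeries);
  }

  // Spells out what the caller gets back for a given parameter value.
  static String describe(String timeSeries) {
    return wantsTimeSeries(timeSeries)
        ? "entire time series"
        : "latest single value";
  }
}
```

Under these assumptions, "yes", "1" and an omitted parameter all yield the latest single value, which is the part a caller is unlikely to guess from the name isTimeSeries alone.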

Given that we already have the concept of limit to limit the # of entities we return, why don't
we change the timeseries argument from a boolean to a timeserieslimit? We'd document that the
default is 1 and that -1 means no limit (i.e. retrieve the entire time series). Furthermore, we
can specify for now that the only two values allowed are -1 and 1. In other words, -1 is no
limit, or else only one record is returned. The query limiting maps relatively neatly to the
HBase get. ApplicationEntityReader.getResults in your latest patch was:
{code}
    if (getDataToRetrieve().isTimeSeries()) {
      get.setMaxVersions(Integer.MAX_VALUE);
    }
{code}
and would become:
{code}
    if (getDataToRetrieve().getTimeSeriesLimit() >= 0) {
      get.setMaxVersions(getDataToRetrieve().getTimeSeriesLimit());
    }
{code}
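
The mapping from the proposed limit to an HBase version count can be sketched on its own, without the HBase dependency. This is a minimal sketch, not the patch itself: the class and method names are hypothetical, and it assumes (as in the HBase 1.x client) that a Get returns only the latest version unless more are requested and that setMaxVersions rejects non-positive values. The -1 case therefore needs an explicit translation, here to Integer.MAX_VALUE as in the earlier patch.

```java
// Hypothetical helper, for illustration only.
final class TimeSeriesLimits {
  /** Sentinel meaning "retrieve the entire time series". */
  static final int NO_LIMIT = -1;

  /** Proposed default: only the latest value is returned. */
  static final int DEFAULT_LIMIT = 1;

  private TimeSeriesLimits() {
  }

  // Translates a timeSeriesLimit into the value that would be handed to
  // get.setMaxVersions(...). Only -1 and 1 are accepted for now, as proposed.
  static int toMaxVersions(int timeSeriesLimit) {
    if (timeSeriesLimit == NO_LIMIT) {
      return Integer.MAX_VALUE; // no limit: ask for every stored version
    }
    if (timeSeriesLimit == DEFAULT_LIMIT) {
      return 1; // latest single value
    }
    throw new IllegalArgumentException(
        "timeSeriesLimit must be -1 or 1, got " + timeSeriesLimit);
  }
}
```

The reader would then call get.setMaxVersions(TimeSeriesLimits.toMaxVersions(limit)) unconditionally. Relaxing the validation later to accept any positive count would not change the REST contract.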

I agree that we shouldn't try to distinguish between separate limits for separate columns
for now to keep things simple. 

Now if we were to add a time range to give further flexibility in limiting which records are
retrieved, that would be relatively orthogonal to timeSeriesLimit. We'd simply return the
last N metric values (per column) that fall within the specified range, where N is the limit.
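
To illustrate how a time range and a limit would compose, here is a minimal in-memory sketch for a single metric column. It is not HBase code: the class, method and parameter names are hypothetical, and timestamps are assumed to be epoch milliseconds with an inclusive range.

```java
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical helper, for illustration only.
final class MetricWindow {
  private MetricWindow() {
  }

  // Returns the most recent `limit` values whose timestamps fall within
  // [startMs, endMs]. A limit of -1 means no limit.
  static NavigableMap<Long, Long> latestInRange(
      NavigableMap<Long, Long> series, long startMs, long endMs, int limit) {
    NavigableMap<Long, Long> result = new TreeMap<>();
    if (startMs > endMs) {
      return result; // empty range; subMap would throw for from > to
    }
    // Restrict to the time range first, then walk newest to oldest.
    NavigableMap<Long, Long> inRange =
        series.subMap(startMs, true, endMs, true).descendingMap();
    for (Map.Entry<Long, Long> e : inRange.entrySet()) {
      if (limit >= 0 && result.size() >= limit) {
        break;
      }
      result.put(e.getKey(), e.getValue());
    }
    return result;
  }
}
```

The range selects which records are eligible and the limit caps how many of the newest ones come back, so neither parameter needs to know about the other.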

> entire time series is returned for YARN container system metrics (CPU and memory)
> ---------------------------------------------------------------------------------
>
>                 Key: YARN-5105
>                 URL: https://issues.apache.org/jira/browse/YARN-5105
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>    Affects Versions: YARN-2928
>            Reporter: Sangjin Lee
>            Assignee: Varun Saxena
>              Labels: yarn-2928-1st-milestone
>         Attachments: YARN-5105-YARN-2928.01.patch, YARN-5105-YARN-2928.02.patch, YARN-5105-YARN-2928.03.patch
>
>
> I see that the entire time series of the CPU and memory metrics are returned for the
> YARN containers REST query. This has a potential of bloating the output big time.
> {noformat}
> "metrics": [
> {
>     "type": "TIME_SERIES",
>     "id": "MEMORY",
>     "values": 
> {
>     "1463518173363": 407539712,
>     "1463518170347": 407539712,
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: yarn-issues-help@hadoop.apache.org

