hadoop-yarn-issues mailing list archives

From "Varun Saxena (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-4053) Change the way metric values are stored in HBase Storage
Date Mon, 09 Nov 2015 18:50:11 GMT

    [ https://issues.apache.org/jira/browse/YARN-4053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14997121#comment-14997121 ]

Varun Saxena commented on YARN-4053:
------------------------------------

Vrushali, thanks for your comments.
I would like to work on this. Let me take a stab at this one; I will have the bandwidth.
I hope that's fine. You can help me with the reviews.

Coming to the points,
I agree that a flag is not good for extensibility. As I said earlier, a flag should be fine for
now, as we have only 2 choices (generic or long) and we can extend later.
But eventually we will have to have different handlers for different types, so why not do it
now? Hence, let's go with the proposal above.

Moreover, yes, we need to have proper handling based on data type or conversion mechanism
in FlowScanner too. As mentioned in an earlier comment, I was thinking we can indicate this
in attributes. But I guess your proposal sounds better. We can identify the column/column
prefix in flow scanner as well and convert based on the converter attached to it.
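
A minimal sketch of what "a converter attached to a column prefix" could look like, so that the write path and the flow scanner resolve the same converter for a given column. All names here (ValueConverter, LongConverter, StringConverter, ConverterRegistry, the "m!" prefix) are hypothetical and are not taken from the patch; the string converter only stands in for the generic encoding, and plain java.nio is used instead of HBase utilities to keep the example self-contained.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical interface; the names are illustrative, not from the YARN-4053 patch.
interface ValueConverter {
  byte[] encode(Object value);
  Object decode(byte[] bytes);
}

// Stores longs as fixed-width 8-byte big-endian values.
final class LongConverter implements ValueConverter {
  @Override
  public byte[] encode(Object value) {
    if (!(value instanceof Long)) {
      throw new IllegalArgumentException("Expected Long, got " + value);
    }
    return ByteBuffer.allocate(Long.BYTES).putLong((Long) value).array();
  }

  @Override
  public Object decode(byte[] bytes) {
    return ByteBuffer.wrap(bytes).getLong();
  }
}

// Stand-in for the generic, string-based encoding.
final class StringConverter implements ValueConverter {
  @Override
  public byte[] encode(Object value) {
    return String.valueOf(value).getBytes(StandardCharsets.UTF_8);
  }

  @Override
  public Object decode(byte[] bytes) {
    return new String(bytes, StandardCharsets.UTF_8);
  }
}

// Resolves a converter from the column qualifier's prefix, so that writer and
// scanner agree on how a cell is encoded without a per-cell flag.
final class ConverterRegistry {
  private final Map<String, ValueConverter> byPrefix = new HashMap<>();
  private final ValueConverter fallback = new StringConverter();

  void register(String columnPrefix, ValueConverter converter) {
    byPrefix.put(columnPrefix, converter);
  }

  ValueConverter forColumn(String qualifier) {
    for (Map.Entry<String, ValueConverter> e : byPrefix.entrySet()) {
      if (qualifier.startsWith(e.getKey())) {
        return e.getValue();
      }
    }
    return fallback;
  }
}
```

With this shape, adding a new data type later means registering one more converter rather than widening a flag.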

bq. it missed one of the places in the current patch for example
Which place? MIN/MAX handling?

bq. For single value vs time series, we suggest using a column prefix to distinguish them
Do we need to differentiate between SINGLE_VALUE and TIME_SERIES if by default it
will be read as SINGLE_VALUE? We will be storing multiple values even for a metric
of type SINGLE_VALUE. Do you mean that on the read side, only the latest value of a metric is to
be returned if it is of type SINGLE_VALUE (even if the client asks for TIME_SERIES)? Again, the
assumption here is that the client will always send the metric type (SINGLE_VALUE or TIME_SERIES)
consistently.

bq. For the read path, we can assume it is a single value unless specifically specified by
the client as a time series (as clients would need to intend to read time series explicitly).
We can return TIME_SERIES by indicating something like METRICS_TIME_SERIES as fields. If we
do so, it will have implications on YARN-3862.
Now the question is whether to return values for multiple timestamps even for a metric of type
SINGLE_VALUE if the client asks for it. What if the client wants to see the values of a gauge (which
might be considered a SINGLE_VALUE) over a period of time, for instance? If yes, do we
even need to differentiate between the 2 types?
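
One possible reading of the read-path behaviour under discussion, sketched under stated assumptions: all stored cells for a metric are kept as a timestamp-to-value map, and the reader returns only the latest cell unless the client explicitly asks for a time series. MetricReader and the timeSeriesRequested parameter are hypothetical names; this sketch deliberately ignores the stored metric type, which is the option the question above raises.

```java
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical helper; not part of the timeline service code.
final class MetricReader {
  // stored: timestamp -> value for one metric, as kept in the backend.
  // timeSeriesRequested: true only when the client explicitly asks for a series.
  static NavigableMap<Long, Long> read(NavigableMap<Long, Long> stored,
                                       boolean timeSeriesRequested) {
    if (timeSeriesRequested || stored.isEmpty()) {
      // Return a copy of every stored cell.
      return new TreeMap<>(stored);
    }
    // Default: only the most recent value is returned.
    Map.Entry<Long, Long> last = stored.lastEntry();
    NavigableMap<Long, Long> latest = new TreeMap<>();
    latest.put(last.getKey(), last.getValue());
    return latest;
  }
}
```

Under this reading a gauge can still be read over a period of time, because the decision rests on the request rather than on the type recorded at write time.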

bq. We finally concluded that we should start with storing longs only and make the code strictly
accept longs 
JAX-RS, i.e. the REST API layer, will convert an integral value to Integer automatically if
it is less than Integer.MAX_VALUE, so I guess we will have to handle ints and shorts as well;
i.e. if it is an Integer, for instance, we can call Integer#longValue to convert it to long.
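
A small sketch of the widening step described above, assuming the metric value arrives from the REST layer as a boxed java.lang.Number. MetricValues and toLong are hypothetical names; rejecting non-integral types is one possible way to keep the storage path strictly on longs.

```java
// Hypothetical helper; not part of the timeline service code.
final class MetricValues {
  // Widens any integral boxed type to long; rejects everything else so that
  // only longs ever reach the storage layer.
  static long toLong(Object value) {
    if (value instanceof Long || value instanceof Integer
        || value instanceof Short || value instanceof Byte) {
      // Number#longValue is a lossless widening for these integral types.
      return ((Number) value).longValue();
    }
    throw new IllegalArgumentException(
        "Metric value must be an integral number, got: " + value);
  }
}
```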

bq. Regarding indicating whether to aggregate or not, we suggest to rely mostly on the flow
run aggregation. For those use cases that need to access metrics off of tables other than
the flow run table (e.g. time-based aggregation), we need to explore ways to specify this
information as input (config, etc.)
I hope Li Lu is fine with this, because I remember him saying on YARN-3816 that he will be
using it for offline aggregation in YARN-3817. I think rows from the application table are being
used in the MR job there. Are you suggesting that for offline aggregation, based on config,
we aggregate either all the application metrics (to flow or user) or nothing?
Or that we configure a set of metrics to aggregate in some config?

> Change the way metric values are stored in HBase Storage
> --------------------------------------------------------
>
>                 Key: YARN-4053
>                 URL: https://issues.apache.org/jira/browse/YARN-4053
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>    Affects Versions: YARN-2928
>            Reporter: Varun Saxena
>            Assignee: Varun Saxena
>         Attachments: YARN-4053-YARN-2928.01.patch, YARN-4053-YARN-2928.02.patch
>
>
> Currently HBase implementation uses GenericObjectMapper to convert and store values in backend HBase storage. This converts everything into a string representation(ASCII/UTF-8 encoded byte array).
> While this is fine in most cases, it does not quite serve our use case for metrics.
> So we need to decide how are we going to encode and decode metric values and store them in HBase.
>  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
