From "Varun Saxena (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-4053) Change the way metric values are stored in HBase Storage
Date Tue, 25 Aug 2015 05:54:46 GMT

    [ https://issues.apache.org/jira/browse/YARN-4053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14710638#comment-14710638

Varun Saxena commented on YARN-4053:

There was a suggestion that we could support only longs. Would supporting only longs affect
potential users of ATS?
Longs, however, should cover most metrics (as of now, I can’t think of any where decimals
would be of great importance).
If we do this, I think the TimelineMetric object should be changed to accept only java.lang.Long
and not java.lang.Number…
Looping in [~vinodkv] to get his opinion on this as well.
That said, is it unfair to ask clients to send values consistently?
Can’t we document and enforce this restriction? If a client does not comply, it cannot
expect consistent results. This can be the contract between ATS and its clients.
The major concern here, though, is that it won’t be possible to enforce this restriction
programmatically, either on the client side or on the server side.
*Possible Solution:*
If enforcing this restriction is not viable, there is one possible solution. With either approach,
the real problem arises when applying metric filters to inconsistent data.
For this, we can use approach 2 (include the type in the column qualifier) and then insert OR filters
covering both column qualifiers of the same metric.
Let me elaborate with an example.
Let us say we have a metric called JOB_ELAPSED_TIME, and a client can report both integral and
floating point values for it. With approach 2, we will have 2 column qualifiers for this
metric, i.e. “JOB_ELAPSED_TIME=L” (for longs) and “JOB_ELAPSED_TIME=D” (for doubles).
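A minimal Java sketch of the approach 2 naming, assuming the “=L” and “=D” type suffixes from the example; the class and method names (TypedQualifier, qualifierFor) are hypothetical and not part of the existing timeline service code:

```java
// Hypothetical helper (not existing ATS code): approach 2 encodes the value
// type into the column qualifier as a suffix, "=L" for longs, "=D" for doubles.
final class TypedQualifier {
  private TypedQualifier() {}

  /** Returns the column qualifier for a metric value, e.g. "JOB_ELAPSED_TIME=L". */
  static String qualifierFor(String metricName, Number value) {
    // Integral types all map to the long column.
    if (value instanceof Long || value instanceof Integer
        || value instanceof Short || value instanceof Byte) {
      return metricName + "=L";
    }
    // Floating point types map to the double column.
    if (value instanceof Double || value instanceof Float) {
      return metricName + "=D";
    }
    throw new IllegalArgumentException("Unsupported metric value type: " + value);
  }

  public static void main(String[] args) {
    System.out.println(qualifierFor("JOB_ELAPSED_TIME", 40L));   // JOB_ELAPSED_TIME=L
    System.out.println(qualifierFor("JOB_ELAPSED_TIME", 40.75)); // JOB_ELAPSED_TIME=D
  }
}
```

Other Number subclasses (e.g. BigDecimal) are simply rejected in this sketch; how they should be handled is left open.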
Now, when a query comes with a metric filter value in integer format, e.g. JOB_ELAPSED_TIME > 40,
it can be transformed to a corresponding HBase filter of the form
(“JOB_ELAPSED_TIME=L” > 40 OR “JOB_ELAPSED_TIME=D” > 40.0).
Similarly, a filter list of the form (“m1” > 10 AND “m2” < 5 AND “m3” = 4) would
be transformed to ((“m1=L” > 10 OR “m1=D” > 10.0) AND (“m2=L” < 5 OR
“m2=D” < 5.0) AND (“m3=L” = 4 OR “m3=D” = 4.0)).
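The rewrite of an AND filter list into per-metric OR pairs can be sketched in Java as below. This is an illustration only: it builds the filter as a string, and the Condition type and method names are hypothetical. In real code each OR pair would presumably map to an HBase FilterList with Operator.MUST_PASS_ONE, nested inside one with Operator.MUST_PASS_ALL.

```java
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

// Hypothetical sketch: rewrites a metric filter list with integral values so
// that each condition checks both the "=L" and the "=D" column qualifier.
final class FilterRewriter {
  private FilterRewriter() {}

  /** A single metric condition such as m1 > 10 (integral filter value). */
  static final class Condition {
    final String metric;
    final String op;
    final long value;

    Condition(String metric, String op, long value) {
      this.metric = metric;
      this.op = op;
      this.value = value;
    }
  }

  /** One condition becomes an OR over the long and double qualifiers. */
  static String rewrite(Condition c) {
    return String.format(Locale.ROOT, "(\"%s=L\" %s %d OR \"%s=D\" %s %.1f)",
        c.metric, c.op, c.value, c.metric, c.op, (double) c.value);
  }

  /** The AND list is preserved; only each condition is expanded. */
  static String rewriteAll(List<Condition> conditions) {
    return conditions.stream()
        .map(FilterRewriter::rewrite)
        .collect(Collectors.joining(" AND ", "(", ")"));
  }

  public static void main(String[] args) {
    List<Condition> filters = List.of(
        new Condition("m1", ">", 10),
        new Condition("m2", "<", 5),
        new Condition("m3", "=", 4));
    System.out.println(rewriteAll(filters));
  }
}
```

Locale.ROOT keeps the decimal separator a "." regardless of the JVM's default locale.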
If the filter value is in decimal format, we will have to make additional changes. If the filter
is something like JOB_ELAPSED_TIME > 40.75, it will have to be converted to
(“JOB_ELAPSED_TIME=L” >= 41 OR “JOB_ELAPSED_TIME=D” > 40.75). As you can see, when matching a double
value against the column qualifier storing longs, I would need to round the value up to the next
integer and change the operator to >=. Similar changes will be required for < (less than) and
= (equal to) comparisons as well: for instance, < 40.75 would become <= 40 on the long column,
while = 40.75 can never match a long.
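The bound adjustment for the long column can be written down for all five operators. A minimal Java sketch follows; the names are hypothetical. For any integer v it relies on: v > x iff v >= floor(x) + 1, v >= x iff v >= ceil(x), v < x iff v <= ceil(x) - 1, and v <= x iff v <= floor(x). For a non-integral x such as 40.75, floor(x) + 1 is exactly the round-up to 41 described above. It assumes finite thresholds well inside the long range (NaN and infinities are not handled).

```java
// Hypothetical sketch: converts a decimal filter threshold x into an
// equivalent condition on the integral ("=L") column. Assumes x is finite
// and well within the range of long; NaN/infinity are not handled.
final class LongColumnBound {
  private LongColumnBound() {}

  enum Op { GT, GE, LT, LE, EQ }

  /** An operator and an integral bound to apply to the long column. */
  static final class Bound {
    final Op op;
    final long value;

    Bound(Op op, long value) {
      this.op = op;
      this.value = value;
    }

    @Override
    public String toString() {
      return op + " " + value;
    }
  }

  /** Returns the rewritten bound, or null if no long value can satisfy it. */
  static Bound rewrite(Op op, double x) {
    switch (op) {
      case GT: // v > x  iff  v >= floor(x) + 1
        return new Bound(Op.GE, (long) Math.floor(x) + 1);
      case GE: // v >= x iff  v >= ceil(x)
        return new Bound(Op.GE, (long) Math.ceil(x));
      case LT: // v < x  iff  v <= ceil(x) - 1
        return new Bound(Op.LE, (long) Math.ceil(x) - 1);
      case LE: // v <= x iff  v <= floor(x)
        return new Bound(Op.LE, (long) Math.floor(x));
      case EQ: // only an integral x can equal a long
        return x == Math.floor(x) ? new Bound(Op.EQ, (long) x) : null;
      default:
        throw new IllegalArgumentException("Unknown operator: " + op);
    }
  }

  public static void main(String[] args) {
    System.out.println(rewrite(Op.GT, 40.75)); // GE 41
    System.out.println(rewrite(Op.LT, 40.75)); // LE 40
    System.out.println(rewrite(Op.EQ, 40.75)); // null
  }
}
```

For an integral threshold such as 40.0, > becomes >= 41 rather than >= 40, which is why the sketch uses floor(x) + 1 instead of simply rounding up. A null result means the long branch of the OR can be dropped entirely.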
However, I am not sure whether adding so many filters will cause a performance issue for
HBase, because with this solution we will, in essence, be doubling the size of the metric filter list.
One thing to note, though, is that if we do adopt approach 2 (including the type in the column
qualifier), regex comparison might become an issue. Regular expressions can theoretically become
quite complex, so programmatically interpreting a regex and transforming it so that it covers both
the long column qualifier and the double column qualifier might introduce bugs.
Maybe we can support only wildcard matching (\*), or make do with prefix and substring filters.
Thoughts?
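If only the wildcard (\*) is supported, translating a metric-name pattern into a regex that covers both typed qualifiers stays simple. A minimal Java sketch, assuming the “=L”/“=D” suffixes; the class name is hypothetical. Everything other than the wildcard is quoted literally, and the regex is anchored with ^ and $ so it behaves the same whether the consumer does a full match or a find; the resulting regex string could, for instance, be given to an HBase QualifierFilter with a RegexStringComparator, though that wiring is not shown here.

```java
import java.util.regex.Pattern;

// Hypothetical sketch: turns a metric-name wildcard such as "JOB_*" into a
// regex matching both typed column qualifiers, e.g. "JOB_ELAPSED_TIME=L" and
// "JOB_ELAPSED_TIME=D". Only '*' is treated specially; all else is literal.
final class WildcardQualifier {
  private WildcardQualifier() {}

  static Pattern toQualifierPattern(String wildcard) {
    StringBuilder regex = new StringBuilder("^");
    int start = 0;
    for (int i = 0; i < wildcard.length(); i++) {
      if (wildcard.charAt(i) == '*') {
        // Quote the literal run before the '*', then allow any characters.
        regex.append(Pattern.quote(wildcard.substring(start, i))).append(".*");
        start = i + 1;
      }
    }
    regex.append(Pattern.quote(wildcard.substring(start)));
    // Require one of the two type suffixes at the end of the qualifier.
    regex.append("=[LD]$");
    return Pattern.compile(regex.toString());
  }

  public static void main(String[] args) {
    Pattern p = toQualifierPattern("JOB_*");
    System.out.println(p.matcher("JOB_ELAPSED_TIME=L").matches()); // true
    System.out.println(p.matcher("JOB_ELAPSED_TIME=D").matches()); // true
    System.out.println(p.matcher("MAP_ELAPSED_TIME=L").matches()); // false
  }
}
```

Because the translation never has to reason about arbitrary regex syntax, the risk of bugs mentioned above is much smaller.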

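However, we may want to match only against the latest version of a metric’s value.
In that case, the solution suggested above won’t work, because the two column qualifiers are
versioned independently: the OR filter can match an older value in one column even when the
latest value was written to the other.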
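The latest-version problem can be illustrated with a small in-memory simulation in Java (no HBase involved; the Cell type and method names are hypothetical). Suppose a client first writes 50 as a long and later writes 30.5 as a double. Evaluating JOB_ELAPSED_TIME > 40 on the latest version of each column, OR-ed together, still matches, although the metric’s actual latest value is 30.5:

```java
import java.util.Comparator;
import java.util.List;

// Hypothetical in-memory model of versioned cells; not HBase code.
final class LatestVersionDemo {
  private LatestVersionDemo() {}

  /** One version of a cell: qualifier, write timestamp and value. */
  static final class Cell {
    final String qualifier;
    final long ts;
    final double value;

    Cell(String qualifier, long ts, double value) {
      this.qualifier = qualifier;
      this.ts = ts;
      this.value = value;
    }
  }

  /** Latest value stored under one qualifier, or NaN if there is none. */
  static double latestIn(List<Cell> cells, String qualifier) {
    return cells.stream()
        .filter(c -> c.qualifier.equals(qualifier))
        .max(Comparator.comparingLong((Cell c) -> c.ts))
        .map(c -> c.value)
        .orElse(Double.NaN);
  }

  /** OR filter: latest "=L" value > threshold, or latest "=D" value > threshold. */
  static boolean orFilterMatches(List<Cell> cells, String metric, double threshold) {
    return latestIn(cells, metric + "=L") > threshold
        || latestIn(cells, metric + "=D") > threshold;
  }

  /** The metric's real latest value, taken across both qualifiers. */
  static double trueLatest(List<Cell> cells, String metric) {
    return cells.stream()
        .filter(c -> c.qualifier.equals(metric + "=L") || c.qualifier.equals(metric + "=D"))
        .max(Comparator.comparingLong((Cell c) -> c.ts))
        .map(c -> c.value)
        .orElse(Double.NaN);
  }

  public static void main(String[] args) {
    List<Cell> cells = List.of(
        new Cell("JOB_ELAPSED_TIME=L", 1L, 50),
        new Cell("JOB_ELAPSED_TIME=D", 2L, 30.5));
    System.out.println(orFilterMatches(cells, "JOB_ELAPSED_TIME", 40)); // true
    System.out.println(trueLatest(cells, "JOB_ELAPSED_TIME") > 40);     // false
  }
}
```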

> Change the way metric values are stored in HBase Storage
> --------------------------------------------------------
>                 Key: YARN-4053
>                 URL: https://issues.apache.org/jira/browse/YARN-4053
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>    Affects Versions: YARN-2928
>            Reporter: Varun Saxena
>            Assignee: Varun Saxena
>         Attachments: YARN-4053-YARN-2928.01.patch
> Currently, the HBase implementation uses GenericObjectMapper to convert and store values in
> the backend HBase storage. This converts everything into a string representation (ASCII/UTF-8
> encoded byte array).
> While this is fine in most cases, it does not quite serve our use case for metrics.
> So we need to decide how we are going to encode and decode metric values and store them
> in HBase.
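One concrete reason a string representation may not serve metrics well: HBase compares stored bytes lexicographically (unsigned), which does not order numeric strings numerically, so 9 sorts after 10. A fixed-width 8-byte big-endian encoding, the layout HBase’s Bytes.toBytes(long) produces, does sort correctly, but only for non-negative values, since the sign bit makes negative numbers sort after positive ones under unsigned comparison. A minimal Java sketch using only the JDK (Java 9+ for Arrays.compareUnsigned); the class name is hypothetical:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical sketch comparing two ways of encoding a long metric value.
final class EncodingOrder {
  private EncodingOrder() {}

  /** Decimal string, UTF-8 encoded: variable width, e.g. "9" vs "10". */
  static byte[] asString(long v) {
    return Long.toString(v).getBytes(StandardCharsets.UTF_8);
  }

  /** Fixed 8-byte big-endian two's complement (ByteBuffer's default order). */
  static byte[] asBigEndian(long v) {
    return ByteBuffer.allocate(Long.BYTES).putLong(v).array();
  }

  /** Unsigned lexicographic byte comparison. */
  static int compare(byte[] a, byte[] b) {
    return Arrays.compareUnsigned(a, b);
  }

  public static void main(String[] args) {
    // Positive result: "9" sorts after "10" as strings.
    System.out.println(compare(asString(9), asString(10)) > 0);       // true
    // Negative result: 9 sorts before 10 in big-endian form.
    System.out.println(compare(asBigEndian(9), asBigEndian(10)) < 0); // true
    // Caveat: -1 (0xFF...) sorts after 1 under unsigned comparison.
    System.out.println(compare(asBigEndian(-1), asBigEndian(1)) > 0); // true
  }
}
```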

