hadoop-yarn-issues mailing list archives

From "Varun Saxena (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-4053) Change the way metric values are stored in HBase Storage
Date Tue, 25 Aug 2015 05:51:46 GMT

    [ https://issues.apache.org/jira/browse/YARN-4053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14710633#comment-14710633 ]

Varun Saxena commented on YARN-4053:
------------------------------------

I wanted to discuss this so that we can reach a consensus on how to handle YARN-4053.

*Solution 1*: We can add a 1-byte flag as part of the metric value indicating whether we are
storing an integral value (0) or a floating point value (1).
*Solution 2*: Another suggestion is to make the type part of the column qualifier, say
something like metric=l where "l" indicates long.

Another solution is to store everything as double. But would it be fair to impose this restriction
on the client when it reads data from ATS? What if the client expects a long and cannot
handle a double?
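To make Solution 1 concrete, here is a rough sketch in Java of what the flag-prefixed encoding could look like. The class and method names are hypothetical, not part of any existing patch; only the flag convention (0 = long, 1 = double) is taken from the proposal above.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch of Solution 1: prefix each stored metric value
// with a 1-byte flag (0 = integral/long, 1 = floating point/double).
public class MetricValueCodec {
    private static final byte FLAG_LONG = 0;
    private static final byte FLAG_DOUBLE = 1;

    public static byte[] encode(Number value) {
        ByteBuffer buf = ByteBuffer.allocate(1 + Long.BYTES);
        if (value instanceof Double || value instanceof Float) {
            buf.put(FLAG_DOUBLE).putDouble(value.doubleValue());
        } else {
            buf.put(FLAG_LONG).putLong(value.longValue());
        }
        return buf.array();
    }

    public static Number decode(byte[] bytes) {
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        byte flag = buf.get();
        return (flag == FLAG_DOUBLE) ? (Number) buf.getDouble()
                                     : (Number) buf.getLong();
    }
}
```

The read path can dispatch on the first byte without any out-of-band type information, which is the main attraction of this scheme.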


The major issue common to all these approaches is what happens if the client does not report
metric values consistently (i.e. with the same data type for the same metric).

Now let us look at the scenarios where metric values come into the picture.
*1.* While writing an entity to HBase: Here, we need to consider that for the same entity, a
particular metric can be reported in multiple write calls.
So it is possible that in one write all values for a particular metric are reported as longs,
and in another write all as floats. This can create inconsistency in both of the solutions above
(different flags and encodings for the same metric in Solution 1, and different column qualifiers
for the same metric in Solution 2).
We can add a valuetype field in TimelineMetric which indicates whether a set of values are
long or float, and throw an exception in TimelineMetric at the time of adding a value if types
are not consistent. This will at least ensure the same data type within a particular write call.
But even then, the client must make sure that data types stay consistent across writes.
I do not think fetching the row first to find out the existing column qualifier name or the flag
attached to earlier values would be a viable option.
So some sort of restriction on the client (that it sends consistent data types
for the same metric) will have to be placed whether we adopt Solution 1 or Solution 2.
Is there some HBase API I am not aware of that would help here?
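The valuetype check proposed above could be sketched like this. The names below are purely illustrative and are not the actual TimelineMetric API; the point is only that the first value added fixes the type and later values of a different type are rejected within one write call.

```java
import java.util.TreeMap;

// Illustrative sketch of the proposed valuetype enforcement: a metric
// remembers whether it holds LONG or DOUBLE values and rejects a mix.
// Not the real TimelineMetric class.
public class TypedMetric {
    enum ValueType { LONG, DOUBLE }

    private ValueType type;                        // fixed by the first value added
    private final TreeMap<Long, Number> values = new TreeMap<>();

    public void addValue(long timestamp, Number value) {
        ValueType incoming = (value instanceof Double || value instanceof Float)
                ? ValueType.DOUBLE : ValueType.LONG;
        if (type == null) {
            type = incoming;
        } else if (type != incoming) {
            throw new IllegalArgumentException(
                "Metric value type " + incoming + " conflicts with earlier type " + type);
        }
        values.put(timestamp, value);
    }
}
```

As noted above, this only guarantees consistency within a single write call; consistency across writes still rests on the client.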

*2.* While reading an entity from HBase in the absence of any HBase filter: In this case there
should be no issues with either Solution 1 or Solution 2, because we read everything as bytes
from HBase and can then do the appropriate conversion based on the flag or the column qualifier
name.

*3.* While reading an entity from HBase in the presence of HBase filters: We can have two kinds
of HBase filters. One kind retrieves specific columns (to determine which metrics to
return) and the other trims down the rows/entities to be returned based on metric value
comparison.
The first class of filters, which determines which columns to return, should work in both
cases (Solution 1 and Solution 2). This holds even for Solution 2, because we use prefix
filters as of now. If we were to use regex matching, though, Solution 2 might make things
more complicated.

For the second set of filters, we would need to know the data type of the metric value in both
of the proposed solutions, because SingleColumnValueFilter requires the exact column qualifier
name (for Solution 2), and for Solution 1 we would need to know the data type of the metric so
that we can attach the flag to the value being compared against (so that BinaryComparator can
be used).
If we add filters to our data object model, we can probably include the data type in the filters
as well. But that again depends on the client sending the correct data type.
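For Solution 1, the comparison value handed to something like HBase's BinaryComparator (inside a SingleColumnValueFilter) would have to carry the same flag byte as the stored cells, or the byte-wise comparison would be against misaligned data. A minimal sketch of building such a comparison value, with a hypothetical helper name and the flag convention from Solution 1:

```java
import java.nio.ByteBuffer;

// Hypothetical sketch for Solution 1: the byte[] given to a
// BinaryComparator must start with the same 1-byte type flag
// (0 = long) that the write path stored before the value.
public class FilterValueBuilder {
    public static byte[] longComparisonBytes(long threshold) {
        return ByteBuffer.allocate(1 + Long.BYTES)
                .put((byte) 0)        // flag 0 = integral value, as on the write path
                .putLong(threshold)
                .array();
    }
}
```

The resulting array would then be wrapped in the comparator passed to the filter; for Solution 2 the type rides on the column qualifier instead, so the filter needs the exact qualifier name.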


As we saw in point 1, we need to impose a restriction on the client that it sends the same data
type for every metric. Frankly, this should be easy for the client as well: if it expects float
values for a metric, it will most likely use Double or Float throughout.

Thoughts? Or any other suggestions which could preclude the need for such a restriction?
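For context on why the current string-based encoding (mentioned in the issue description below) is problematic for metrics: byte-wise comparison of ASCII/UTF-8 encoded numbers follows lexicographic rather than numeric order, so HBase-side value comparisons give wrong answers. A minimal illustration (the helper below mimics an unsigned lexicographic byte comparison, in the style of HBase's Bytes.compareTo):

```java
import java.nio.charset.StandardCharsets;

// Illustration only: ASCII/UTF-8 encoded numbers compare
// lexicographically, so "9" sorts after "10" even though 9 < 10.
public class LexOrder {
    public static int compareEncoded(long a, long b) {
        byte[] ba = Long.toString(a).getBytes(StandardCharsets.UTF_8);
        byte[] bb = Long.toString(b).getBytes(StandardCharsets.UTF_8);
        // unsigned lexicographic comparison over the raw bytes
        int n = Math.min(ba.length, bb.length);
        for (int i = 0; i < n; i++) {
            int d = (ba[i] & 0xff) - (bb[i] & 0xff);
            if (d != 0) return d;
        }
        return ba.length - bb.length;
    }
}
```

A fixed-width binary encoding (with or without a flag byte) avoids this, which is why either proposed solution is preferable to the GenericObjectMapper string representation for metric values.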

> Change the way metric values are stored in HBase Storage
> --------------------------------------------------------
>
>                 Key: YARN-4053
>                 URL: https://issues.apache.org/jira/browse/YARN-4053
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>    Affects Versions: YARN-2928
>            Reporter: Varun Saxena
>            Assignee: Varun Saxena
>         Attachments: YARN-4053-YARN-2928.01.patch
>
>
> Currently the HBase implementation uses GenericObjectMapper to convert and store values in
the backend HBase storage. This converts everything into a string representation (ASCII/UTF-8
encoded byte array).
> While this is fine in most cases, it does not quite serve our use case for metrics.
> So we need to decide how we are going to encode and decode metric values and store them
in HBase.
>  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
