hadoop-yarn-issues mailing list archives

From "Vrushali C (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-3984) Rethink event column key issue
Date Tue, 28 Jul 2015 02:14:04 GMT

    [ https://issues.apache.org/jira/browse/YARN-3984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14643758#comment-14643758 ]

Vrushali C commented on YARN-3984:
----------------------------------

I can take this up. If someone else wants it, please let me know on the jira and feel free
to reassign it.

To add to my previous comment, let's take an example. Say the event id is KILLED and it occurs
3 times for whatever reason. Now let's say:
at ts1, for key "DIAGNOSTICS", the value is "xyz". 
at ts1, for key "SOMETHING ELSE", the value is "something"
at ts2, for key "DIAGNOSTICS", the value is "abc" 
at ts3, for key "DIAGNOSTICS", the value is "pqr"
at ts3, for key "SOMETHING ELSE", the value is "something even more"

Here ts1 < ts2 < ts3, so ts3 is the most recent timestamp.
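
To make the two orderings concrete, here is a rough Python sketch (not the timeline service code) that builds column qualifiers for the example above and sorts them lexicographically, the way HBase orders them. It assumes "key before timestamp" means event_id?info_key?timestamp (the current schema from the issue description) and reads "timestamp before the key" as event_id?timestamp?info_key. The ? separator comes from the issue description; the zero-padded timestamp encoding and the helper names are hypothetical.

```python
SEP = "?"  # separator as written in the issue description


def ts(t):
    # Zero-padded so string order matches numeric order (assumption).
    return f"{t:019d}"


# (event_id, info_key, timestamp) from the example; 1 < 2 < 3 stand in for ts1 < ts2 < ts3.
CELLS = [
    ("KILLED", "DIAGNOSTICS", 1),
    ("KILLED", "SOMETHING ELSE", 1),
    ("KILLED", "DIAGNOSTICS", 2),
    ("KILLED", "DIAGNOSTICS", 3),
    ("KILLED", "SOMETHING ELSE", 3),
]


def key_before_ts(e, k, t):
    """Hypothetical qualifier builder: event_id?info_key?timestamp."""
    return SEP.join([e, k, ts(t)])


def ts_before_key(e, k, t):
    """Hypothetical qualifier builder: event_id?timestamp?info_key."""
    return SEP.join([e, ts(t), k])


def stored_order(make_qualifier):
    """Qualifiers in the order a store that sorts them lexicographically keeps them."""
    return sorted(make_qualifier(e, k, t) for e, k, t in CELLS)


if __name__ == "__main__":
    for name, fn in [("key before timestamp", key_before_ts),
                     ("timestamp before key", ts_before_key)]:
        print(name)
        for qualifier in stored_order(fn):
            print("  ", qualifier)
```

With key before timestamp, all DIAGNOSTICS cells sit together (ts1, ts2, ts3) followed by the SOMETHING ELSE cells; with timestamp before key, the cells are grouped by time instead.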

Now, which of these queries is most commonly needed?
1. For this application, what is the diagnostic message for the most recent KILLED event id?
   Or all of the diagnostic messages for the KILLED event id?
2. For this application, what are the most recent key(s) in the KILLED event id?
3. For this application, what are the keys (and values) that occurred between ts2 and ts3
   for the KILLED event id?

If we think #2 and #3 are the most commonly run queries, then we can go with timestamp before
the key.
If we think #1 is the most commonly run query, then we can go with key before timestamp. 

Now if we choose timestamp before key, we can never pull back the value for a given event
and key without fetching all keys in that event for all timestamps.

If we choose key before timestamp, we can't easily pull back the most recently occurred key
within an event.
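
A rough sketch of that tradeoff, under the same assumptions as before (event_id?info_key?timestamp versus event_id?timestamp?info_key, zero-padded timestamps): a sorted Python list stands in for the stored column qualifiers, a bisect-based range scan stands in for a bounded scan, and each query reports how many cells it had to read. `build`, `scan`, `prefix_scan`, `latest_value`, and `between` are hypothetical helpers, not HBase APIs.

```python
import bisect

SEP = "?"


def ts(t):
    # Fixed-width encoding so lexicographic order matches numeric order (assumption).
    return f"{t:019d}"


# The KILLED example; 1 < 2 < 3 stand in for ts1 < ts2 < ts3.
CELLS = [
    ("KILLED", "DIAGNOSTICS", 1, "xyz"),
    ("KILLED", "SOMETHING ELSE", 1, "something"),
    ("KILLED", "DIAGNOSTICS", 2, "abc"),
    ("KILLED", "DIAGNOSTICS", 3, "pqr"),
    ("KILLED", "SOMETHING ELSE", 3, "something even more"),
]


def build(order):
    """Sorted (qualifier, value) pairs; order is 'key_first' or 'ts_first'."""
    def qual(e, k, t):
        parts = [e, k, ts(t)] if order == "key_first" else [e, ts(t), k]
        return SEP.join(parts)
    return sorted((qual(e, k, t), v) for e, k, t, v in CELLS)


def scan(columns, start, stop):
    """Cells with start <= qualifier < stop, like a bounded scan over sorted columns."""
    quals = [q for q, _ in columns]
    return columns[bisect.bisect_left(quals, start):bisect.bisect_left(quals, stop)]


def prefix_scan(columns, prefix):
    # prefix must end with SEP; every match sorts below prefix with SEP bumped by one.
    return scan(columns, prefix, prefix[:-1] + chr(ord(SEP) + 1))


def latest_value(order, event_id, info_key):
    """Query #1: most recent value for (event, key); returns (value, cells_read)."""
    cols = build(order)
    if order == "key_first":
        read = prefix_scan(cols, SEP.join([event_id, info_key]) + SEP)
        hits = read  # already only this key, oldest to newest
    else:
        read = prefix_scan(cols, event_id + SEP)  # every key at every timestamp
        hits = [(q, v) for q, v in read if q.split(SEP)[2] == info_key]
    return (hits[-1][1] if hits else None), len(read)


def between(order, event_id, t_from, t_to):
    """Query #3: (info_key, value) pairs with t_from <= ts <= t_to; returns (pairs, cells_read)."""
    cols = build(order)
    if order == "ts_first":
        read = scan(cols, event_id + SEP + ts(t_from), event_id + SEP + ts(t_to + 1))
        pairs = [(q.split(SEP)[2], v) for q, v in read]
    else:
        read = prefix_scan(cols, event_id + SEP)  # must read every key's history
        pairs = [(q.split(SEP)[1], v) for q, v in read
                 if t_from <= int(q.split(SEP)[2]) <= t_to]
    return pairs, len(read)
```

In this model, key before timestamp answers query #1 by reading only the 3 DIAGNOSTICS cells but needs all 5 KILLED cells for query #3; timestamp before key is the other way around.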

In either case, we can't know which event was the most recent in the application. For example,
an INITED event record will be stored before a KILLED event record, since "I" < "K" and HBase
sorts them lexicographically.

So if we are interested in knowing which event occurred most recently, we need to fetch all
events (along with their keys and timestamps), sort them by timestamp, and then return the
most recent event.
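
That fallback, sketched in Python under the same assumptions (current event_id?info_key?timestamp qualifiers, zero-padded timestamps, hypothetical helper names). The INITED cell at timestamp 0 is made up purely to give a second event id:

```python
SEP = "?"


def ts(t):
    # Zero-padded so string order matches numeric order (assumption).
    return f"{t:019d}"


# KILLED cells from the example (1 < 2 < 3 stand in for ts1 < ts2 < ts3), plus a
# hypothetical INITED cell at timestamp 0 so there is more than one event id.
CELLS = [
    ("INITED", "DIAGNOSTICS", 0),
    ("KILLED", "DIAGNOSTICS", 1),
    ("KILLED", "SOMETHING ELSE", 1),
    ("KILLED", "DIAGNOSTICS", 2),
    ("KILLED", "DIAGNOSTICS", 3),
    ("KILLED", "SOMETHING ELSE", 3),
]


def stored_columns(cells):
    """event_id?info_key?timestamp qualifiers in lexicographic (storage) order."""
    return sorted(SEP.join([e, k, ts(t)]) for e, k, t in cells)


def most_recent_event(cells):
    """Read every event column, sort by decoded timestamp, return the newest event id."""
    decoded = []
    for qualifier in stored_columns(cells):
        event_id, _info_key, stamp = qualifier.split(SEP)
        decoded.append((int(stamp), event_id))
    if not decoded:
        return None
    decoded.sort(reverse=True)  # newest first
    return decoded[0][1]
```

Storage order is decided by the event id, so position says nothing about recency; every column has to be read and its timestamp decoded before the newest event is known.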


> Rethink event column key issue
> ------------------------------
>
>                 Key: YARN-3984
>                 URL: https://issues.apache.org/jira/browse/YARN-3984
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>            Reporter: Zhijie Shen
>            Assignee: Vrushali C
>             Fix For: YARN-2928
>
>
> Currently, the event column key is event_id?info_key?timestamp, which is not so friendly
> to fetching all the events of an entity and sorting them in chronological order. IMHO,
> timestamp?event_id?info_key may be a better key schema. I'm opening this jira to continue
> the discussion about it, which came up in the comments on YARN-3908.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
