hadoop-yarn-issues mailing list archives

From "Sangjin Lee (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-3908) Bugs in HBaseTimelineWriterImpl
Date Tue, 21 Jul 2015 23:15:05 GMT

    [ https://issues.apache.org/jira/browse/YARN-3908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14635996#comment-14635996 ]

Sangjin Lee commented on YARN-3908:
-----------------------------------

[~jrottinghuis], [~vrushalic], and I had offline chats, and we feel that we may need to revisit
how we store events.

Currently (with this patch) we store each event info entry in a column named "e!eventId?infoKey",
with the info value as the cell value and the event timestamp as the cell timestamp.
We are now realizing that this may not be the correct way to store events.

I'm basing this on the [discussion|https://issues.apache.org/jira/browse/YARN-3836?focusedCommentId=14619729&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14619729]
we had about the equality and identity semantics of {{TimelineEvent}}: namely, that the id
*and* the timestamp together form the identity of a {{TimelineEvent}}. If that holds, I think
storing the event timestamp as the HBase cell timestamp does not work, because the column name
then carries only part of the event's identity: two events with the same id but different
timestamps map to the same column and become versions of a single cell.
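To make the concern concrete, here is a toy in-memory model of one HBase row (not the HBase client API; the class and method names are hypothetical). It assumes the column family keeps a single version, and it stores each event info entry under the "e!eventId?infoKey" qualifier with the event timestamp as the cell timestamp. Two events with the same id and different timestamps then collapse into one column, and only the newer value survives.

```java
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

/**
 * Toy in-memory model of a single HBase row (NOT the HBase client API):
 * column qualifier -> (cell timestamp -> value), newest timestamp first.
 */
class EventColumnModel {
    private final Map<String, NavigableMap<Long, String>> row = new HashMap<>();
    private final int maxVersions; // assumed per-column-family version limit

    EventColumnModel(int maxVersions) {
        this.maxVersions = maxVersions;
    }

    /** Qualifier layout described in the comment: "e!" + eventId + "?" + infoKey. */
    static String qualifier(String eventId, String infoKey) {
        return "e!" + eventId + "?" + infoKey;
    }

    /** Writes one info entry of an event, using the event timestamp as the cell timestamp. */
    void putEventInfo(String eventId, long eventTs, String infoKey, String value) {
        NavigableMap<Long, String> versions = row.computeIfAbsent(
            qualifier(eventId, infoKey),
            q -> new TreeMap<Long, String>(Comparator.<Long>reverseOrder()));
        versions.put(eventTs, value);
        // Keep only the newest maxVersions cells, roughly like an HBase version limit.
        while (versions.size() > maxVersions) {
            versions.pollLastEntry(); // last entry = oldest timestamp in this ordering
        }
    }

    /** Number of distinct columns (qualifiers) in the row. */
    int columnCount() {
        return row.size();
    }

    /** Number of cell versions retained for one column. */
    int versionCount(String eventId, String infoKey) {
        NavigableMap<Long, String> v = row.get(qualifier(eventId, infoKey));
        return v == null ? 0 : v.size();
    }

    /** What a single-version read would return: the newest cell's value. */
    String latestValue(String eventId, String infoKey) {
        NavigableMap<Long, String> v = row.get(qualifier(eventId, infoKey));
        return (v == null || v.isEmpty()) ? null : v.firstEntry().getValue();
    }

    public static void main(String[] args) {
        EventColumnModel model = new EventColumnModel(1);
        // Two distinct events: same id, different timestamps (hypothetical values).
        model.putEventInfo("CONTAINER_LAUNCHED", 1000L, "host", "node-a");
        model.putEventInfo("CONTAINER_LAUNCHED", 2000L, "host", "node-b");
        System.out.println(model.columnCount());                              // 1
        System.out.println(model.versionCount("CONTAINER_LAUNCHED", "host")); // 1
        System.out.println(model.latestValue("CONTAINER_LAUNCHED", "host"));  // node-b
    }
}
```

With a larger version limit both cells would be kept, but a default HBase Get still returns only the newest version, and the qualifier alone cannot tell the two events apart.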

Some questions for you, [~zjshen] and [~gtCarrera9].

(1) *What defines the identity of a {{TimelineEvent}}?*
Is it the event id + timestamp? What about the event type? If you look at the {{equals()}}
and {{hashCode()}} implementations of {{TimelineEvent}}, they use the timestamp, the event
type, and even the info as a whole, but not the id. How does that square
with the stated intent that the event id and the timestamp form the identity?
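A minimal sketch of the mismatch in (1), using a hypothetical, simplified event class rather than the real {{TimelineEvent}}: its equals() and hashCode() compare the timestamp, type and info as described above and ignore the id, while sameIdentity() checks the id-plus-timestamp identity from the YARN-3836 discussion. Two events with different ids but identical timestamp, type and info come out "equal", yet they are different events under the stated identity.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

/**
 * Hypothetical, simplified stand-in for TimelineEvent (not the real class).
 * equals()/hashCode() follow the fields named in the comment: timestamp, type, info; no id.
 */
final class SketchEvent {
    final String id;
    final String type;
    final long timestamp;
    final Map<String, Object> info;

    SketchEvent(String id, String type, long timestamp, Map<String, Object> info) {
        this.id = id;
        this.type = type;
        this.timestamp = timestamp;
        this.info = new HashMap<>(info); // defensive copy
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof SketchEvent)) return false;
        SketchEvent other = (SketchEvent) o;
        // The id is deliberately ignored, mirroring the behavior described in the comment.
        return timestamp == other.timestamp
            && Objects.equals(type, other.type)
            && info.equals(other.info);
    }

    @Override
    public int hashCode() {
        return Objects.hash(type, timestamp, info);
    }

    /** Identity as stated in the YARN-3836 discussion: id plus timestamp. */
    boolean sameIdentity(SketchEvent other) {
        return id.equals(other.id) && timestamp == other.timestamp;
    }
}

class EventIdentityDemo {
    public static void main(String[] args) {
        Map<String, Object> info = new HashMap<>();
        info.put("exitCode", 0);
        SketchEvent a = new SketchEvent("EVENT_A", "LIFECYCLE", 1000L, info);
        SketchEvent b = new SketchEvent("EVENT_B", "LIFECYCLE", 1000L, info);
        System.out.println(a.equals(b));       // true: different ids, still "equal"
        System.out.println(a.sameIdentity(b)); // false under id + timestamp identity
    }
}
```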

(2) *What would be the access pattern for {{TimelineEvents}}?*
Is pretty much the only access pattern "give me all the events that belong to this entity"?

More specifically, would you ever query for an event by its id *and* its timestamp? It does
not seem reasonable to expect readers to supply the event timestamp in a query, right?

Would you also query by the event id alone? What other access patterns need to be supported?
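For (2), the answer matters because HBase keeps a row's column qualifiers in sorted order, so a query that maps to a qualifier prefix is cheap. The toy model below (a sorted map, not the HBase API; the "i!type" column and the event ids are hypothetical) shows that both "all events of this entity" and "events with a given id" reduce to a prefix range under the current "e!eventId?infoKey" layout. In real HBase, a filter such as ColumnPrefixFilter could play a similar role.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Toy model: one row's column qualifiers kept in sorted order (not the HBase API). */
class EventQualifierScan {
    private final TreeMap<String, String> columns = new TreeMap<>();

    void put(String qualifier, String value) {
        columns.put(qualifier, value);
    }

    /** Returns all qualifiers starting with the given prefix, in sorted order. */
    List<String> withPrefix(String prefix) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, String> e : columns.tailMap(prefix, true).entrySet()) {
            if (!e.getKey().startsWith(prefix)) {
                break; // sorted order: we have left the contiguous prefix range
            }
            out.add(e.getKey());
        }
        return out;
    }

    public static void main(String[] args) {
        EventQualifierScan row = new EventQualifierScan();
        row.put("i!type", "YARN_CONTAINER"); // hypothetical non-event column
        row.put("e!CONTAINER_LAUNCHED?host", "node-a");
        row.put("e!CONTAINER_FINISHED?exitCode", "0");
        row.put("e!CONTAINER_LAUNCHED?port", "8042");
        // "Give me all the events that belong to this entity": one prefix range.
        System.out.println(row.withPrefix("e!"));
        // prints [e!CONTAINER_FINISHED?exitCode, e!CONTAINER_LAUNCHED?host, e!CONTAINER_LAUNCHED?port]
        // Query by event id alone: a narrower prefix.
        System.out.println(row.withPrefix("e!CONTAINER_LAUNCHED?"));
        // prints [e!CONTAINER_LAUNCHED?host, e!CONTAINER_LAUNCHED?port]
    }
}
```

A query by id *and* timestamp has no such prefix under this layout, since the timestamp lives only in the cell timestamp; it would need a time-range read over cell versions instead.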

Clarifying those things would help us correctly implement the schema. Thanks!

> Bugs in HBaseTimelineWriterImpl
> -------------------------------
>
>                 Key: YARN-3908
>                 URL: https://issues.apache.org/jira/browse/YARN-3908
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>            Reporter: Zhijie Shen
>            Assignee: Vrushali C
>         Attachments: YARN-3908-YARN-2928.001.patch, YARN-3908-YARN-2928.002.patch, YARN-3908-YARN-2928.003.patch, YARN-3908-YARN-2928.004.patch, YARN-3908-YARN-2928.004.patch, YARN-3908-YARN-2928.005.patch
>
>
> 1. In HBaseTimelineWriterImpl, the info column family contains the basic fields of a timeline entity plus events. However, the entity#info map is not stored at all.
> 2. event#timestamp is also not persisted.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
