Date: Thu, 30 Jul 2015 22:50:04 +0000 (UTC)
From: "Vrushali C (JIRA)"
To: yarn-issues@hadoop.apache.org
Subject: [jira] [Commented] (YARN-3984) Rethink event column key issue

    [ https://issues.apache.org/jira/browse/YARN-3984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14648428#comment-14648428 ]

Vrushali C commented on YARN-3984:
----------------------------------

Hi Zhijie,

bq. the current query we want to support now (in YARN-3051 and YARN-3049) is to retrieve all events belonging to an entity (e.g. application, attempt, container and etc.).

Yes, the "fetch all events" query is supported with all types of row key designs. Fetching all events is not affected by the row key order. In any case, the reader would construct a set/list of TimelineEvents and then sort them in the code.
The timestamp helps with ordering, but you don't know when to stop the scan. So when we fetch all events, the events for all timestamps have to be fetched, and sorting and picking out the latest events has to be done in the code in any case.

bq. In this case, the most efficient way is to put timestamp even before the event ID, so that we don't need to order the events in memory

This would mean that we would *never* be able to query for a specific event. We would *always* have to fetch all events belonging to all timestamps and perform client-side filtering.

I see the point about the info map being empty/null. I will add a case to store the event id and timestamp when the info map is null.

> Rethink event column key issue
> ------------------------------
>
> Key: YARN-3984
> URL: https://issues.apache.org/jira/browse/YARN-3984
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: timelineserver
> Reporter: Zhijie Shen
> Assignee: Vrushali C
> Fix For: YARN-2928
>
>
> Currently, the event column key is event_id?info_key?timestamp, which is not so friendly to fetching all the events of an entity and sorting them in a chronologic order. IMHO, timestamp?event_id?info_key may be a better key schema. I open this jira to continue the discussion about it which was commented on YARN-3908.
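
The trade-off between the two key layouts discussed in this thread can be sketched roughly as follows. This is only an illustration, assuming column keys behave like sorted tuples in a sorted key-value store. The sample events and all function names are hypothetical and are not taken from the timeline service code.

```python
from bisect import bisect_left

# Hypothetical sample data: (event_id, info_key, timestamp) for one entity.
EVENTS = [
    ("CONTAINER_START", "host", 100),
    ("APP_FINISHED", "status", 300),
    ("CONTAINER_START", "host", 200),
    ("APP_STARTED", "user", 50),
]


def id_first_columns(events):
    """Sorted column keys laid out as (event_id, info_key, timestamp)."""
    return sorted((e, k, t) for e, k, t in events)


def ts_first_columns(events):
    """Sorted column keys laid out as (timestamp, event_id, info_key)."""
    return sorted((t, e, k) for e, k, t in events)


def prefix_scan(columns, event_id):
    """Id-first layout: seek to the event_id prefix, stop once it ends.

    Returns the matching keys and the number of keys that were read.
    """
    start = bisect_left(columns, (event_id,))
    hits, touched = [], 0
    for key in columns[start:]:
        touched += 1
        if key[0] != event_id:
            break
        hits.append(key)
    return hits, touched


def filter_scan(columns, event_id):
    """Timestamp-first layout: matches are scattered, so every key is read."""
    hits = [key for key in columns if key[1] == event_id]
    return hits, len(columns)


def all_events_chronological(columns, ts_index):
    """Fetch-all query: read everything, then order by timestamp in code."""
    return sorted(columns, key=lambda key: key[ts_index])


if __name__ == "__main__":
    id_first = id_first_columns(EVENTS)
    ts_first = ts_first_columns(EVENTS)
    print(prefix_scan(id_first, "CONTAINER_START"))
    print(filter_scan(ts_first, "CONTAINER_START"))
    print(all_events_chronological(id_first, 2))
```

In this toy model, the fetch-all query returns the same events under either layout, while the lookup of one specific event only stays a bounded range scan when event_id leads the key.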