Date: Tue, 24 May 2016 07:39:13 +0000 (UTC)
From: "Varun Saxena (JIRA)"
To: yarn-issues@hadoop.apache.org
Subject: [jira] [Commented] (YARN-5109) timestamps are stored unencoded causing parse errors

[ https://issues.apache.org/jira/browse/YARN-5109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15297844#comment-15297844 ]

Varun Saxena commented on YARN-5109:
------------------------------------

bq. Also, do we have a test that tests an encoded long having a separator in it?
After all, that's what caused us to uncover this issue.

Yes, we have. In TestKeyConverters, I construct the flow run id and the cluster timestamp (in the app id) so that their bytes contain separators. The event column name issue is simulated as well. In fact, the test also covers the case where QUALIFIER changes in the future. TestHBaseTimelineStorage#testEventsEscapeTs covers the event column name issue in an end-to-end (E2E) test.

bq. Should we replace "" with Separator.EMPTY_BYTES? That should be equivalent, right?

They are not exactly equivalent. We are calling joinEncoded, which takes strings. If we called join instead, we would first have to encode the string. In any case, I have added a constant EMPTY_STRING to Separator and am using it.

bq. I think NO_LIMIT_SPLIT and VARIABLE_SIZE are getting confusing. Since we're using VARIABLE_SIZE for the most part, can we remove NO_LIMIT_SPLIT?

NO_LIMIT_SPLIT is meant to indicate that there is no limit on the number of splits returned. VARIABLE_SIZE indicates that the size of a segment in a split is variable. That said, VARIABLE_SIZE can also be read as meaning that the number of splits is not fixed.

Other issues have been fixed.

> timestamps are stored unencoded causing parse errors
> ----------------------------------------------------
>
>                 Key: YARN-5109
>                 URL: https://issues.apache.org/jira/browse/YARN-5109
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>    Affects Versions: YARN-2928
>            Reporter: Sangjin Lee
>            Assignee: Varun Saxena
>            Priority: Blocker
>              Labels: yarn-2928-1st-milestone
>         Attachments: YARN-5109-YARN-2928.003.patch, YARN-5109-YARN-2928.01.patch, YARN-5109-YARN-2928.02.patch, YARN-5109-YARN-2928.03.patch
>
>
> When we store timestamps (for example as part of the row key or part of the column name for an event), the bytes are used as is without any encoding. If the byte value happens to contain a separator character we use (e.g. "!" or "="), it causes a parse failure when we read it.
> I came across this while looking into this error in the timeline reader:
> {noformat}
> 2016-05-17 21:28:38,643 WARN org.apache.hadoop.yarn.server.timelineservice.storage.common.TimelineStorageUtils: incorrectly formatted column name: it will be discarded
> {noformat}
> I traced the data that was causing this, and the column name (for the event) was the following:
> {noformat}
> i:e!YARN_RM_CONTAINER_CREATED=\x7F\xFF\xFE\xABDY=\x99=YARN_CONTAINER_ALLOCATED_HOST
> {noformat}
> Note that the column name is supposed to be of the format (event id)=(timestamp)=(event info key). However, observe the timestamp portion:
> {noformat}
> \x7F\xFF\xFE\xABDY=\x99
> {noformat}
> The presence of the separator ("=") causes the parse error.
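To make the reported failure concrete, here is a small Java sketch that rebuilds the event column name from the bytes quoted in the report (leaving out the `i:e!` prefix) and splits it naively on every `=`. The class name `NaiveSplitDemo` and the `naiveSplit` helper are hypothetical and only illustrate the problem; they are not the TimelineStorageUtils parsing code.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class NaiveSplitDemo {
  // Raw 8-byte timestamp exactly as shown in the report: \x7F\xFF\xFE\xABDY=\x99
  static final byte[] TS = {
      0x7F, (byte) 0xFF, (byte) 0xFE, (byte) 0xAB, 'D', 'Y', '=', (byte) 0x99};

  // Builds (event id)=(timestamp)=(event info key) with the timestamp stored unencoded.
  static byte[] columnName() {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    byte[] id = "YARN_RM_CONTAINER_CREATED".getBytes(StandardCharsets.UTF_8);
    byte[] key = "YARN_CONTAINER_ALLOCATED_HOST".getBytes(StandardCharsets.UTF_8);
    out.write(id, 0, id.length);
    out.write('=');
    out.write(TS, 0, TS.length);
    out.write('=');
    out.write(key, 0, key.length);
    return out.toByteArray();
  }

  // Splits on every occurrence of sep, with no knowledge of segment sizes.
  static List<byte[]> naiveSplit(byte[] src, byte sep) {
    List<byte[]> parts = new ArrayList<>();
    int start = 0;
    for (int i = 0; i < src.length; i++) {
      if (src[i] == sep) {
        parts.add(Arrays.copyOfRange(src, start, i));
        start = i + 1;
      }
    }
    parts.add(Arrays.copyOfRange(src, start, src.length));
    return parts;
  }

  public static void main(String[] args) {
    List<byte[]> parts = naiveSplit(columnName(), (byte) '=');
    // Three segments are expected, but the '=' byte inside the timestamp adds a fourth.
    System.out.println(parts.size()); // 4
  }
}
```

Running `main` prints 4, one segment more than the (event id)=(timestamp)=(event info key) format allows, which is consistent with the "incorrectly formatted column name" warning in the report.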
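The VARIABLE_SIZE discussion in the comment points at one way around this: if the reader knows that the timestamp segment is always a fixed 8-byte long, it can consume exactly those 8 bytes and never look for a separator inside them, while only the variable-size segments are delimited by the separator. The sketch below assumes a hypothetical `splitWithSizes` helper and a hypothetical `VARIABLE_SIZE` marker value of 0; it illustrates the idea and is not the actual `Separator` API from the patch.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SizedSplitDemo {
  // Hypothetical marker for a segment whose length is not known in advance.
  static final int VARIABLE_SIZE = 0;

  // Same bytes as in the report: event id, '=', raw timestamp, '=', info key.
  static byte[] columnName() {
    byte[] ts = {0x7F, (byte) 0xFF, (byte) 0xFE, (byte) 0xAB, 'D', 'Y', '=', (byte) 0x99};
    byte[] id = "YARN_RM_CONTAINER_CREATED".getBytes(StandardCharsets.UTF_8);
    byte[] key = "YARN_CONTAINER_ALLOCATED_HOST".getBytes(StandardCharsets.UTF_8);
    return ByteBuffer.allocate(id.length + 1 + ts.length + 1 + key.length)
        .put(id).put((byte) '=').put(ts).put((byte) '=').put(key).array();
  }

  // Splits src into sizes.length segments separated by a single sep byte.
  // A positive size is read as exactly that many bytes (separators inside are ignored);
  // a VARIABLE_SIZE segment runs to the next separator, or to the end if it is last.
  static List<byte[]> splitWithSizes(byte[] src, byte sep, int[] sizes) {
    List<byte[]> parts = new ArrayList<>();
    int pos = 0;
    for (int s = 0; s < sizes.length; s++) {
      boolean last = (s == sizes.length - 1);
      int end;
      if (sizes[s] > VARIABLE_SIZE) {
        end = pos + sizes[s];
      } else if (last) {
        end = src.length;
      } else {
        end = pos;
        while (end < src.length && src[end] != sep) {
          end++;
        }
      }
      if (end > src.length) {
        throw new IllegalArgumentException("segment " + s + " overruns input");
      }
      parts.add(Arrays.copyOfRange(src, pos, end));
      pos = end;
      if (!last) {
        if (pos >= src.length || src[pos] != sep) {
          throw new IllegalArgumentException("missing separator after segment " + s);
        }
        pos++; // skip the separator itself
      }
    }
    return parts;
  }

  public static void main(String[] args) {
    // Event id and info key are variable-size; the timestamp is a fixed 8-byte long.
    List<byte[]> parts = splitWithSizes(columnName(), (byte) '=',
        new int[] {VARIABLE_SIZE, Long.BYTES, VARIABLE_SIZE});
    System.out.println(parts.size()); // 3
    System.out.println(new String(parts.get(2), StandardCharsets.UTF_8));
  }
}
```

This only works if the variable-size segments cannot themselves contain a raw separator, which is why the string parts are encoded before joining (the joinEncoded call mentioned in the comment).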