hadoop-yarn-issues mailing list archives

From "Varun Saxena (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-5109) timestamps are stored unencoded causing parse errors
Date Tue, 24 May 2016 07:39:13 GMT

    [ https://issues.apache.org/jira/browse/YARN-5109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15297844#comment-15297844 ]

Varun Saxena commented on YARN-5109:

bq. Also, do we have a test that tests an encoded long having a separator in it? After all,
that's what caused us to uncover this issue.
Yes, we do. In TestKeyConverters, I create the flow run id and the cluster timestamp (in
the app id) in such a way that their encoded bytes contain separators. The event column name
issue is simulated as well. In fact, the test also covers the case where QUALIFIER changes
in the future. TestHBaseTimelineStorage#testEventsEscapeTs covers the event column name issue
in an E2E test case.

bq. Should we replace "" with Separator.EMPTY_BYTES? That should be equivalent, right?
Strictly speaking, they are not equivalent. We call joinEncoded, which takes strings. If we
called join instead, we would have to encode the string first. In any case, I have added a
constant EMPTY_STRING in Separator and am using it.

bq. I think NO_LIMIT_SPLIT and VARIABLE_SIZE are getting confusing. Since we're using VARIABLE_SIZE
for the most part, can we remove NO_LIMIT_SPLIT
NO_LIMIT_SPLIT indicates that there is no limit on the number of splits returned. VARIABLE_SIZE
indicates that the size of a segment in a split is variable. That said, we can treat
VARIABLE_SIZE as also meaning that the number of splits is not fixed.
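
To make the distinction concrete, here is a minimal sketch of a size-aware split. It is not the code from the patch: the class SizedSplitSketch and its split method are hypothetical, and the value 0 for VARIABLE_SIZE is an assumption made for this example. A positive size means the segment is consumed by length, so a separator byte inside it is ignored; VARIABLE_SIZE means the segment runs up to the next separator. Variable-size string segments would still need to be escaped when written.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class SizedSplitSketch {
  static final byte SEPARATOR = (byte) '=';
  // Assumed marker value for this sketch: the segment ends at the next separator.
  static final int VARIABLE_SIZE = 0;

  /** Splits source into sizes.length segments; positive sizes are read by length. */
  static List<byte[]> split(byte[] source, int[] sizes) {
    List<byte[]> segments = new ArrayList<>();
    int offset = 0;
    for (int i = 0; i < sizes.length; i++) {
      boolean last = (i == sizes.length - 1);
      int end;
      if (sizes[i] > 0) {
        end = offset + sizes[i];          // fixed width: never scan for separators
      } else if (last) {
        end = source.length;              // last variable segment takes the rest
      } else {
        end = offset;
        while (end < source.length && source[end] != SEPARATOR) {
          end++;
        }
      }
      if (end > source.length) {
        throw new IllegalArgumentException("source is shorter than the declared sizes");
      }
      boolean wellFormed = last
          ? end == source.length
          : end < source.length && source[end] == SEPARATOR;
      if (!wellFormed) {
        throw new IllegalArgumentException("segment " + i + " is not properly delimited");
      }
      segments.add(Arrays.copyOfRange(source, offset, end));
      offset = end + 1;                   // skip the separator byte
    }
    return segments;
  }

  public static void main(String[] args) {
    long timestamp = 0x7FFFFEAB44593D99L; // contains the byte 0x3D, i.e. '='
    byte[] eventId = "eventId".getBytes(StandardCharsets.UTF_8);
    byte[] infoKey = "infoKey".getBytes(StandardCharsets.UTF_8);
    ByteBuffer column = ByteBuffer.allocate(eventId.length + 1 + Long.BYTES + 1 + infoKey.length);
    column.put(eventId).put(SEPARATOR).putLong(timestamp).put(SEPARATOR).put(infoKey);

    List<byte[]> parts =
        split(column.array(), new int[] {VARIABLE_SIZE, Long.BYTES, VARIABLE_SIZE});
    System.out.println(parts.size() + " segments, timestamp intact: "
        + (ByteBuffer.wrap(parts.get(1)).getLong() == timestamp));
  }
}
```

Running main on the timestamp bytes from the issue description yields three segments with the timestamp recovered unchanged, even though one of its bytes equals the separator.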

Other issues have been fixed.

> timestamps are stored unencoded causing parse errors
> ----------------------------------------------------
>                 Key: YARN-5109
>                 URL: https://issues.apache.org/jira/browse/YARN-5109
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>    Affects Versions: YARN-2928
>            Reporter: Sangjin Lee
>            Assignee: Varun Saxena
>            Priority: Blocker
>              Labels: yarn-2928-1st-milestone
>         Attachments: YARN-5109-YARN-2928.003.patch, YARN-5109-YARN-2928.01.patch, YARN-5109-YARN-2928.02.patch,
> When we store timestamps (for example as part of the row key or part of the column name
> for an event), the bytes are used as is without any encoding. If the byte value happens to
> contain a separator character we use (e.g. "!" or "="), it causes a parse failure when we
> read it.
> I came across this while looking into this error in the timeline reader:
> {noformat}
> 2016-05-17 21:28:38,643 WARN org.apache.hadoop.yarn.server.timelineservice.storage.common.TimelineStorageUtils: incorrectly formatted column name: it will be discarded
> {noformat}
> I traced the data that was causing this, and the column name (for the event) was the
> {noformat}
> {noformat}
> Note that the column name is supposed to be of the format (event id)=(timestamp)=(event
> info key). However, observe the timestamp portion:
> {noformat}
> \x7F\xFF\xFE\xABDY=\x99
> {noformat}
> The presence of the separator ("=") causes the parse error.
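
The failure described above can be reproduced with plain Java, without HBase. The following is only a sketch: the class RawTimestampSplitDemo, its helper methods, and the "eventId" and "infoKey" strings are hypothetical stand-ins, and the long value is simply the one whose big-endian bytes match the timestamp portion quoted in the description.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class RawTimestampSplitDemo {
  static final byte SEPARATOR = (byte) '=';

  /** Big-endian 8-byte form of a long, used as is with no encoding. */
  static byte[] toBytes(long value) {
    return ByteBuffer.allocate(Long.BYTES).putLong(value).array();
  }

  /** Joins the parts with a single separator byte between them. */
  static byte[] join(byte[]... parts) {
    int length = Math.max(0, parts.length - 1);
    for (byte[] part : parts) {
      length += part.length;
    }
    ByteBuffer buffer = ByteBuffer.allocate(length);
    for (int i = 0; i < parts.length; i++) {
      if (i > 0) {
        buffer.put(SEPARATOR);
      }
      buffer.put(parts[i]);
    }
    return buffer.array();
  }

  /** Splits on every separator byte, with no knowledge of segment sizes. */
  static List<byte[]> naiveSplit(byte[] source) {
    List<byte[]> segments = new ArrayList<>();
    int start = 0;
    for (int i = 0; i <= source.length; i++) {
      if (i == source.length || source[i] == SEPARATOR) {
        segments.add(Arrays.copyOfRange(source, start, i));
        start = i + 1;
      }
    }
    return segments;
  }

  public static void main(String[] args) {
    // Bytes: 7F FF FE AB 44 59 3D 99, where 0x3D is the '=' character.
    long timestamp = 0x7FFFFEAB44593D99L;
    byte[] column = join(
        "eventId".getBytes(StandardCharsets.UTF_8),
        toBytes(timestamp),
        "infoKey".getBytes(StandardCharsets.UTF_8));
    System.out.println(
        "segments found: " + naiveSplit(column).size() + " (3 were written)");
  }
}
```

Three segments are written, but the naive split finds four, because the seventh byte of the timestamp is 0x3D. A reader that expects exactly three segments would reject such a column name.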

This message was sent by Atlassian JIRA

To unsubscribe, e-mail: yarn-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: yarn-issues-help@hadoop.apache.org
