hadoop-yarn-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Joep Rottinghuis (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-5109) timestamps are stored unencoded causing parse errors
Date Thu, 26 May 2016 16:30:15 GMT

    [ https://issues.apache.org/jira/browse/YARN-5109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15302345#comment-15302345
] 

Joep Rottinghuis commented on YARN-5109:
----------------------------------------

Indeed it is easy to do now with the way KeyConverter and Separator are written, and yeah
I was ambiguous about whether we should encode.
After thinking about it a bit more I do think we should encode tabs as well. If we encode
both we should ensure that we encode and decode in the same order.
Probably as a general rule we should encode/decode things that are specified by a user, especially
those things that we can expect to see spaces (or tabs) in, but probably as a good practice
any values that comes from a user that goes into a column qualifier.

> timestamps are stored unencoded causing parse errors
> ----------------------------------------------------
>
>                 Key: YARN-5109
>                 URL: https://issues.apache.org/jira/browse/YARN-5109
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>    Affects Versions: YARN-2928
>            Reporter: Sangjin Lee
>            Assignee: Varun Saxena
>            Priority: Blocker
>              Labels: yarn-2928-1st-milestone
>         Attachments: YARN-5109-YARN-2928.003.patch, YARN-5109-YARN-2928.01.patch, YARN-5109-YARN-2928.02.patch,
YARN-5109-YARN-2928.03.patch, YARN-5109-YARN-2928.04.patch, YARN-5109-YARN-2928.05.patch,
YARN-5109-YARN-2928.06.patch
>
>
> When we store timestamps (for example as part of the row key or part of the column name
for an event), the bytes are used as is without any encoding. If the byte value happens to
contain a separator character we use (e.g. "!" or "="), it causes a parse failure when we
read it.
> I came across this while looking into this error in the timeline reader:
> {noformat}
> 2016-05-17 21:28:38,643 WARN org.apache.hadoop.yarn.server.timelineservice.storage.common.TimelineStorageUtils:
incorrectly formatted column name: it will be discarded
> {noformat}
> I traced the data that was causing this, and the column name (for the event) was the
following:
> {noformat}
> i:e!YARN_RM_CONTAINER_CREATED=\x7F\xFF\xFE\xABDY=\x99=YARN_CONTAINER_ALLOCATED_HOST
> {noformat}
> Note that the column name is supposed to be of the format (event id)=(timestamp)=(event
info key). However, observe the timestamp portion:
> {noformat}
> \x7F\xFF\xFE\xABDY=\x99
> {noformat}
> The presence of the separator ("=") causes the parse error.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: yarn-issues-help@hadoop.apache.org


Mime
View raw message