hadoop-yarn-issues mailing list archives

From "Sangjin Lee (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-5109) timestamps are stored unencoded causing parse errors
Date Thu, 19 May 2016 16:54:13 GMT

    [ https://issues.apache.org/jira/browse/YARN-5109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15291498#comment-15291498 ]

Sangjin Lee commented on YARN-5109:
-----------------------------------

Good point about the column qualifiers not needing the correct order. One other complication
with the column qualifiers is that the call chain is several levels deep, so the changes to
use the new {{split()}} method there would be a bit bigger.

If we were to go the route of encoding the bytes as well, there is one other issue we need
to be mindful of. We need to guard against occurrences of the encoded equivalent in the
original bytes. For example, "=" would be encoded into "%1$". A problem would arise if the
original bytes already contained "%1$", however unlikely that may be. Consider the following
original bytes (made up, using ASCII characters):
{noformat}
t=h%1$ig
{noformat}

If we simply encode "=", then we get
{noformat}
t%1$h%1$ig
{noformat}

Now, if we read this back and decode it, we would get the following, which no longer matches
the original bytes:
{noformat}
t=h=ig
{noformat}
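
The round trip above can be reproduced with a minimal sketch. It uses strings rather than raw bytes for readability, and the class and method names are hypothetical, not the timeline service's actual API; only the "=" to "%1$" mapping comes from the example above.

```java
// Illustrative only: NaiveSeparatorEncoding and its methods are made-up names.
class NaiveSeparatorEncoding {
  static final String SEPARATOR = "=";
  static final String ENCODED_SEPARATOR = "%1$";

  // Replace every separator with its encoded form, nothing else.
  static String encode(String raw) {
    return raw.replace(SEPARATOR, ENCODED_SEPARATOR);
  }

  // Replace every encoded form with the separator, nothing else.
  static String decode(String stored) {
    return stored.replace(ENCODED_SEPARATOR, SEPARATOR);
  }

  public static void main(String[] args) {
    String original = "t=h%1$ig";
    String stored = encode(original);
    String decoded = decode(stored);
    System.out.println(stored);                    // t%1$h%1$ig
    System.out.println(decoded);                   // t=h=ig
    System.out.println(decoded.equals(original));  // false
  }
}
```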

To do this properly, we'd need to "escape" the existing patterns *before* encoding for the
separator. The reverse should be done when decoding it.
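
One possible way to do the escaping is sketched below, again with strings instead of bytes. The escape token "%0$" for a literal "%" is purely an assumption made for this illustration, as are the class and method names; any scheme works as long as the escape step runs before the separator is encoded and is undone after the separator is decoded.

```java
// Illustrative only: the "%0$" escape token and all names here are assumptions.
class EscapingSeparatorEncoding {
  static final String SEPARATOR = "=";
  static final String ENCODED_SEPARATOR = "%1$";
  static final String ESCAPE_CHAR = "%";
  static final String ENCODED_ESCAPE = "%0$";

  static String encode(String raw) {
    // Step 1: escape every literal "%" so no pre-existing "%1$" survives.
    String escaped = raw.replace(ESCAPE_CHAR, ENCODED_ESCAPE);
    // Step 2: only now encode the separator.
    return escaped.replace(SEPARATOR, ENCODED_SEPARATOR);
  }

  static String decode(String stored) {
    // Reverse order: decode the separator first, then undo the escape.
    String withSeparators = stored.replace(ENCODED_SEPARATOR, SEPARATOR);
    return withSeparators.replace(ENCODED_ESCAPE, ESCAPE_CHAR);
  }

  public static void main(String[] args) {
    String original = "t=h%1$ig";
    String stored = encode(original);
    System.out.println(stored);                            // t%1$h%0$1$ig
    System.out.println(decode(stored));                    // t=h%1$ig
    System.out.println(decode(stored).equals(original));   // true
  }
}
```

After the escape step every "%" in the stored form starts either "%0$" or "%1$", so the decoder can tell an encoded separator apart from text that merely looked like one.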

To be clear, this is an existing issue (even with strings). We went ahead without handling
it because we felt it was unlikely to occur in a string. But if we're going to revisit
encoding, we might want to address that as well.

We can discuss the details offline if needed.

> timestamps are stored unencoded causing parse errors
> ----------------------------------------------------
>
>                 Key: YARN-5109
>                 URL: https://issues.apache.org/jira/browse/YARN-5109
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>    Affects Versions: YARN-2928
>            Reporter: Sangjin Lee
>            Assignee: Varun Saxena
>            Priority: Blocker
>              Labels: yarn-2928-1st-milestone
>
> When we store timestamps (for example as part of the row key or part of the column name
> for an event), the bytes are used as is without any encoding. If the byte value happens to
> contain a separator character we use (e.g. "!" or "="), it causes a parse failure when we
> read it.
> I came across this while looking into this error in the timeline reader:
> {noformat}
> 2016-05-17 21:28:38,643 WARN org.apache.hadoop.yarn.server.timelineservice.storage.common.TimelineStorageUtils: incorrectly formatted column name: it will be discarded
> {noformat}
> I traced the data that was causing this, and the column name (for the event) was the
> following:
> {noformat}
> i:e!YARN_RM_CONTAINER_CREATED=\x7F\xFF\xFE\xABDY=\x99=YARN_CONTAINER_ALLOCATED_HOST
> {noformat}
> Note that the column name is supposed to be of the format (event id)=(timestamp)=(event
> info key). However, observe the timestamp portion:
> {noformat}
> \x7F\xFF\xFE\xABDY=\x99
> {noformat}
> The presence of the separator ("=") causes the parse error.
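
The failure in the quoted description can be illustrated by counting the separator-delimited parts of that qualifier. This is a rough sketch with hypothetical helper names, not the actual parsing code in TimelineStorageUtils; the event id, timestamp bytes, and info key are the ones quoted above, with the "i:e!" prefix left out.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Illustrative only: QualifierSplitDemo and its helpers are made-up names.
class QualifierSplitDemo {
  // The raw timestamp bytes from the report: \x7F\xFF\xFE\xABDY=\x99
  static final byte[] TIMESTAMP = {
      0x7F, (byte) 0xFF, (byte) 0xFE, (byte) 0xAB, 'D', 'Y', '=', (byte) 0x99
  };

  // Concatenate (event id)=(timestamp)=(event info key) with no encoding.
  static byte[] buildQualifier(String eventId, byte[] timestamp, String infoKey) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    byte[] id = eventId.getBytes(StandardCharsets.UTF_8);
    byte[] key = infoKey.getBytes(StandardCharsets.UTF_8);
    out.write(id, 0, id.length);
    out.write('=');
    out.write(timestamp, 0, timestamp.length);
    out.write('=');
    out.write(key, 0, key.length);
    return out.toByteArray();
  }

  // Number of parts a split on the separator byte would produce.
  static int countParts(byte[] qualifier, byte separator) {
    int parts = 1;
    for (byte b : qualifier) {
      if (b == separator) {
        parts++;
      }
    }
    return parts;
  }

  public static void main(String[] args) {
    byte[] qualifier = buildQualifier(
        "YARN_RM_CONTAINER_CREATED", TIMESTAMP, "YARN_CONTAINER_ALLOCATED_HOST");
    // A well-formed qualifier has 3 parts; the stray '=' byte makes it 4.
    System.out.println(countParts(qualifier, (byte) '='));  // 4
  }
}
```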



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
