hadoop-yarn-issues mailing list archives

From "Varun Saxena (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-5109) timestamps are stored unencoded causing parse errors
Date Sat, 21 May 2016 00:51:13 GMT

    [ https://issues.apache.org/jira/browse/YARN-5109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15294601#comment-15294601 ]

Varun Saxena commented on YARN-5109:
------------------------------------

[~sjlee0],
Yes, type safety will have to be ensured within this encode.
The specific function I am talking about is {{TimelineFilterUtils#createFiltersFromColumnQualifiers}}.
It is used for events and relations. Event filters and relation filters cannot be applied
using HBase SingleColumnValueFilter, so we fetch all the columns that those filters name
(i.e. events in event filters, or entity types in relation filters).

I mentioned {{Object... params}} only because that was what came to mind just before I signed
off for the day.

On second thought, I think we can use a switch statement keyed on the column prefix and construct
the EventColumnName from there. It would have only 2 cases other than the default (i.e.
ApplicationColumnPrefix.EVENT and EntityColumnPrefix.EVENT). The number of cases should not
grow unmanageably even in the long term, and if it does, we can revisit the solution then.
I will go with this approach for now.
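
To illustrate the shape of what I have in mind, here is a rough sketch only, not the patch.
The {{Prefix}} enum is a hypothetical stand-in that flattens ApplicationColumnPrefix and
EntityColumnPrefix into one type, and the percent-style escaping is an assumption made for the
example rather than the encoding we will actually use.

```java
import java.nio.charset.StandardCharsets;

final class QualifierPrefixSketch {
    // Hypothetical stand-in for ApplicationColumnPrefix / EntityColumnPrefix.
    enum Prefix { APPLICATION_EVENT, ENTITY_EVENT, ENTITY_IS_RELATED_TO }

    // Assumed escaping scheme: '%' first, then '=', so that the "%3D"
    // produced for '=' is not escaped a second time.
    static String escape(String s) {
        return s.replace("%", "%25").replace("=", "%3D");
    }

    // Builds the qualifier bytes to match on for a given prefix.
    static byte[] qualifierPrefix(Prefix prefix, String name) {
        switch (prefix) {
            case APPLICATION_EVENT:
            case ENTITY_EVENT:
                // Event columns are structured: escape the event id and
                // append the separator that precedes the timestamp.
                return (escape(name) + "=").getBytes(StandardCharsets.UTF_8);
            default:
                // Every other prefix uses the name bytes unchanged.
                return name.getBytes(StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) {
        byte[] q = qualifierPrefix(Prefix.ENTITY_EVENT, "a=b");
        System.out.println(new String(q, StandardCharsets.UTF_8));
    }
}
```

Only the two event cases need special handling; everything else falls through to the default.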

> timestamps are stored unencoded causing parse errors
> ----------------------------------------------------
>
>                 Key: YARN-5109
>                 URL: https://issues.apache.org/jira/browse/YARN-5109
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>    Affects Versions: YARN-2928
>            Reporter: Sangjin Lee
>            Assignee: Varun Saxena
>            Priority: Blocker
>              Labels: yarn-2928-1st-milestone
>         Attachments: YARN-5109-YARN-2928.01.patch, YARN-5109-YARN-2928.02.patch
>
>
> When we store timestamps (for example as part of the row key or part of the column name
> for an event), the bytes are used as is without any encoding. If the byte value happens to
> contain a separator character we use (e.g. "!" or "="), it causes a parse failure when we
> read it.
> I came across this while looking into this error in the timeline reader:
> {noformat}
> 2016-05-17 21:28:38,643 WARN org.apache.hadoop.yarn.server.timelineservice.storage.common.TimelineStorageUtils: incorrectly formatted column name: it will be discarded
> {noformat}
> I traced the data that was causing this, and the column name (for the event) was the
> following:
> {noformat}
> i:e!YARN_RM_CONTAINER_CREATED=\x7F\xFF\xFE\xABDY=\x99=YARN_CONTAINER_ALLOCATED_HOST
> {noformat}
> Note that the column name is supposed to be of the format (event id)=(timestamp)=(event
> info key). However, observe the timestamp portion:
> {noformat}
> \x7F\xFF\xFE\xABDY=\x99
> {noformat}
> The presence of the separator ("=") causes the parse error.
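
The failure quoted above can be reproduced with a small standalone Java sketch. It rebuilds the
qualifier from the bytes in the report and splits it on every "=" byte, which yields four parts
instead of the expected three. The class and method names are hypothetical, and the fixed-width
read at the end is just one possible way to avoid the collision, assuming the event id itself
contains no separator; it is not necessarily what the patch does.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class SeparatorCollisionDemo {
    static final byte SEP = (byte) '=';

    // Naive split: treats every '=' byte as a separator.
    static List<byte[]> naiveSplit(byte[] data) {
        List<byte[]> parts = new ArrayList<>();
        int start = 0;
        for (int i = 0; i < data.length; i++) {
            if (data[i] == SEP) {
                parts.add(Arrays.copyOfRange(data, start, i));
                start = i + 1;
            }
        }
        parts.add(Arrays.copyOfRange(data, start, data.length));
        return parts;
    }

    // Rebuilds (event id)=(timestamp)=(event info key) using the raw
    // 8-byte timestamp from the report: \x7F\xFF\xFE\xABDY=\x99.
    static byte[] qualifierFromReport() {
        byte[] id = "YARN_RM_CONTAINER_CREATED".getBytes(StandardCharsets.UTF_8);
        byte[] ts = {0x7F, (byte) 0xFF, (byte) 0xFE, (byte) 0xAB,
                     0x44, 0x59, 0x3D, (byte) 0x99};
        byte[] key = "YARN_CONTAINER_ALLOCATED_HOST".getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(id.length + ts.length + key.length + 2)
                .put(id).put(SEP).put(ts).put(SEP).put(key).array();
    }

    // Possible workaround (assumption): locate the first separator, then read
    // exactly Long.BYTES bytes as the timestamp without scanning them for '='.
    static long timestampFixedWidth(byte[] data) {
        int firstSep = 0;
        while (data[firstSep] != SEP) {
            firstSep++;
        }
        return ByteBuffer.wrap(data, firstSep + 1, Long.BYTES).getLong();
    }

    public static void main(String[] args) {
        byte[] qualifier = qualifierFromReport();
        System.out.println("naive parts: " + naiveSplit(qualifier).size());
        System.out.println("timestamp bytes as long: 0x"
                + Long.toHexString(timestampFixedWidth(qualifier)));
    }
}
```

The naive split breaks because the seventh timestamp byte (0x3D) is the ASCII code for "=".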



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: yarn-issues-help@hadoop.apache.org

