hadoop-yarn-issues mailing list archives

From "Sangjin Lee (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-3906) split the application table from the entity table
Date Thu, 30 Jul 2015 23:24:05 GMT

    [ https://issues.apache.org/jira/browse/YARN-3906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14648472#comment-14648472
] 

Sangjin Lee commented on YARN-3906:
-----------------------------------

bq. I've also noticed that the newly added Application*.java files overlap significantly with
Entity*.java.

Thanks for bringing up that point [~gtCarrera9]. I should have explained why I wrote it this
way. That overlap is the first thing I noticed as I looked into adding the new
table.

\*Table and \*RowKey are not so bad, but \*ColumnFamily, \*Column, and \*ColumnPrefix
definitely have a lot of overlapping code. That is largely an artifact of the design decision to
use enums to implement these classes. Enums are nice because they let us seal the list of members
cleanly, and the code that uses the API becomes very strongly typed. The downside is that
enums cannot be extended.

If enums could be extended, we could have created a base class common to both the
entity table and the application table, and had each table's classes
extend it pretty trivially. Unfortunately that doesn't work with enums. Nor does Java have
mix-ins the way Scala does.

To minimize the duplication, we introduced {{ColumnHelper}} and moved many of the
common operations into that helper class. You'll notice that most of the implementations in
the \*Column\* classes are simple pass-throughs to {{ColumnHelper}}.
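
To illustrate the pass-through pattern described above, here is a minimal sketch. It is not the code from the patch: the {{Column}} interface, the method names, the column family and qualifier strings, and the use of a plain {{Map}} in place of an HBase row are all assumptions made to keep the example self-contained. Only the idea (two enums that cannot share a base class, each delegating to one helper) comes from the discussion.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical common contract; the real interfaces in the patch may differ.
interface Column {
    void store(Map<String, byte[]> row, String value);
    String read(Map<String, byte[]> row);
}

// Shared logic is written once here, because enums cannot inherit it
// from a common base class.
final class ColumnHelper {
    private final String family;
    private final String qualifier;

    ColumnHelper(String family, String qualifier) {
        this.family = family;
        this.qualifier = qualifier;
    }

    private String key() {
        return family + ":" + qualifier;
    }

    void store(Map<String, byte[]> row, String value) {
        row.put(key(), value.getBytes(StandardCharsets.UTF_8));
    }

    String read(Map<String, byte[]> row) {
        byte[] bytes = row.get(key());
        return bytes == null ? null : new String(bytes, StandardCharsets.UTF_8);
    }
}

// Each enum seals its own member list; the method bodies are pure pass-throughs.
enum EntityColumn implements Column {
    ID("i", "id"),
    TYPE("i", "type");

    private final ColumnHelper helper;

    EntityColumn(String family, String qualifier) {
        this.helper = new ColumnHelper(family, qualifier);
    }

    @Override
    public void store(Map<String, byte[]> row, String value) {
        helper.store(row, value);
    }

    @Override
    public String read(Map<String, byte[]> row) {
        return helper.read(row);
    }
}

// Nearly identical to EntityColumn: this is the duplication under discussion.
enum ApplicationColumn implements Column {
    ID("i", "id"),
    CREATED_TIME("i", "created_time");

    private final ColumnHelper helper;

    ApplicationColumn(String family, String qualifier) {
        this.helper = new ColumnHelper(family, qualifier);
    }

    @Override
    public void store(Map<String, byte[]> row, String value) {
        helper.store(row, value);
    }

    @Override
    public String read(Map<String, byte[]> row) {
        return helper.read(row);
    }
}
```

The remaining duplication is the constructor and the one-line delegating methods. In exchange, a caller that expects an {{ApplicationColumn}} cannot be handed an {{EntityColumn}} by mistake.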

This issue is more pronounced because the entity table and the application table are so similar.
For example, for the app-to-flow table (which Zhijie is working on), this might not be as
big an issue.

We could think of some alternatives, but I think they also have their own challenges. First,
we could think of having only one set of classes both for the entity table and the application
table, and controlling which one to use via some sort of an argument/flag. But then the problem
is that we would have lots of {{if application ... else ...}} code scattered around in that
single implementation. I'm not sure if it is an improvement.
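
As a rough sketch of that concern, the single-implementation alternative might look something like the following. Everything here is hypothetical: the {{TableKind}} flag, the {{FlaggedRowKey}} class, and the row key layout are invented for illustration and do not reflect the actual schema.

```java
// Hypothetical flag selecting which table a shared implementation targets.
enum TableKind { ENTITY, APPLICATION }

// One class serving both tables; every table-specific difference becomes a branch.
final class FlaggedRowKey {
    private FlaggedRowKey() {
    }

    // The key layout below is made up for illustration only.
    static String rowKey(TableKind kind, String cluster, String user, String flow,
                         String appId, String entityType, String entityId) {
        String prefix = cluster + "!" + user + "!" + flow + "!" + appId;
        if (kind == TableKind.APPLICATION) {
            // The application table has no use for the entity arguments,
            // yet callers must still pass something for them.
            return prefix;
        } else {
            return prefix + "!" + entityType + "!" + entityId;
        }
    }
}
```

Each method that differs between the two tables would need a similar branch, and the compiler can no longer catch a caller that passes the wrong flag or the wrong set of arguments.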

Eventually, if this becomes more of a need, we could envision defining a table/schema description
format and writing a code generator for it, so that these classes can simply be generated from
the schema description. However, as you may know, code generation is not without problems...

I hope this clarifies some of the thinking that went into this.

> split the application table from the entity table
> -------------------------------------------------
>
>                 Key: YARN-3906
>                 URL: https://issues.apache.org/jira/browse/YARN-3906
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>    Affects Versions: YARN-2928
>            Reporter: Sangjin Lee
>            Assignee: Sangjin Lee
>         Attachments: YARN-3906-YARN-2928.001.patch, YARN-3906-YARN-2928.002.patch
>
>
> Per discussions on YARN-3815, we need to split the application entities from the main
> entity table into their own table (application).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
