hadoop-yarn-issues mailing list archives

From "Sangjin Lee (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-5667) Move HBase backend code in ATS v2 into its separate module
Date Tue, 27 Sep 2016 01:45:21 GMT

    [ https://issues.apache.org/jira/browse/YARN-5667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15524746#comment-15524746 ]

Sangjin Lee commented on YARN-5667:

Those are great questions.

A diamond dependency (where there is more than one version of a given library in the dependency
graph) arises because, for example, the hadoop code uses hadoop-common 3.0.0-alpha1 directly and
also pulls in 2.5.1 indirectly via hbase 1.1.3. Due to hadoop's version management,
3.0.0-alpha1 is picked. The implication is that we build and test the hbase code in the
context of timeline service against a hadoop version *different from* the one hbase declares
as its dependency.
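
To make the diamond concrete, here is a rough sketch of how one version ends up winning. It is
not Maven's actual resolver: the graph is a hand-simplified, hypothetical model of the
dependencies described above (the node label "hbase" stands in for the real hbase artifacts),
and the resolve() helper and its "managed" pin table are illustrative names only.

```python
from collections import deque

# Hypothetical, simplified graph: (artifact, version) -> direct dependencies.
GRAPH = {
    ("hadoop-yarn-server-timelineservice", "3.0.0-alpha1"): [
        ("hadoop-common", "3.0.0-alpha1"),
        ("hbase", "1.1.3"),
    ],
    ("hbase", "1.1.3"): [("hadoop-common", "2.5.1")],
    ("hadoop-common", "3.0.0-alpha1"): [],
    ("hadoop-common", "2.5.1"): [],
}
ROOT = ("hadoop-yarn-server-timelineservice", "3.0.0-alpha1")


def resolve(root, graph, managed=None):
    """Pick one version per artifact, walking the graph breadth-first.

    A version pinned in `managed` always wins; otherwise the first
    (nearest) declaration wins. Returns the chosen versions and a list of
    (artifact, declared_version, chosen_version) for every declaration
    that was overridden.
    """
    managed = managed or {}
    chosen = {}
    overridden = []
    queue = deque([root])
    while queue:
        name, declared = queue.popleft()
        if name not in chosen:
            chosen[name] = managed.get(name, declared)
            queue.extend(graph.get((name, chosen[name]), []))
        if chosen[name] != declared:
            overridden.append((name, declared, chosen[name]))
    return chosen, overridden


chosen, overridden = resolve(ROOT, GRAPH, managed={"hadoop-common": "3.0.0-alpha1"})
print(chosen["hadoop-common"])  # the single version everything is built against
print(overridden)               # declarations that did not get the version they asked for
```

The overridden list is the interesting part: it names the hadoop-common version that hbase
declared but does not actually get at build and test time.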

Now if we think about the hbase client code and the hbase coprocessor code separately, we see
that the two pieces of code have different runtimes. The code that uses the hbase client runs on
YARN (and therefore hadoop 3.0.0). In that environment, we need to ensure that the hbase client
itself (not our code that uses the hbase client) works correctly against the trunk version of
hadoop.

On the other hand, the hbase coprocessor code runs on hbase. Therefore, it is the timeline
service coprocessor code that needs to run under hadoop 2.5.1 (until/unless we upgrade hbase).
Both of these aspects need to be verified if we decide to split the code, and verifying them
would be easier with the two pieces in separate modules.

If we had an hbase version that depends on the trunk, these problems would go away. And I
understand that the hbase folks are making an effort to ensure the latest hbase version works
against the hadoop trunk version. That said, hbase can officially depend only on released
versions, so there will always be a lag.

As for why the coprocessor depends on the hbase-client-related code, there is
no strong reason it has to be that way. It's just the way the code evolved. In fact, it
would be good to refactor the code so that the coprocessor code has minimal dependencies.
It's worth looking into.

> Move HBase backend code in ATS v2  into its separate module
> -----------------------------------------------------------
>                 Key: YARN-5667
>                 URL: https://issues.apache.org/jira/browse/YARN-5667
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: yarn
>            Reporter: Haibo Chen
>            Assignee: Haibo Chen
> The HBase backend code currently lives along with the core ATS v2 code in hadoop-yarn-server-timelineservice
module. Because Resource Manager depends on hadoop-yarn-server-timelineservice, an unnecessary
dependency of the RM module on HBase modules is introduced (HBase backend is pluggable, so
we do not need to directly pull in HBase jars). 
> In our internal effort to try ATS v2 with HBase 2.0, which depends on Hadoop 3, we encountered
a circular dependency during our builds between HBase 2.0 and Hadoop 3 artifacts.
> {code}
> hadoop-mapreduce-client-common, hadoop-yarn-client, hadoop-yarn-server-resourcemanager,
hadoop-yarn-server-timelineservice, hbase-server, hbase-prefix-tree, hbase-hadoop2-compat,
hadoop-mapreduce-client-jobclient, hadoop-mapreduce-client-common]
> {code}
> This jira proposes we move all HBase-backend-related code from hadoop-yarn-server-timelineservice
into its own module (possible name is yarn-server-timelineservice-storage) so that core RM
modules do not depend on HBase modules any more.
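
As an illustration of the circular dependency quoted above and of why the proposed split
removes it, here is a small sketch. It assumes that each module in the quoted list depends on
the next one; that reading is an assumption, since the start of the list is cut off. The
find_cycle() helper is a plain depth-first search written for this example, not part of any
build tool.

```python
# Modules in the order they appear in the quoted build output.
CHAIN = [
    "hadoop-mapreduce-client-common",
    "hadoop-yarn-client",
    "hadoop-yarn-server-resourcemanager",
    "hadoop-yarn-server-timelineservice",
    "hbase-server",
    "hbase-prefix-tree",
    "hbase-hadoop2-compat",
    "hadoop-mapreduce-client-jobclient",
    "hadoop-mapreduce-client-common",
]

# Assumption: each module depends on the module listed after it.
graph = {a: [b] for a, b in zip(CHAIN, CHAIN[1:])}


def find_cycle(graph):
    """Return one dependency cycle as a list of modules, or None if acyclic."""
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    color = {}
    path = []

    def visit(node):
        color[node] = GRAY
        path.append(node)
        for dep in graph.get(node, []):
            state = color.get(dep, WHITE)
            if state == GRAY:
                # dep is already on the current path, so we have closed a loop
                return path[path.index(dep):] + [dep]
            if state == WHITE:
                found = visit(dep)
                if found:
                    return found
        path.pop()
        color[node] = BLACK
        return None

    for node in graph:
        if color.get(node, WHITE) == WHITE:
            found = visit(node)
            if found:
                return found
    return None


# After the proposed move, hadoop-yarn-server-timelineservice no longer
# depends on hbase-server; the HBase-facing code lives in its own module.
split_graph = dict(graph)
split_graph["hadoop-yarn-server-timelineservice"] = []

print(find_cycle(graph))        # the loop through timelineservice and hbase-server
print(find_cycle(split_graph))  # None once the HBase edge is gone
```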
