ambari-issues mailing list archives

From "Jonathan Hurley (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (AMBARI-22721) Centralize the Management of Tarball Uploading
Date Wed, 03 Jan 2018 18:52:00 GMT

     [ https://issues.apache.org/jira/browse/AMBARI-22721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Hurley updated AMBARI-22721:
-------------------------------------
    Issue Type: Task  (was: Bug)

> Centralize the Management of Tarball Uploading
> ----------------------------------------------
>
>                 Key: AMBARI-22721
>                 URL: https://issues.apache.org/jira/browse/AMBARI-22721
>             Project: Ambari
>          Issue Type: Task
>    Affects Versions: 2.6.2
>            Reporter: Jonathan Hurley
>            Assignee: Jonathan Hurley
>            Priority: Critical
>             Fix For: 2.6.2
>
>
> Ambari must upload tarballs into HDFS for many of its services to function correctly after they are installed. This tarball management is not centralized in any way; instead, it is spread out across several different Python files for various services:
> - Hive uploads the Tez, MapReduce2, Sqoop, etc. tarballs
> - YARN uploads the Tez, Slider, and MapReduce2 tarballs
> This causes a problem when patching a specific service, such as Sqoop. Sqoop requires that sqoop.tar.gz and mapreduce.tar.gz are available in the same versioned folder in HDFS. However, no Sqoop component performs this upload; Hive does. So, if Hive is not being upgraded, these tarballs are never uploaded.
> The proposal here is to remove this coupling and to manage the tarball relationships in the stack definition:
> {code}
> {
>   "tarball": {
>     "MAPREDUCE2": {
>       "JOB_HISTORY_SERVER": [
>         {
>           "tarball": "mapreduce.tar.gz",
>           "source_dir": "{0}/{1}/hadoop/mapreduce.tar.gz",
>           "target_dir": "/{0}/apps/{1}/mapreduce/mapreduce.tar.gz"
>         }
>       ]
>     },
>     "HIVE": {
>       "HIVE_SERVER2": [
>         {
>           "tarball": "mapreduce.tar.gz",
>           "source_dir": "{0}/{1}/hadoop/mapreduce.tar.gz",
>           "target_dir": "/{0}/apps/{1}/mapreduce/mapreduce.tar.gz"
>         },
>         {
>           "tarball": "sqoop.tar.gz",
>           "source_dir": "{0}/{1}/sqoop/sqoop.tar.gz",
>           "target_dir": "/{0}/apps/{1}/sqoop/sqoop.tar.gz"
>         }
>       ]
>     },
>     "SQOOP": {
>       "SQOOP": [
>         {
>           "tarball": "mapreduce.tar.gz",
>           "source_dir": "{0}/{1}/hadoop/mapreduce.tar.gz",
>           "target_dir": "/{0}/apps/{1}/mapreduce/mapreduce.tar.gz"
>         },
>         {
>           "tarball": "sqoop.tar.gz",
>           "source_dir": "{0}/{1}/sqoop/sqoop.tar.gz",
>           "target_dir": "/{0}/apps/{1}/sqoop/sqoop.tar.gz"
>         }
>       ]
>     }
>   }
> }
> {code}
> - after-INSTALL hooks will check for {{CLIENT}} as the component category
> - after-START hooks will check for components whose category is NOT {{CLIENT}}
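> As a rough illustration only (not the actual patch), an after-START hook could consume the mapping above roughly as follows; the helper name, the {{tarball_map.json}} file name, and the way the {{{0}}}/{{{1}}} placeholders are filled (stack root and version for the source, stack name and version for the target) are assumptions for the sketch, and the upload simply shells out to the {{hdfs dfs}} CLI rather than using Ambari's resource providers:
> {code:title=upload_tarballs_sketch.py}
> import json
> import subprocess
> 
> def upload_tarballs(service, component, category, stack_root, stack_name, version,
>                     mapping_file="tarball_map.json"):
>     # after-START variant: skip CLIENT components, which are handled by after-INSTALL hooks
>     if category == "CLIENT":
>         return
> 
>     with open(mapping_file) as fp:
>         mapping = json.load(fp)["tarball"]
> 
>     for entry in mapping.get(service, {}).get(component, []):
>         # assumption: source "{0}/{1}" -> <stack_root>/<version>, target "{0}/{1}" -> <stack_name>/<version>
>         source = entry["source_dir"].format(stack_root, version)
>         target = entry["target_dir"].format(stack_name, version)
>         # create the versioned target folder in HDFS and upload the tarball
>         subprocess.check_call(["hdfs", "dfs", "-mkdir", "-p", target.rsplit("/", 1)[0]])
>         subprocess.check_call(["hdfs", "dfs", "-put", "-f", source, target])
> {code}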
> Additionally, using the file length as a checksum may no longer be sufficient. We should also add a checksum file to HDFS for each tarball so we can easily tell whether work needs to be done (during an install, restart, upgrade, etc.) to upload a new tarball (one that may also have been modified with native libraries):
> {code:title=ambari-tarball-checksum.json (0644)}
> {
>   "mapreduce.tar.gz": {
>     "native_libraries": true,
>     "file_count": 509
>   },
>   "hadoop-streaming.tar.gz": {
>     "native_libraries": false,
>     "file_count": 10  
>   }
> }
> {code}
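> A rough sketch of how that checksum file could be used to decide whether an upload is needed; the native-library detection (looking for {{.so}} entries in the archive) and the exact comparison rules are assumptions here, not a settled design:
> {code:title=needs_upload_sketch.py}
> import json
> import tarfile
> 
> def describe_tarball(path):
>     """Build the checksum entry for a local tarball."""
>     with tarfile.open(path, "r:gz") as tar:
>         names = tar.getnames()
>     return {
>         # assumption: "native libraries" means the archive contains any .so files
>         "native_libraries": any(name.endswith(".so") for name in names),
>         "file_count": len(names),
>     }
> 
> def needs_upload(local_path, tarball_name, checksum_json):
>     """Compare a local tarball against ambari-tarball-checksum.json fetched from HDFS."""
>     recorded = json.loads(checksum_json).get(tarball_name)
>     if recorded is None:
>         return True  # never recorded before, so an upload is required
>     return describe_tarball(local_path) != recorded
> {code}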



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
