Mailing-List: contact dev-help@ambari.apache.org; run by ezmlm
Precedence: bulk
Reply-To: dev@ambari.apache.org
Date: Mon, 3 Feb 2014 19:51:08 +0000 (UTC)
From: "Dmitry Lysnichenko (JIRA)" <jira@apache.org>
To: dev@ambari.apache.org
Message-ID: <JIRA.12692496.1391169004346.28992.1391457068861@arcas>
In-Reply-To: <JIRA.12692496.1391169004346@arcas>
References: <JIRA.12692496.1391169004346@arcas>
Subject: [jira] [Updated] (AMBARI-4481) Add to the agent ability to download
 service scripts and hooks
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable


     [ https://issues.apache.org/jira/browse/AMBARI-4481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dmitry Lysnichenko updated AMBARI-4481:
---------------------------------------

    Description: 
h1. Proposal:
h2. General concept
Ambari server shares some files at /var/lib/ambari-server/resources/ via HTTP. These files are accessible via URLs such as http://hostname:8080/resources/jdk-6u31-linux-x64.bin . Among these files are service scripts, templates and hooks. The agent keeps a cache of these files. The cache directory structure is similar to the contents of the stacks folder on the server. For example:
$ ls /var/lib/ambari-agent/cache
{code}
└── stacks
    └── HDP
        ├── 2.0.7
        │   ├── Accumulo
        │   └── Flume
        └── 2.0.8
            ├── Accumulo
            ├── Flume
            └── YetAnotherService
{code}
If files for a given service, component and stack version are not available in the cache, the agent downloads the appropriate files on first use. After the files are successfully unpacked, the hash is also downloaded to a separate file (this way, we ensure cache consistency). If any step of the cache update fails (due to a timeout, missing files, a broken archive, etc.), the agent fails command execution with an appropriate message.
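
The agent-side update described above could look roughly like the following. This is only a minimal sketch in Python 3, not the actual agent code: the names update_cache, http_fetch and CachingException, and the layout of the remote URL, are assumptions made for illustration. The hash is written only after the archive has been unpacked, so an interrupted update never leaves a valid-looking hash next to incomplete files.

```python
import io
import os
import urllib.request
import zipfile


class CachingException(Exception):
    """Raised when any step of the cache update fails (hypothetical name)."""


def http_fetch(url, timeout=10):
    """Download a URL and return its body as bytes."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


def update_cache(base_url, cache_dir, fetch=http_fetch):
    """Download and unpack archive.zip, then store the hash next to the files.

    base_url is an assumed location such as
    http://hostname:8080/resources/<path to a "hooks" or "package" directory>.
    """
    try:
        archive = fetch(base_url + "/archive.zip")
        os.makedirs(cache_dir, exist_ok=True)
        with zipfile.ZipFile(io.BytesIO(archive)) as zf:
            zf.extractall(cache_dir)
        # The hash is fetched and written last, only after a successful unpack.
        remote_hash = fetch(base_url + "/.hash")
        with open(os.path.join(cache_dir, ".hash"), "wb") as f:
            f.write(remote_hash)
    except Exception as err:
        # Timeouts, missing files and broken archives all end up here.
        raise CachingException(
            "Cache update failed for %s: %s" % (base_url, err))
```

The fetch parameter exists only so that the sketch can be exercised without a running server; a caller would catch CachingException and fail the command with its message.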

h2. Packaging files into archives:
The trouble is that, in the current Jetty configuration, ambari-server does not allow directory listing. We have two options:
- To speed up downloads and avoid the need to list script files explicitly, pack the "hooks" and "package" directories into zip archives. After download, the agent unpacks the archive into the cache.
- Set the "dirAllowed" servlet option for /resources/*, in which case the agent downloads all files one by one. The user does not have to run additional commands to have stack files updated (improved usability), but a separate request is sent for every file being downloaded. Fetching files this way seems too slow, especially on big clusters.

Since the second option does not scale well, I'm going to implement the first one. Implementation steps:
- On server startup, a Python script iterates over the "hooks"/"package" directories and computes directory sha1 hashes. Files and directories are listed in alphabetical order; hash files and existing directory archives are skipped.
- If the directory archive does not exist or the sha1 hash differs from the previously computed hash, the archive is regenerated and saved to the "archive.zip" file.
- The sha1 hash of the directory is saved to a .hash file in the root of the "hooks"/"package" directory.
This way, we ensure that the archive stays up to date if the user changes a file in the directory or replaces the entire directory.
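
The server-side steps above can be sketched as follows. This is an illustrative Python 3 sketch rather than the actual script: the function names are hypothetical, and hashing each relative path together with the file contents is one possible choice that the proposal does not prescribe.

```python
import hashlib
import os
import zipfile

ARCHIVE_NAME = "archive.zip"  # file name taken from the proposal
HASH_NAME = ".hash"           # file name taken from the proposal


def list_files(directory):
    """Return relative file paths in alphabetical order.

    The archive and the hash file in the directory root are skipped.
    """
    result = []
    for root, _dirs, files in os.walk(directory):
        for name in files:
            rel = os.path.relpath(os.path.join(root, name), directory)
            if rel not in (ARCHIVE_NAME, HASH_NAME):
                result.append(rel)
    return sorted(result)


def directory_hash(directory):
    """Compute one sha1 over the names and contents of all listed files."""
    sha1 = hashlib.sha1()
    for rel in list_files(directory):
        sha1.update(rel.encode("utf-8"))
        with open(os.path.join(directory, rel), "rb") as f:
            sha1.update(f.read())
    return sha1.hexdigest()


def update_archive(directory):
    """Regenerate the archive if it is missing or stale.

    Returns True if the archive was regenerated, False otherwise.
    """
    archive_path = os.path.join(directory, ARCHIVE_NAME)
    hash_path = os.path.join(directory, HASH_NAME)
    current = directory_hash(directory)
    previous = None
    if os.path.isfile(hash_path):
        with open(hash_path) as f:
            previous = f.read().strip()
    if os.path.isfile(archive_path) and previous == current:
        return False
    files = list_files(directory)
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for rel in files:
            zf.write(os.path.join(directory, rel), rel)
    with open(hash_path, "w") as f:
        f.write(current)
    return True
```

Calling update_archive() for each "hooks"/"package" directory on startup would regenerate only those archives whose directory contents have changed.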

h2. How to change stack files after server installation
To change stack files (scripts, templates and so on) or to add new files, stacks, etc., the user has to:
- stop ambari-server
- make the changes
- start ambari-server
Everything else is then done automatically on server startup.

h2. Cache invalidation
Besides package archives, the agent also downloads and stores archive hashes. We use them for cache invalidation. As stack files may only change on server restart (and agent re-registration), we verify hashes only once and store the result in FileCache until the next agent registration.
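
A rough sketch of this verify-once behaviour, in Python 3. The class name FileCacheSketch, its methods and the fetch_remote_hash callback are assumptions made for illustration; the real FileCache interface may differ.

```python
import os


class FileCacheSketch:
    """Remembers the result of each hash check until the next registration."""

    def __init__(self, cache_dir, fetch_remote_hash):
        self.cache_dir = cache_dir
        # Callable taking a cache subpath and returning the server-side hash.
        self.fetch_remote_hash = fetch_remote_hash
        self.verified = {}  # subpath -> True if the cached copy is current

    def reset(self):
        """Forget all results; meant to be called on agent registration."""
        self.verified.clear()

    def read_local_hash(self, subpath):
        """Return the stored hash for a cached directory, or None if absent."""
        path = os.path.join(self.cache_dir, subpath, ".hash")
        if not os.path.isfile(path):
            return None
        with open(path) as f:
            return f.read().strip()

    def is_up_to_date(self, subpath):
        """Compare local and remote hashes once, then reuse the result."""
        if subpath not in self.verified:
            local = self.read_local_hash(subpath)
            remote = self.fetch_remote_hash(subpath)
            self.verified[subpath] = local is not None and local == remote
        return self.verified[subpath]
```

With this shape, repeated commands for the same service cost at most one hash request per registration cycle.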

h2. Custom actions
I'm going to use the same approach for fetching /var/lib/ambari-agent/resources/custom_actions. [~sumitmohanty], could you please post any entry points for using/testing custom actions via the API?


  was:
h1. Proposal:
h2. General concept
Ambari server shares some files at /var/lib/ambari-server/resources/ via HTTP. These files are accessible via URLs such as http://hostname:8080/resources/jdk-6u31-linux-x64.bin . Among these files are service scripts, templates and hooks. The agent keeps a cache of these files. The cache directory structure is similar to the contents of the stacks folder on the server. For example:
$ ls /var/lib/ambari-agent/cache
{code}
└── stacks
    └── HDP
        ├── 2.0.7
        │   ├── Accumulo
        │   └── Flume
        └── 2.0.8
            ├── Accumulo
            ├── Flume
            └── YetAnotherService
{code}
If files for a given service, component and stack version are not available in the cache, the agent downloads the appropriate files on first use. After the files are successfully unpacked, the hash is also downloaded to a separate file (this way, we ensure cache consistency). If any step of the cache update fails (due to a timeout, missing files, a broken archive, etc.), the agent fails command execution with an appropriate message.

h2. Packaging files into archives:
The trouble is that, in the current Jetty configuration, ambari-server does not allow directory listing. We have two options:
- To speed up downloads and avoid the need to list script files explicitly, pack the "hooks" and "package" directories into tar.gz archives. After download, the agent unpacks the archive into the cache.
- Set the "dirAllowed" servlet option for /resources/*, in which case the agent downloads all files one by one. The user does not have to run additional commands to have stack files updated (improved usability), but a separate request is sent for every file being downloaded. Fetching files this way seems too slow, especially on big clusters.

Since the second option does not scale well, I'm going to implement the first one. Implementation steps:
- On server startup, a Python script iterates over the "hooks"/"package" directories and computes directory sha1 hashes. Files and directories are listed in alphabetical order; hash files and existing directory archives are skipped.
- If the directory archive does not exist or the sha1 hash differs from the previously computed hash, the archive is regenerated and saved to the "archive.tar.gz" file.
- The sha1 hash of the directory is saved to a .hash file in the root of the "hooks"/"package" directory.
This way, we ensure that the archive stays up to date if the user changes a file in the directory or replaces the entire directory.

h2. How to change stack files after server installation
To change stack files (scripts, templates and so on) or to add new files, stacks, etc., the user has to:
- stop ambari-server
- make the changes
- start ambari-server
Everything else is then done automatically on server startup.

h2. Cache invalidation
Besides package archives, the agent also downloads and stores archive hashes. We use them for cache invalidation. As stack files may only change on server restart (and agent re-registration), we verify hashes only once and store the result in FileCache until the next agent registration.

h2. Custom actions
I'm going to use the same approach for fetching /var/lib/ambari-agent/resources/custom_actions. [~sumitmohanty], could you please post any entry points for using/testing custom actions via the API?


> Add to the agent ability to download service scripts and hooks
> --------------------------------------------------------------
>
>                 Key: AMBARI-4481
>                 URL: https://issues.apache.org/jira/browse/AMBARI-4481
>             Project: Ambari
>          Issue Type: Task
>          Components: agent, controller
>    Affects Versions: 1.5.0
>            Reporter: Dmitry Lysnichenko
>            Assignee: Dmitry Lysnichenko
>             Fix For: 1.5.0
>


--
This message was sent by Atlassian JIRA
(v6.1.5#6160)