ambari-dev mailing list archives

From "Dmitry Lysnichenko (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (AMBARI-4481) Add to the agent ability to download service scripts and hooks
Date Fri, 14 Feb 2014 15:34:19 GMT

     [ https://issues.apache.org/jira/browse/AMBARI-4481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dmitry Lysnichenko updated AMBARI-4481:
---------------------------------------

    Description: 
h1. Proposal:
h2. General concept
The Ambari server shares some files located at /var/lib/ambari-server/resources/ over HTTP. These files
are accessible via URLs like http://hostname:8080/resources/jdk-6u31-linux-x64.bin . Among
these files are service scripts, templates and hooks. The agent keeps a cache of these files.
The cache directory structure is similar to the contents of the stacks folder on the server. For example:
$ ls /var/lib/ambari-agent/cache
{code}
└── stacks
    └── HDP
        ├── 2.0.7
        │   ├── Accumulo
        │   └── Flume
        └── 2.0.8
            ├── Accumulo
            ├── Flume
            └── YetAnotherService
{code}
If the files for a given service, component and stack version are not available in the cache, the agent downloads
the appropriate files on first use. After the files are successfully unpacked, the hash is also downloaded
to a separate file (this way, we ensure cache consistency). If any step of the cache update fails
(due to a timeout, missing files, a broken archive, etc.), the agent fails command execution with an
appropriate message.

h2. Packaging files into archives:
The trouble is that in the current Jetty configuration, ambari-server does not allow directory listing.
To speed up downloads and avoid the need to list script files explicitly, the proposal is
to pack the "hooks" and "package" directories into zip archives. After download, the agent unpacks
the archive into the cache.
Execution steps:
- on server startup, a python script iterates over the "hooks"/"package" directories and computes
a sha1 hash for each directory. Files and directories are listed in alphabetical order; hash sum files
and existing directory archives are skipped. Only active (enabled) stacks are hashed/archived.
- if the directory archive does not exist or the sha1 hash sum differs from the previously computed hash
sum, the archive is regenerated and saved to the "archive.zip" file.
- the sha1 hash of the directory is saved to a .hash file in the root of the "hooks"/"package" directory.
This way, we ensure that the archive stays up to date if the user changes some file in the directory
or replaces the entire directory.
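
The execution steps above could be sketched roughly as follows. This is only an illustration, not the code from the attached patch: the function names count_hash_sum and update_archive are hypothetical, and mixing relative file paths into the hash (so that renames are detected) and skipping ".hash"/"archive.zip" by name at any depth are my assumptions.

```python
import hashlib
import os
import zipfile

HASH_FILE = ".hash"            # file name taken from the description
ARCHIVE_FILE = "archive.zip"   # file name taken from the description
SKIPPED = {HASH_FILE, ARCHIVE_FILE}


def count_hash_sum(directory):
    """Return a sha1 hex digest over the paths and contents of files in directory."""
    sha1 = hashlib.sha1()
    for root, dirs, files in os.walk(directory):
        dirs.sort()  # in-place sort makes os.walk descend in alphabetical order
        for name in sorted(files):
            if name in SKIPPED:  # assumption: skipped by name at any depth
                continue
            path = os.path.join(root, name)
            rel = os.path.relpath(path, directory).replace(os.sep, "/")
            sha1.update(rel.encode("utf-8"))  # assumption: paths are part of the hash
            with open(path, "rb") as f:
                for chunk in iter(lambda: f.read(65536), b""):
                    sha1.update(chunk)
    return sha1.hexdigest()


def update_archive(directory):
    """Rebuild archive.zip if it is missing or stale; return True if it was rebuilt."""
    hash_path = os.path.join(directory, HASH_FILE)
    archive_path = os.path.join(directory, ARCHIVE_FILE)
    current = count_hash_sum(directory)
    saved = None
    if os.path.isfile(hash_path):
        with open(hash_path) as f:
            saved = f.read().strip()
    if saved == current and os.path.isfile(archive_path):
        return False  # archive is still up to date
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for root, dirs, files in os.walk(directory):
            dirs.sort()
            for name in sorted(files):
                if name in SKIPPED:
                    continue
                path = os.path.join(root, name)
                zf.write(path, os.path.relpath(path, directory))
    with open(hash_path, "w") as f:
        f.write(current)  # hash is saved only after the archive was written
    return True
```

In this sketch, calling update_archive() twice on an unchanged directory rebuilds the archive only the first time, because the hash and archive files themselves are excluded from the hash.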

h2. How to change stack files after server installation
To change stack files (scripts, templates and so on) or add new files/stacks/etc, user has
to:
- stop ambari-server
- perform changes
- start ambari-server
- everything else will be done automagically

h2. Cache invalidation
Besides package archives, the agent also downloads and stores archive hashes. We use them for
cache invalidation. As stack files may only change on server restart (and agent reregistration),
we verify the hashes only once and store the result in FileCache until the next agent registration.
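
A rough sketch of the "verify once per registration" idea, under stated assumptions: the class name FileCacheSketch, its methods, and the injected fetch callable (relative path in, bytes out) are hypothetical and stand in for the real FileCache and HTTP download code; the remote layout "<subpath>/.hash" and "<subpath>/archive.zip" is assumed from the description.

```python
import io
import os
import shutil
import zipfile


class FileCacheSketch:
    """Hypothetical stand-in for the agent's FileCache."""

    def __init__(self, cache_dir, fetch):
        self.cache_dir = cache_dir
        self.fetch = fetch        # callable: relative path -> bytes (assumption)
        self.verified = set()     # subpaths checked since the last registration

    def reset(self):
        """Call on agent (re)registration so that hashes are verified again."""
        self.verified.clear()

    def provide_directory(self, subpath):
        """Return the local path for subpath, refreshing it if its hash is stale."""
        target = os.path.join(self.cache_dir, subpath)
        if subpath in self.verified:
            return target  # already verified, no requests are sent
        remote_hash = self.fetch(subpath + "/.hash").decode("utf-8").strip()
        if self._local_hash(target) != remote_hash:
            archive = self.fetch(subpath + "/archive.zip")
            shutil.rmtree(target, ignore_errors=True)
            os.makedirs(target)
            with zipfile.ZipFile(io.BytesIO(archive)) as zf:
                zf.extractall(target)
            # The hash is stored only after a successful unpack, so an
            # interrupted update leaves the directory marked as stale.
            with open(os.path.join(target, ".hash"), "w") as f:
                f.write(remote_hash)
        self.verified.add(subpath)
        return target

    def _local_hash(self, target):
        try:
            with open(os.path.join(target, ".hash")) as f:
                return f.read().strip()
        except OSError:
            return None  # no hash file means the cache is treated as stale
```

With this shape, repeated commands for the same service cost no extra requests until reset() is called, and after a reset only the small hash file is fetched unless the hash actually changed.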

h2. Custom actions
Custom action scripts are fetched/updated the same way as other files and are stored in /var/lib/ambari-agent/cache/custom_actions.

h2. Choosing error handling strategy for download/unpack errors and other settings
The agent has two caching-related settings in the ambari-agent.ini file.
{code}
[agent]
cache_dir=/var/lib/ambari-agent/cache
tolerate_download_failures=true
{code}
The tolerate_download_failures option (defaults to true) determines the agent's behaviour in case of
any cache update error (while checking hashes, during file download or archive unpacking).
If the value is true, the agent just logs a warning and continues command execution with
the existing cache. If the value is false, the agent immediately considers the ExecutionCommand failed (so
the user can see the failed command in the UI with an appropriate error message).
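
As an illustration of the two strategies, here is a minimal sketch. The names CachingException, load_cache_settings and update_cache_or_fail are hypothetical, and it uses the Python 3 configparser module for brevity, which may differ from what the agent actually runs on.

```python
import configparser
import logging

logger = logging.getLogger("cache_sketch")


class CachingException(Exception):
    """Hypothetical error type for any failed cache update step."""


def load_cache_settings(ini_text):
    """Return (cache_dir, tolerate_download_failures) from the ini contents."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    cache_dir = parser.get("agent", "cache_dir")
    # The option defaults to true when it is absent, as described above.
    tolerate = parser.getboolean("agent", "tolerate_download_failures",
                                 fallback=True)
    return cache_dir, tolerate


def update_cache_or_fail(update, tolerate):
    """Run update(); return True if the cache was refreshed.

    On CachingException: log a warning and return False when failures are
    tolerated (the caller continues with the existing cache), otherwise
    re-raise so that the command is reported as failed.
    """
    try:
        update()
        return True
    except CachingException as err:
        if tolerate:
            logger.warning("Cache update failed, using existing cache: %s", err)
            return False
        raise
```

Here update is whatever callable performs the hash check, download and unpack; only the decision about what to do on failure is shown.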

h2. rpm packaging
Currently, stack files are included in both the ambari-agent and ambari-server rpms, so the agent
comes with a pre-packaged file cache. The issue is that the files packaged into the agent
cache are not hashed (no ".hash" files exist), so after rpm installation the agent considers
its cache stale and tries to update the cache from the server. I'll add on-the-fly stack file hashing
during rpm generation in a separate JIRA.

h2. other ambari-server changes
I've created a valid python ambari-server package that is properly packaged into the rpm and
is visible to ambari-server.py.

  was:
h1. Proposal:
h2. General conception
Ambari server shares some files at /var/lib/ambari-server/resources/ via HTTP. These files
are accessible via url like http://hostname:8080/resources/jdk-6u31-linux-x64.bin . Among
these files there are service scripts, templates and hooks. Agent has a cache of these files.
Cache directory structure is similar to contents of a stacks folder at server. For example:
$ ls /var/lib/ambari-agent/cache
{code}
└── stacks
    └── HDP
        ├── 2.0.7
        │   ├── Accumulo
        │   └── Flume
        └── 2.0.8
            ├── Accumulo
            ├── Flume
            └── YetAnotherService
{code}
If the files for a given service, component and stack version are not available in the cache, the agent downloads
the appropriate files on first use. After the files are successfully unpacked, the hash is also downloaded
to a separate file (this way, we ensure cache consistency). If any step of the cache update fails
(due to a timeout, missing files, a broken archive, etc.), the agent fails command execution with an
appropriate message.

h2. Packaging files into archives:
The trouble is that in current Jetty configuration, ambari-server does not allow to list directories.
 We have two options:
- To speed up download and avoid  the need to list script files explicitly, the proposal is
to pack directories "hooks" and "packages" into zip archives. After download, agent unpacks
archive into cache.
- We may set the "dirAllowed" servlet option for /resources/*, in which case the agent will download
all files one by one. The user will not have to run additional commands to have stack files updated
(improved usability). A separate request will be sent for every file being downloaded. This
way of fetching files seems too slow, especially on big clusters.

As the second option does not seem applicable because it limits scalability, I'm going to implement
the first one. Implementation steps:
- on server startup, python script iterates over "hooks"/"package" directories and counts
directory sha1 hashes. Files and directories are listed in alphabetical order, hash sum files
and existing directory archives are skipped.
- if directory archive does not exist or sha1 hash sum differs from previously counted hash
sum, archive is regenerated and saved to "archive.zip" file.
- sha1 hash of the directory is saved to .hash file in the root of "hooks"/"package" directory.
This way, we ensure that an archive is still actual if user changes some file in directory
or replaces entire directory.  

h2. How to change stack files after server installation
To change stack files (scripts, templates and so on) or add new files/stacks/etc, user has
to:
- stop ambari-server
- perform changes
- start ambari-server
- everything else will be done automagically

h2. Cache invalidation
Besides package archives, agent also downloads and stores archive hashes. We use them for
cache invalidation. As stack files may only change on server restart (and agent reregistration),
we will verify hashes only once and store the result in FileCache until next agent registration.

h2. Custom actions
Custom action scripts are fetched/updated the same way as other files and are stored at  /var/lib/ambari-agent/cache/custom_actions.

h2. Choosing error handling strategy for download/unpack errors and other settings
Agent has two related settings at ambari-agent.ini file.
{code}
[agent]
cache_dir=/var/lib/ambari-agent/cache
tolerate_download_failures=true
{code}
tolerate_download_failures option (defaults to true) determines agent actions in case of an
error occurring while checking hashes, during file download or archive unpacking. If value
is true, agent just 


> Add to the agent ability to download service scripts and hooks
> --------------------------------------------------------------
>
>                 Key: AMBARI-4481
>                 URL: https://issues.apache.org/jira/browse/AMBARI-4481
>             Project: Ambari
>          Issue Type: Task
>          Components: agent, controller
>    Affects Versions: 1.5.0
>            Reporter: Dmitry Lysnichenko
>            Assignee: Dmitry Lysnichenko
>             Fix For: 1.5.0
>
>         Attachments: AMBARI-4481_preview.patch



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
