ambari-dev mailing list archives

From "Dmitry Lysnichenko (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (AMBARI-4481) Add to the agent ability to download service scripts and hooks
Date Fri, 31 Jan 2014 12:44:09 GMT

     [ https://issues.apache.org/jira/browse/AMBARI-4481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dmitry Lysnichenko updated AMBARI-4481:
---------------------------------------

    Description: 
h1. Proposal:
h2. General concept
The Ambari server shares some files at /var/lib/ambari-server/resources/ via HTTP. These files
are accessible via URLs like http://hostname:8080/resources/jdk-6u31-linux-x64.bin . Among
these files are service scripts, templates, and hooks. The agent keeps a cache of these files.
The cache directory structure mirrors the contents of the stacks folder on the server. For example:
$ ls /var/lib/ambari-agent/cache
{code}
└── stacks
    └── HDP
        ├── 2.0.7
        │   ├── Accumulo
        │   └── Flume
        └── 2.0.8
            ├── Accumulo
            ├── Flume
            └── YetAnotherService
{code}
If the files for some service, component, and stack version are not available in the cache, the agent
downloads the appropriate files on first use.
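
For illustration only, a minimal sketch of what this lazy, on-first-use download could look like on the agent side. The helper name, cache path handling, and single-file fetch are assumptions for the sketch, not the actual agent code:
{code}
# Illustrative sketch only, not the actual agent implementation:
# resolve a path under the local cache and fetch the file from the
# server's /resources/ URL space on first use.
import os
import urllib2  # Python 2 style

CACHE_ROOT = "/var/lib/ambari-agent/cache"
RESOURCES_URL = "http://hostname:8080/resources"  # assumed server resources root

def cached_file(relative_path):
    """Return the local cache path for e.g. 'stacks/HDP/2.0.8/Flume/<some file>',
    downloading the file from the server if it is not cached yet."""
    local_path = os.path.join(CACHE_ROOT, relative_path)
    if not os.path.exists(local_path):
        local_dir = os.path.dirname(local_path)
        if not os.path.isdir(local_dir):
            os.makedirs(local_dir)
        response = urllib2.urlopen(RESOURCES_URL + "/" + relative_path)
        with open(local_path, "wb") as out:
            out.write(response.read())
    return local_path
{code}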
h2. Packaging files into archives:
The trouble is that with the current Jetty configuration, ambari-server does not allow directory listing.
We have two options:
- To speed up downloads and avoid the need to list script files explicitly, the proposal is to
pack the "hooks" and "packages" directories into gz archives.
- We may set the "dirAllowed" servlet option for /resources/*, in which case the agent will download
all files one by one. The user will not have to run additional commands to have the stack files updated
(improved usability), but a separate request will be sent for every file being downloaded. This
way of fetching files seems too slow, especially on big clusters.

As the second option does not seem workable, I'm going to implement the first one. Implementation
steps:
- on server startup, a Python script iterates over the "hooks"/"package" directories and computes
directory md5 hashes (see the sketch after this list). Files and directories are listed in alphabetical
order; hash sum files and existing directory archives are skipped.
- if a directory archive does not exist or its md5 hash sum differs from the previously computed
one, the archive is regenerated and saved to ""
- the md5 hash of the directory is saved to a .hash file in the root of the "hooks"/"package" directory.
This way, we ensure that the archive stays up to date if the user changes some file in the directory
or replaces the entire directory.
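
A minimal sketch of the hashing and archiving steps above, assuming a tar.gz archive and Python's standard hashlib/tarfile modules; the archive and hash file names here are illustrative, not the final ones:
{code}
# Illustrative sketch only: deterministic md5 over a directory's contents,
# compared with the previously saved .hash file; the archive is regenerated
# only when the hash differs or the archive is missing.
import hashlib
import os
import tarfile

HASH_FILE = ".hash"
ARCHIVE_NAME = "archive.tar.gz"  # assumed archive file name

def directory_hash(directory):
    """md5 over relative paths and file contents, walked in alphabetical order;
    the .hash file and an existing archive are skipped."""
    md5 = hashlib.md5()
    for root, dirs, files in os.walk(directory):
        dirs.sort()
        for name in sorted(files):
            if name in (HASH_FILE, ARCHIVE_NAME):
                continue
            path = os.path.join(root, name)
            md5.update(os.path.relpath(path, directory).encode("utf-8"))
            with open(path, "rb") as f:
                md5.update(f.read())
    return md5.hexdigest()

def refresh_archive(directory):
    """Re-pack the directory into a gz archive if its content hash has changed."""
    new_hash = directory_hash(directory)
    hash_path = os.path.join(directory, HASH_FILE)
    archive_path = os.path.join(directory, ARCHIVE_NAME)
    old_hash = None
    if os.path.exists(hash_path):
        with open(hash_path) as f:
            old_hash = f.read().strip()
    if new_hash != old_hash or not os.path.exists(archive_path):
        with tarfile.open(archive_path, "w:gz") as tar:
            for root, dirs, files in os.walk(directory):
                dirs.sort()
                for name in sorted(files):
                    if name in (HASH_FILE, ARCHIVE_NAME):
                        continue
                    path = os.path.join(root, name)
                    tar.add(path, arcname=os.path.relpath(path, directory))
        with open(hash_path, "w") as f:
            f.write(new_hash)
{code}
Hashing the relative paths together with the file contents means that renaming or moving a file also changes the directory hash, so the archive gets regenerated in that case too.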

> Add to the agent ability to download service scripts and hooks
> --------------------------------------------------------------
>
>                 Key: AMBARI-4481
>                 URL: https://issues.apache.org/jira/browse/AMBARI-4481
>             Project: Ambari
>          Issue Type: Task
>          Components: agent, controller
>    Affects Versions: 1.5.0
>            Reporter: Dmitry Lysnichenko
>            Assignee: Dmitry Lysnichenko
>             Fix For: 1.5.0
>
>



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
