incubator-s4-dev mailing list archives

From "Matthieu Morel (Updated) (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (S4-27) extensions to cluster configuration through Zookeeper
Date Wed, 23 Nov 2011 14:31:39 GMT

     [ https://issues.apache.org/jira/browse/S4-27?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matthieu Morel updated S4-27:
-----------------------------

    Description: 
Applications running on S4 clusters are configured through ZooKeeper.

We need to extend the current configuration properties to cover more of the features
used or required by S4 (streams, SLAs, states, etc.).


Current configuration
----------------------------

It is currently limited to:
- assigning *tasks* to logical partitions (S4 nodes)
- publishing *applications*, retrievable from remote repositories

_Available tasks_, _assigned tasks_ and _applications_ are defined as _znodes_. Each znode
carries metadata (the data associated with the node) as JSON (see the ZNRecord class).

The resulting structure in ZooKeeper is currently:
1. tasks
  * /<cluster-name>/tasks for available tasks
        - /<cluster-name>/tasks/Task-0, for instance, represents one logical task; its metadata contains the task id and the partition id
  * /<cluster-name>/process for tasks assigned to S4 nodes
        - /<cluster-name>/process/Task-0 is an ephemeral node created by the S4 node that took the Task-0 task; its metadata contains the hostname of that S4 node
2. apps
  * /<cluster-name>/apps for applications
        - /<cluster-name>/apps/app1, for instance, is the application "app1" running on the (logical) cluster; its metadata contains just the URI for fetching the S4R archive with the application code
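
To make the layout above concrete, here is a minimal in-memory sketch in Python. It does not talk to ZooKeeper and does not reproduce the real ZNRecord schema: the field names (taskId, partitionId, host, s4rUri), the partition numbering, the helper names and the example hostnames and URI are all assumptions made for illustration.

```python
import json


def current_layout(cluster, task_ids, apps):
    """Return {znode path: JSON metadata} for available tasks and apps.

    Field names are illustrative assumptions, not the ZNRecord schema.
    """
    tree = {}
    for partition_id, task_id in enumerate(task_ids):
        # Available task: metadata holds the task id and the partition id.
        # Numbering partitions by list position is an assumption.
        tree[f"/{cluster}/tasks/{task_id}"] = json.dumps(
            {"taskId": task_id, "partitionId": partition_id})
    for name, uri in apps.items():
        # Application: metadata holds only the URI of the S4R archive.
        tree[f"/{cluster}/apps/{name}"] = json.dumps({"s4rUri": uri})
    return tree


def claim_task(tree, cluster, task_id, hostname):
    """Model an S4 node taking a task (an ephemeral znode in ZooKeeper)."""
    path = f"/{cluster}/process/{task_id}"
    if path in tree:
        raise ValueError(f"{task_id} is already assigned")
    tree[path] = json.dumps({"host": hostname})
    return path


tree = current_layout(
    "cluster1",
    ["Task-0", "Task-1"],
    {"app1": "http://repo.example.org/app1.s4r"},  # hypothetical URI
)
claim_task(tree, "cluster1", "Task-0", "node-a.example.org")
for path in sorted(tree):
    print(path, tree[path])
```

The dictionary only mirrors the path structure; in a real deployment the /process entry would disappear on its own when the owning node's ZooKeeper session ends.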


What we need to add
----------------------------

(just some starting points that can be seen as subtasks):

1. *nodes state*: it would be really useful to have a general view of the available S4 nodes
for a given logical cluster, in particular which nodes are available and what state each is in
(initializing, ready, stopped, processing a task, in standby, ...).
--> we could use a new directory /<cluster-name>/nodes, and metadata could contain
information about the node, notably its state
--> the corresponding ephemeral znode would be maintained by the Server instance or a related
entity
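
One possible shape for this proposal, sketched in Python against an in-memory stand-in rather than a real ZooKeeper ensemble. The FakeZk class, the NodeState names, the metadata fields and the session ids are all hypothetical; the only behaviour borrowed from ZooKeeper is that ephemeral znodes are removed when the session that owns them ends.

```python
import json
from enum import Enum


class NodeState(Enum):
    # States listed in the proposal; the identifiers are assumptions.
    INITIALIZING = "initializing"
    READY = "ready"
    STOPPED = "stopped"
    PROCESSING = "processing"
    STANDBY = "standby"


class FakeZk:
    """In-memory stand-in that only models ephemeral znodes."""

    def __init__(self):
        self.nodes = {}  # path -> (owning session id, JSON data)

    def set_ephemeral(self, session, path, data):
        self.nodes[path] = (session, json.dumps(data))

    def close_session(self, session):
        # Ephemeral znodes vanish when the owning session ends.
        self.nodes = {p: v for p, v in self.nodes.items() if v[0] != session}

    def cluster_view(self, cluster):
        """Return {node name: state} for every znode under /<cluster>/nodes."""
        prefix = f"/{cluster}/nodes/"
        return {p[len(prefix):]: json.loads(data)["state"]
                for p, (_, data) in self.nodes.items() if p.startswith(prefix)}


def publish_state(zk, session, cluster, node_name, state):
    """What a Server instance (or related entity) might do on each state change."""
    zk.set_ephemeral(session, f"/{cluster}/nodes/{node_name}",
                     {"host": node_name, "state": state.value})


zk = FakeZk()
publish_state(zk, "s1", "cluster1", "node-a", NodeState.READY)
publish_state(zk, "s2", "cluster1", "node-b", NodeState.STANDBY)
publish_state(zk, "s1", "cluster1", "node-a", NodeState.PROCESSING)
view_before = zk.cluster_view("cluster1")
zk.close_session("s2")  # node-b goes away, and so does its znode
view_after = zk.cluster_view("cluster1")
print(view_before)
print(view_after)
```

Keeping the state in an ephemeral znode means a crashed node drops out of the view without any explicit cleanup, which is presumably why the proposal asks for an ephemeral rather than a persistent znode.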

2. *streams*: if we want to implement inter-app communication through streams, then streams
should be configurable through ZooKeeper.
--> streams could appear in /<cluster-name>/streams
- Metadata for streams could include the partitioning scheme (as suggested by Kishore in S4-10).
- Metadata could also include a key finder string
- child nodes could list the applications using the stream
--> the corresponding persistent znode would be created at application startup. If the stream
znode already exists, it would be reused.
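
The create-or-reuse behaviour proposed for stream znodes can be sketched as follows, again in Python over a plain dictionary instead of a ZooKeeper client. The function names, the metadata fields (partitioning, keyFinder) and the example values are assumptions; nothing here is an existing S4 API.

```python
import json


def register_stream(tree, cluster, stream, app, partitioning, key_finder):
    """Create the stream znode if absent, otherwise reuse it.

    Returns True when the znode was created by this call. The application
    is recorded as a child of the stream znode in both cases.
    """
    path = f"/{cluster}/streams/{stream}"
    created = path not in tree
    if created:
        # Persistent znode: metadata is written once, by the first application.
        tree[path] = json.dumps(
            {"partitioning": partitioning, "keyFinder": key_finder})
    tree[f"{path}/{app}"] = ""  # child znode naming an application
    return created


def apps_using(tree, cluster, stream):
    """List the applications registered as children of a stream znode."""
    prefix = f"/{cluster}/streams/{stream}/"
    return sorted(p[len(prefix):] for p in tree if p.startswith(prefix))


tree = {}
first = register_stream(tree, "cluster1", "clicks", "app1", "hash", "userId")
second = register_stream(tree, "cluster1", "clicks", "app2", "round-robin", "sessionId")
stream_meta = json.loads(tree["/cluster1/streams/clicks"])
print(first, second, stream_meta, apps_using(tree, "cluster1", "clicks"))
```

In this sketch the second application silently inherits the metadata written by the first. Whether conflicting partitioning settings should instead be rejected is a design question the proposal leaves open.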




    
> extensions to cluster configuration through Zookeeper
> -----------------------------------------------------
>
>                 Key: S4-27
>                 URL: https://issues.apache.org/jira/browse/S4-27
>             Project: Apache S4
>          Issue Type: Improvement
>    Affects Versions: 0.5
>            Reporter: Matthieu Morel
>             Fix For: 0.5
>
>


        
