cocoon-dev mailing list archives

From Stefano Mazzocchi <stef...@apache.org>
Subject [RT] Implementing Cocoon Blocks
Date Sun, 17 Aug 2003 17:00:27 GMT
This is a collection of (more or less) random thoughts about the 
implementation of Cocoon Blocks that I collected while talking with 
Ricardo and Sylvain IRL.

Please note that anything proposed here, while organic and workable, is 
not to be considered carved in stone, but rather a suggestion on how to 
move forward.

                                      - o -

Design Constraints
------------------

1) impact on backward compatibility should be minimal, ideally none. 
that is: everything that worked before the introduction of blocks should 
continue to work with no required changes [this will reduce migration 
issues]

2) the implementation should be incremental and evolutionary. no 
radical changes to the cocoon architecture should be introduced [this 
will reduce the amount of code to write and also make regression 
testing easier]

3) the CVS tree should be buildable at all times [this will be enforced 
by an evolutionary approach to the implementation]

4) security of the architecture for block managing and deploying is a 
*TOP* priority and should be introduced up front.

5) deployment should be system administrator-friendly. that is, it 
should *NOT* require GUIs or webapps (even if it should allow them)

                                      - o -

The overall architecture
------------------------

Let's start with the first requirement: security.

Blocks are functional components at the webapp level. If a user is able 
to change the block wiring, the user is, potentially, able to execute 
his/her own code with the same security level as the entire cocoon 
application.

For this reason, the block wiring information should be located in a 
configuration file that is "read-only" by cocoon and "read/write" by 
the block deployer.

   +--------+                          +----------------+
   | cocoon | <--- [File System] <---> | block deployer |
   +--------+                          +----------------+

Note that the block deployer *could* be anything (a CLI, a webapp, an 
eclipse plugin). The above meets our second requirement: user 
friendliness for all types of users.

Also note that it potentially allows system administrators to perform 
actions such as 'staging' and 'cluster replication' by simply 
performing a file copy. Cocoon should be able to reload the block 
wiring information when it changes.

In order to improve security and avoid DoS, there is *no way* for the 
block deployer to signal information directly to the cocoon instance 
(and no way for the cocoon instance to modify the wiring information or 
to communicate directly with the block deployer). Everything is 
performed through the file system.
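
As an illustration of the "file system only" contract above, here is a 
minimal sketch of how the cocoon side could notice that the wiring file 
was rewritten by the deployer, without any direct signal between the 
two. It is written in Python for brevity; the class name WiringWatcher 
and the polling approach are assumptions, not part of the proposal.

```python
import os


class WiringWatcher:
    """Detects changes to the wiring file by comparing file stamps.

    Hypothetical helper: cocoon only ever reads the file, the deployer
    only ever writes it, and nothing else is exchanged between them.
    """

    def __init__(self, path):
        self.path = path
        self._stamp = self._read_stamp()

    def _read_stamp(self):
        st = os.stat(self.path)
        # Size is included so that a rewrite which changes the length is
        # noticed even when the modification time has a coarse resolution.
        return (st.st_mtime_ns, st.st_size)

    def changed(self):
        """Return True once for each detected change of the wiring file."""
        stamp = self._read_stamp()
        if stamp != self._stamp:
            self._stamp = stamp
            return True
        return False
```

A reload loop would call changed() at a configurable interval and 
re-read the wiring only when it returns True.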

The block deployer
------------------

The block deployer architecture is the following

           +--------------------+     +------------------+
           | +-----------+      |     |                  |
  [FS] <---->| FS Driver |      | <-> |  User Interface  |
           | +-----------+      |     |                  |
           |                    |     +------------------+
           |      block         |
           |    services        |
           |                    |
           |  +---------+       |
           | +---------+|       |
           | | Locator |+       |
           | +---------+        |
           +------^-------------+
                  |
                  V
             block library

which is composed of four main parts:

  1) the file system driver: the part responsible for reading/writing 
the block wiring information and block configurations, extracting the 
files from the block distribution archives and physically deploying the 
extracted files on the file system. There is no need for polymorphism 
for this part since there needs to be a solid file system contract 
between this driver and the cocoon block manager (included inside 
cocoon) which will need to read the block wiring info and locate the 
files on the file system.

  2) the block locator: the part responsible for locating the metadata 
associated with a given block identifier and thus providing enough data 
for the block services and the user interface to drive the installation 
process. This part needs polymorphism. Potential implementations of 
this locator are:

    a) "file system"-based locator: the block metadata and location 
information is stored in a file on disk.

    b) "network service"-based locator: the block metadata is provided 
by a network service (for example, a web service).

The block deployer can use multiple locators at the same time, in a 
cascading way: it should be possible to configure the block deployer 
with the kind of location services and provide a priority for which one 
to use. This allows, for example, an architecture for block discovery 
that works like this:

   block deployer ---> company block library -> cocoon official library

[a collection of blocks is called a "block library". the application 
that, given a block identifier, looks up its metadata is called "block 
librarian".]
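
The cascading lookup described above can be sketched as follows. This 
is only an illustration under assumptions: the class names, the 
locate() method and the in-memory tables are all hypothetical stand-ins 
for real file-system or network locators.

```python
class DictLocator:
    """Hypothetical locator backed by an in-memory table of block metadata."""

    def __init__(self, name, table):
        self.name = name
        self.table = table

    def locate(self, block_id):
        # Return the metadata for block_id, or None if this library lacks it.
        return self.table.get(block_id)


class CascadingLocator:
    """Asks each configured locator in priority order; the first hit wins."""

    def __init__(self, locators):
        self.locators = list(locators)

    def locate(self, block_id):
        for locator in self.locators:
            metadata = locator.locate(block_id)
            if metadata is not None:
                return locator.name, metadata
        return None


# The company library is consulted before the official one.
company = DictLocator("company", {
    "cob:mycompany.com/webmail/1.3.43": {"sitemap": "/webmail.xmap"},
})
official = DictLocator("official", {
    "cob:apache.org/cocoon/Fop/3.4.34": {},
    "cob:mycompany.com/webmail/1.3.43": {"sitemap": "/other.xmap"},
})
chain = CascadingLocator([company, official])
```

With this configuration, a block known to both libraries is served 
from the company library, and the official library is only consulted 
for blocks the company library does not know.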

  3) the block services: the part that is shared by all potential block 
deployers (no matter how the user interface is implemented).

  4) the user interface: the part that drives the block services and is 
specific to the kind of front end used (CLI, webapp, IDE plugin).

                                      - o -

The Block Manager
-----------------

The block manager is the part that is responsible for handling the 
block wiring information. This is included inside cocoon and it can 
read and interpret the block wiring information written by the block 
deployer.

The block manager is the only part of cocoon that knows how blocks are 
wired together and where their actual location on disk is.

The block manager will be queried by all the cocoon internal services 
that need to locate block-dependent stuff, that is:

   1) the sitemap interpreter: to find out where the blocks' sitemaps 
are mounted in the main sitemap URL space
   2) the block: protocol: to locate the services provided by the blocks
   3) the component manager: to locate components provided by the blocks 
(avalon components, sitemap components or virtual components)

                                      - o -

File System Layout and wiring data
----------------------------------

Let us suppose we have the following blocks that are deployed in our 
system

   cob:mycompany.com/webmail/1.3.43
    has a sitemap located on -> /webmail.xmap
    depends on -> cob:mycompany.com/skin
      names this dependency -> external-skin
    depends on -> cob:mycompany.com/skin/2.0
      names this dependency -> internal-skin
    depends on -> cob:anothercompany.com/MailRepository/2.0
      names this dependency -> repository
      uses component -> "com.anothercompany.repository.Repository"
        names this component with role -> repository
    requires the configurations:
      "user" of type string with no default
      "password" of type string with no default

   cob:yetanothercompany.com/skins/fancy/1.2.2
     implements -> cob:mycompany.com/skin/1.2

   cob:mycompany.com/skins/corporate/34.3.345
     implements -> cob:mycompany.com/skin/2.3
     extends -> cob:yetanothercompany.com/skins/fancy/1.2.2

   cob:mycompany.com/repositories/email/exchange/3.2.1
     implements -> cob:anothercompany.com/MailRepository/2.0
     exposes component -> "com.anothercompany.repository.Repository"
     requires the configurations:
      "host" of type string, with default "127.0.0.1"

the above information is extracted from the block metadata included 
inside the blocks themselves and is deployment independent (also, the 
deployment process cannot modify these properties)

The deployment process added the mounting, wiring and configuration 
information

  cob:mycompany.com/webmail/1.3.43
   located at -> WEB-INF/blocks/384938958499
   mounted on -> /mail/
   "external-skin" -> cob:yetanothercompany.com/skins/fancy/1.2.2
   "internal-skin" -> cob:mycompany.com/skins/corporate/34.3.345
   "repository" -> cob:mycompany.com/repositories/email/exchange/3.2.1
   configured as:
    user -> "guest"
    password -> "sj3u493"

  cob:mycompany.com/repositories/email/exchange/3.2.1
   located at -> WEB-INF/blocks/394781274834
   configured as:
     host -> "mail.blah.org"

  cob:yetanothercompany.com/skins/fancy/1.2.2
   located at -> WEB-INF/blocks/947384127832

  cob:mycompany.com/skins/corporate/34.3.345
   located at -> WEB-INF/blocks/746394782637

the file system layout (relative to the cocoon webapp context) is

    [-] WEB-INF
     L___ [-] blocks
           L___ wiring.xml
           L___ [-] 384938958499
           |     L___ [-] BLOCK-INF
           |     |     L___ block.xml
           |     L_ (the contents of cob:mycompany.com/webmail/1.3.43)
           L___ [-] 947384127832
           |     L___ [-] BLOCK-INF
           |     |     L___ block.xml
           |     L_ (the contents of cob:yetanothercompany.com/skins/fancy/1.2.2)
           L___ [-] 746394782637
           |     L___ [-] BLOCK-INF
           |     |     L___ block.xml
           |     L_ (the contents of cob:mycompany.com/skins/corporate/34.3.345)
           L___ [-] 394781274834
                 L___ [-] BLOCK-INF
                 |     L___ block.xml
                 L_ (the contents of cob:mycompany.com/repositories/email/exchange/3.2.1)

where

  wiring.xml contains the block IDs (which also identify their location 
on disk), wiring, mounting and configurations.

  block.xml contains the block metadata (which belongs to the block and 
cannot be changed at deployment time).

NOTE: if the location path of the block is relative, it is resolved 
starting from the cocoon war context. The block content is *always* 
extracted from the archives and saved "as is" inside the folder.
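
To make the wiring data and the relative-path rule concrete, here is a 
small sketch of an in-memory model using the example blocks above. The 
class and field names, and the context root /opt/cocoon, are 
assumptions made for illustration; the actual wiring schema is still to 
be defined.

```python
from dataclasses import dataclass, field
from pathlib import PurePosixPath
from typing import Optional


@dataclass
class WiredBlock:
    block_id: str
    location: str                       # as written in the wiring file
    mount: Optional[str] = None
    wiring: dict = field(default_factory=dict)          # dependency name -> block id
    configuration: dict = field(default_factory=dict)


class Wiring:
    def __init__(self, context_root, blocks):
        self.context_root = PurePosixPath(context_root)
        self.blocks = {b.block_id: b for b in blocks}

    def location_of(self, block_id):
        path = PurePosixPath(self.blocks[block_id].location)
        # Relative locations are resolved from the webapp context.
        return path if path.is_absolute() else self.context_root / path

    def resolve(self, block_id, dependency_name):
        """Follow a named dependency of block_id to the wired block's location."""
        target = self.blocks[block_id].wiring[dependency_name]
        return self.location_of(target)


wiring = Wiring("/opt/cocoon", [   # hypothetical context root
    WiredBlock(
        "cob:mycompany.com/webmail/1.3.43",
        "WEB-INF/blocks/384938958499",
        mount="/mail/",
        wiring={"repository": "cob:mycompany.com/repositories/email/exchange/3.2.1"},
        configuration={"user": "guest"},
    ),
    WiredBlock(
        "cob:mycompany.com/repositories/email/exchange/3.2.1",
        "WEB-INF/blocks/394781274834",
        configuration={"host": "mail.blah.org"},
    ),
])
```

Resolving the "repository" dependency of the webmail block yields the 
directory of the exchange block under the context root; an absolute 
location would be returned unchanged.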

NOTE (development time): in order to simplify block creation and 
development, it will be possible to explicitly indicate the location of 
an already existing and extracted block implementation on disk. The 
block manager should also have autoreloading features (configurable, of 
course) that should reload the configurations, the wiring and the 
exposed components when they are changed.

                                         - o -


Issues that were still unsolved
-------------------------------



1) block identification

All blocks (behaviors and implementations) are identified by a URI. the 
format of the URI is as follows:

      cob:organization/name/x.y(.z)

where

   cob: is a virtual protocol that is used instead of http:// to avoid 
the problem of mistaking the URI for a URL

   "organization" is the unique identifier for the organization that is 
responsible for the maintenance of that identifier. the ICANN domain 
name should be used [for example, apache.org for the ASF and so on]

   "name" is the unique name of the identifier. it is suggested that a 
path delimiter is used to further specialize the name (see below for 
examples)

   x.y.z is the version identifier

    x -> major (>= 1)
    y -> minor (>= 0)
    z -> bugfix (>= 0) (only for implementations)

NOTE: identifiers are case insensitive.

Examples of good identifiers are

   cob:apache.org/cocoon/PDF/2.6
   cob:apache.org/cocoon/Fop/3.4.34
   cob:apache.org/cocoon/iText/1.0.43
   cob:mycompany.com/mydepartment/myself/myblock/3.2.23

Examples of bad identifiers are

   cob:cocoon.apache.org/whatever/2.3.434

the use of the virtual host instead of the domain name should be 
avoided because it mixes location and identification concerns.

   cob:apache.org/cocoon/block/whatever/2.3.4

the inclusion of the "block" name should be avoided because it is 
redundant (the cob virtual protocol was introduced exactly to indicate 
that the identifier names a block and to avoid location and 
identification semantic collisions)

   cob:apache.org/cocoon/PDF/Fop/2.3.43

information of what behavior is implemented by a given block 
implementation should not be included in the identifier.
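
A minimal sketch of parsing identifiers of the form described above. 
The regular expression and the names BlockId and parse_block_id are 
assumptions for illustration. Note that it only checks the syntax: the 
"bad" identifiers listed above are bad by convention and would still 
parse.

```python
import re
from typing import NamedTuple, Optional, Tuple

# cob:organization/name/x.y(.z), where the version part is optional so
# that unversioned references (as used in dependencies) also parse.
_COB = re.compile(
    r"^cob:(?P<org>[^/]+)/(?P<name>.+?)(?:/(?P<version>\d+\.\d+(?:\.\d+)?))?$",
    re.IGNORECASE,
)


class BlockId(NamedTuple):
    organization: str
    name: str
    version: Optional[Tuple[int, ...]]


def parse_block_id(uri):
    match = _COB.match(uri.strip())
    if match is None:
        raise ValueError(f"not a cob: identifier: {uri!r}")
    version = None
    if match.group("version"):
        version = tuple(int(part) for part in match.group("version").split("."))
        if version[0] < 1:
            raise ValueError("major version must be >= 1")
    # Identifiers are case insensitive, so normalise to lower case.
    return BlockId(match.group("org").lower(), match.group("name").lower(), version)
```

For example, parse_block_id("cob:apache.org/cocoon/PDF/2.6") gives the 
organization "apache.org", the name "cocoon/pdf" and the version 
(2, 6).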




2) dependency ranges

When a block implementation depends on another block (either 
implementation or behavior), it should be able to have an 'elastic' 
dependency which doesn't connect it to the versioned identifier, but to 
a range of those versions.

Instead of defining an explicit range description language, it is 
suggested to describe range rules implicitly. These implicit range 
rules are:

  a) if the dependency doesn't include the version, all versions are 
matched

   ex: both "cob:apache.org/blah/1.0" and "cob:apache.org/blah/3.43.342" 
are matched by "cob:apache.org/blah"

  b) if the dependency includes a version, a candidate version is 
matched when all of the following hold

    i) major is equal
    ii) minor is greater or equal
    iii) in case of implementations and if minor is equal, bugfix is 
greater or equal

   ex: depending on "cob:apache.org/blah/2.0.34" will match

         - cob:apache.org/blah/2.0.345
         - cob:apache.org/blah/2.3.23

but not

         - cob:apache.org/blah/1.0.0
         - cob:apache.org/blah/34.323.324534
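
The implicit range rules above can be sketched as a small matching 
function. The helper names split_id and matches are hypothetical, and 
the handling of a missing bugfix number on either side (treated as "no 
constraint") is an assumption, since the rules only mention bugfix for 
implementations.

```python
def split_id(uri):
    """Split 'cob:org/name/x.y(.z)' into (name, version); version is None if absent."""
    uri = uri.lower()                      # identifiers are case insensitive
    if not uri.startswith("cob:"):
        raise ValueError(f"not a cob: identifier: {uri!r}")
    body = uri[len("cob:"):]
    head, _, tail = body.rpartition("/")
    parts = tail.split(".")
    if head and len(parts) in (2, 3) and all(p.isdigit() for p in parts):
        return head, tuple(int(p) for p in parts)
    return body, None


def matches(dependency, candidate):
    """Return True if the candidate block id satisfies the dependency."""
    dep_name, dep_version = split_id(dependency)
    cand_name, cand_version = split_id(candidate)
    if dep_name != cand_name or cand_version is None:
        return False
    if dep_version is None:
        return True                        # rule a: unversioned dependency
    if cand_version[0] != dep_version[0]:
        return False                       # rule b.i: major must be equal
    if cand_version[1] != dep_version[1]:
        return cand_version[1] > dep_version[1]   # rule b.ii: minor greater
    # rule b.iii: same minor, so compare bugfix when both sides carry one
    if len(dep_version) == 3 and len(cand_version) == 3:
        return cand_version[2] >= dep_version[2]
    return True
```

With the examples above, a dependency on "cob:apache.org/blah/2.0.34" 
is satisfied by 2.0.345 and 2.3.23 but not by 1.0.0 or 34.323.324534.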



3) persistent service behavior with hot deployment

One of the big issues with hot deployment is the potentially 
inconsistent state of the persistent services contained by one block 
and used by another when the providing block is redeployed.

The issue is easily solvable for block services provided via sitemap by 
imposing them as stateless services (or REST-like, by passing all the 
required information every time).

The problem appears evident for component instances.

It is suggested that blocks don't allow direct classloading between 
blocks, but that only components exposed in the block deployment 
descriptor will be made available to other blocks. This way, all the 
dependencies are known because all the component loading happens 
through the Block Manager, and the block manager is able to dispose and 
reinstantiate all the blocks that contain instances of components that 
are in an inconsistent state.

While it is possible to write a classloader which is smart enough to do 
the above even for transparent classloading (say, loading via "new 
Blah()" instead of via cocoon.getComponent("Blah")), it is suggested to 
disallow direct classloading to avoid creating hidden contracts between 
blocks.
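
Because every cross-block component lookup would go through the Block 
Manager, it can record who uses what. The sketch below shows one way to 
compute which blocks to dispose and reinstantiate when a providing 
block is redeployed. The function name, the short block names and the 
"portal" block are hypothetical, and making the propagation transitive 
is an assumption.

```python
from collections import deque


def blocks_to_reinstantiate(uses, redeployed):
    """Return the redeployed block followed by every block affected by it.

    uses maps each block to the set of blocks whose exposed components
    it holds (as recorded by the block manager on each lookup).
    """
    # Invert the graph: provider -> blocks consuming its components.
    dependents = {}
    for consumer, providers in uses.items():
        for provider in providers:
            dependents.setdefault(provider, set()).add(consumer)

    affected, queue, seen = [], deque([redeployed]), {redeployed}
    while queue:
        block = queue.popleft()
        affected.append(block)
        # Sorted only to make the resulting order deterministic.
        for consumer in sorted(dependents.get(block, ())):
            if consumer not in seen:
                seen.add(consumer)
                queue.append(consumer)
    return affected


uses = {
    "webmail": {"exchange"},   # webmail holds the Repository component
    "portal": {"webmail"},     # hypothetical block using webmail components
    "fancy": set(),
}
```

Here, redeploying "exchange" affects "webmail" and then "portal", while 
"fancy" is left untouched.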



4) block mounting

Some blocks are meant to be publicly accessible and, for this reason, 
they can be "mountable" onto a particular location of the URL space 
handled by Cocoon.

Such mounting will be "implicit", meaning that the main cocoon sitemap 
will not be modified by the block deployer.

This means that, in order to achieve backward compatibility, when a 
block is deployed on cocoon, the sitemap interpreter asks the block 
manager whether or not there is some mounted block that matches the 
incoming request. If so, that block is invoked; otherwise, it falls 
back on the main sitemap.

This implies that it's entirely possible that a block "obscures" 
pipelines located in the main cocoon sitemap (or subsitemaps mounted 
the direct way in there). It is suggested that the sitemap interpreter 
doesn't fall back to the main sitemap if the block sitemap is invoked 
but no matching pipeline is found. This is to avoid potentially 
dangerous (security-wise) holes in the block URL-space coverage that 
could lead to hard-to-forecast issues.

This means that the sitemap interpreter should:

  check with the block manager if a block matches the request
  if so, pass the request to the block that is mounted in that location
      if no pipeline matches the request in that block, trigger a 404
  if no block is mounted on that location, invoke the cocoon main 
sitemap
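
The dispatch steps above can be sketched as follows. Sitemaps are 
modelled as plain functions that return a response or None when no 
pipeline matches; this modelling, the function names and the 
longest-prefix ordering of mount points are assumptions made for the 
example.

```python
def dispatch(uri, mounts, main_sitemap):
    """Route a request URI to a mounted block or to the main sitemap.

    mounts maps a mount point (such as "/mail/") to a block sitemap.
    """
    # Longest mount point first, so nested mounts shadow their parents.
    for mount_point in sorted(mounts, key=len, reverse=True):
        if uri.startswith(mount_point):
            response = mounts[mount_point](uri[len(mount_point):])
            # No fallback to the main sitemap once a block claimed the request.
            return response if response is not None else "404"
    return main_sitemap(uri)


def webmail_sitemap(path):
    return "inbox page" if path == "inbox" else None


def main_sitemap(path):
    return f"main:{path}"


mounts = {"/mail/": webmail_sitemap}
```

A request for /mail/inbox is answered by the block, /mail/nope yields a 
404 even though the main sitemap could have answered it, and 
/docs/index.html goes to the main sitemap.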



5) block configuration at deployment time

blocks will contain configurations that are written at block-release 
time, but some information is deployment dependent. The block 
deployment descriptor contains a list of those configurations that are 
required to be entered at deployment time.

Since these configurations will rather be context-dependent tokens, 
these can be considered more as properties. An example of a descriptor 
could be:

  <properties>
   <property name="username">
    <default>guest</default>
    <description>The name of the user</description>
   </property>
   ...
  </properties>

then, these values will be accessible in the usual block.xconf using 
{name} style. For example

...
<datasources>
  <datasource name="rdbms">
   <username>{username}</username>
   ...
  </datasource>
</datasources>
...
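
A minimal sketch of how the {name} tokens could be resolved: descriptor 
defaults are merged with the values entered at deployment time, then 
substituted into the configuration text. The function names and the 
token syntax accepted by the regular expression are assumptions. XML 
escaping of the substituted values is not handled here.

```python
import re

_TOKEN = re.compile(r"\{([A-Za-z_][\w.-]*)\}")


def resolve_properties(declared, deployed):
    """Merge descriptor defaults with deployment-time values.

    declared maps a property name to its default (None means no default);
    deployed maps a property name to the value entered by the deployer.
    """
    values = {}
    for name, default in declared.items():
        if name in deployed:
            values[name] = deployed[name]
        elif default is not None:
            values[name] = default
        else:
            raise KeyError(f"property {name!r} must be set at deployment time")
    return values


def expand(text, values):
    """Replace every {name} token in text with its resolved value."""
    return _TOKEN.sub(lambda m: values[m.group(1)], text)


declared = {"username": "guest", "password": None}
values = resolve_properties(declared, {"password": "sj3u493"})
```

With these values, expand("<username>{username}</username>", values) 
returns "<username>guest</username>"; leaving out the password at 
deployment time raises an error instead of producing a half-configured 
block.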

                                      - o -

Implementation Phases
---------------------

Phase 1: definition of the contract between the block manager inside 
cocoon and the standalone block deployer. These contracts include:

  1) description of the file system layout (see above for a suggestion)
  2) description of the wiring document schema
  3) description of the block metadata schema

Phase 2: definition and implementation of the block data model, with 
reading/writing capabilities

  1) implementation of the block wiring data model
  2) implementation of the xml -> data model parser
  3) implementation of the data model -> xml serializer

NOTE: since the xml formats are *not* meant to be human editable, 
roundtripping of comments or formatting included in those xml files 
should not be a priority.
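
To illustrate the parser/serializer pair of phase 2, here is a sketch 
of a round trip between a dictionary-based data model and XML. The 
element and attribute names (wiring, block, connection, property) are 
purely hypothetical, since the wiring schema is only defined in phase 
1. Comments and formatting are not preserved, in line with the note 
above.

```python
import xml.etree.ElementTree as ET


def wiring_to_xml(blocks):
    """Serialize {block id: {'location', 'mount', 'wiring', 'config'}} to XML."""
    root = ET.Element("wiring")
    for block_id, info in blocks.items():
        block = ET.SubElement(root, "block", id=block_id, location=info["location"])
        if info.get("mount"):
            block.set("mount", info["mount"])
        for name, target in info.get("wiring", {}).items():
            ET.SubElement(block, "connection", name=name, block=target)
        for name, value in info.get("config", {}).items():
            ET.SubElement(block, "property", name=name).text = value
    return ET.tostring(root, encoding="unicode")


def wiring_from_xml(text):
    """Parse the XML produced by wiring_to_xml back into the same data model."""
    blocks = {}
    for block in ET.fromstring(text).findall("block"):
        blocks[block.get("id")] = {
            "location": block.get("location"),
            "mount": block.get("mount"),    # None when the block is not mounted
            "wiring": {c.get("name"): c.get("block") for c in block.findall("connection")},
            "config": {p.get("name"): p.text or "" for p in block.findall("property")},
        }
    return blocks


model = {
    "cob:mycompany.com/webmail/1.3.43": {
        "location": "WEB-INF/blocks/384938958499",
        "mount": "/mail/",
        "wiring": {"repository": "cob:mycompany.com/repositories/email/exchange/3.2.1"},
        "config": {"user": "guest"},
    },
    "cob:mycompany.com/repositories/email/exchange/3.2.1": {
        "location": "WEB-INF/blocks/394781274834",
        "mount": None,
        "wiring": {},
        "config": {"host": "mail.blah.org"},
    },
}
```

Serializing the model and parsing the result gives back an equal model, 
which is the property the deployer and the block manager would both 
rely on.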

At this point, implementation can proceed in parallel:

Phase 3 - cocoon side: implementation of block support.

This phase includes:

  3a) implementation of the BlockManager
  3b) implementation of the block: protocol handler
  3c) implementation of the link transformer
  3d) implementation of the reload watchdog

[note: the link transformer has to be "block" aware in order to 
identify where other blocks are mounted]

NOTE: during this phase, development can happen with a handwritten and 
extracted block wiring info and block descriptors.

Phase 3 - deployer side: definition of the interfaces between the 
components:

   3a) the Locator interface
   3b) the Block services interfaces

Phase 4 - deployer side: implementation of a basic block deployer

   4a) implementation of the block services
   4b) implementation of a "file system"-based locator
   4c) implementation of a command-line user interface

Phase 5 - deployer side: implementation of a webservice block librarian

   5a) implementation of a REST-style web service locator
   5b) implementation of a cocoon block that implements block librarian 
capabilities

                                     - o -

Awaiting your comments.

--
Stefano.

