From: Apache Wiki
To: Apache Wiki
Reply-To: dev@jackrabbit.apache.org
Date: Wed, 04 Sep 2013 00:41:57 -0000
Message-ID: <20130904004157.72705.64084@eos.apache.org>
Subject: [Jackrabbit Wiki] Trivial Update of "JackrabbitFileVaultFS" by TobiasBocanegra

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Jackrabbit Wiki" for change notification.

The "JackrabbitFileVaultFS" page has been changed by TobiasBocanegra:
https://wiki.apache.org/jackrabbit/JackrabbitFileVaultFS

New page:
'''''work in progress'''''
----

== Introduction ==
We see in various applications the need for a simple JCR repository to filesystem mapping, for example in source management tools, fileserver bindings, import/export tools, etc. If a JCR repository consisted only of `nt:file` and `nt:folder` nodes, this would be easy. But if other node types are used (even one as simple as an extension of `nt:file`), the mapping to the filesystem is no longer trivial. The idea is to provide a general all-purpose mechanism to export to and import from a standard (java.io based) filesystem.

The !VaultFs is designed to provide a general filesystem abstraction of a JCR repository. It provides the following features:

 intuitive mapping:: A `nt:file` should just map to a simple file, a `nt:folder` to a directory. More complex node types should map to a `nodename.xml` and a possible `nodename` folder that contains the child nodes, or be aggregated to a complete or partial serialization.
 universal api:: The API should be suitable for all filesystem based applications like WebDAV, CIFS, SCM integration, FileVault, etc.
 extendable:: A plugin mechanism should allow extending the mapping layer with further conversions, filters and aggregators.

== Overview ==
!VaultFs consists mainly of two layers that map the repository's nodes to !VaultFs files. The '''Aggregate Node Tree''', managed by the ''aggregate manager'', represents a hierarchical view of the content aggregates. Each aggregate is addressed by a path and allows access to its artifacts. The artifacts nodes are built using ''aggregators'' that define which repository items belong to an aggregate and what artifacts they produce. For each artifact there is a ''serializer'' defined that is used to export and import the respective content.

On top of the aggregate tree is the '''Vault File System''' that accesses the aggregates and exposes them as a tree of ''vault files''. They can be used to export and import the actual repository content. The mapping from aggregates and their artifacts to vault files is done in an intuitive way so that clients (and users) can deal with them in a natural, filesystem-like fashion.

{{%topic.attachments%/vault_sample.png|Example Tree}}

== Aggregate Manager ==
The aggregate manager is configured with a set of aggregators and serializers. Once the manager is mounted on a JCR repository it exposes a tree of aggregates. They are collected using an aggregator that matches the respective repository node. For example the ''nt:file aggregator'' produces an artifacts node that allows no further child nodes and provides (usually) one primary artifact (which represents the content of the file).

=== Artifacts ===
An artifact is one aspect or part of a content aggregation. The following artifact types exist:
 * Directory Artifacts
 * File Artifacts
 * Primary Artifacts
 * Binary Artifacts

'''Directory''' artifacts represent the folder aspect of an aggregate. For example a pure `nt:folder` would produce an aggregate with a single directory artifact.

'''File''' artifacts represent file aggregates. Since the `nt:file` handling is very special, there is a special type for it.

'''Primary''' artifacts represent the main aggregate. This usually contains all nodes and properties that belong to the aggregate that cannot be expressed by another type.

'''Binary''' artifacts represent binary content that is not included in the primary or file artifacts. This is for example suitable for binary properties that were not included in an XML serialization. This allows keeping the serializations leaner and more efficient.

== Content Aggregation ==
A subtree of nodes will be aggregated semantically into one entity, the aggregate.
This mainly consists of a path and a set of artifacts and may have child aggregates.

How content aggregation works is defined by a set of '''filters''' with corresponding '''aggregators'''. If we look at the export in a recursive way, it works as follows:
 # traverse the repository starting at the root node
 # for each node check which filter matches
 # execute the respective aggregator and create a new aggregate
 # if the aggregator allows child nodes, descend into the excluded nodes

=== Aggregates ===
An aggregate is a tree of repository items that belong together and are mapped to (a set of) artifacts. The artifacts represent filesystem resources. The aggregate type is defined by the aggregator type and not primarily by the content, i.e. the selected aggregator must return stable coverage information that does not depend on the actual content.

4 types of aggregates can be identified.

==== Full coverage aggregates ====
They aggregate an entire subtree, for example the complete serialization of a `nt:nodeType` node or a ''dialog definition''. They are very simple to deal with, since the root node of the aggregate is usually serialized into one filesystem file.

The following repository structure:
{{{
+ nodetypes [nt:unstructured]
  + nt1 [nt:nodeType]
    + jcr:propertyDefinition [nt:propertyDefinition]
    + jcr:propertyDefinition [nt:propertyDefinition]
    + jcr:childNodeDefinition [nt:childNodeDefinition]
  + nt2 [nt:nodeType]
    ...
}}}

could be mapped to:
{{{
`- nodetypes
   |- nt1.cnd
   `- nt2.cnd
}}}

==== Generic aggregates ====
Generic aggregates cover a part of a content subtree, hence they do not have full coverage. They always consist of at least a primary artifact and a directory artifact. Examples are the aggregation of a `cq:Page` structure or of `nt:unstructured` nodes.

The following repository structure:
{{{
+ en [cq:Page]
  + jcr:content [cq:Content]
  + about [cq:Page]
    + jcr:content [cq:Content]
      + header [cq:Content]
        + image.jpg
  + solutions [cq:Page]
    + jcr:content [cq:Content]
}}}

is mapped to:
{{{
`- en
   |- .content.xml
   |- about
   |  |- _jcr_content
   |  |  `- header
   |  |     `- image.jpg
   |  `- .content.xml
   `- solutions
      `- .content.xml
}}}

The example above just excluded some direct child nodes of the aggregate root from the aggregation (with the exception of the `image.jpg` node). But this could be more complicated.

Overlapping example:
{{{
+ apps [nt:unstructured]
  + example [nt:unstructured]
    + components [nt:unstructured]
      + image [cq:Component]
        + dialog [cq:Dialog]
          ...
        + default.jsp [nt:file]
}}}

is mapped to:
{{{
`- apps
   |- .content.xml
   `- example
      |- .content.xml
      `- components
         |- .content.xml
         `- image
            |- .content.xml
            |- dialog.xml
            `- default.jsp
}}}

This example has 6 aggregates:
 # the generic aggregate for `apps`
 # the generic aggregate for `example`
 # the generic aggregate for `components`
 # the generic aggregate for `image`
 # the `default.jsp` file aggregate
 # the `dialog.xml` full coverage aggregate

==== Simple File aggregates ====
Since files (`nt:file` nodes and extents) are common, they are treated differently in aggregation. The simplest mapping is to create a filesystem file for each `nt:file`. Unfortunately there is some information in a default `nt:file` that cannot be preserved in the filesystem, namely:
 * `jcr:created` property
 * `jcr:content/jcr:uuid` property
 * `jcr:content/jcr:encoding` property
 * `jcr:content/jcr:mimeType` property

So in order to achieve a complete serialization, an extra artifact is needed to store this information. But to keep the mapping lean, those properties are not part of the file aggregate but 'delegated' to its parent aggregate.

Example:
{{{
+ foo [nt:folder]
  + example.jsp [nt:file]
    - jcr:created
    ...
    + jcr:content [nt:resource]
      - jcr:data
      - jcr:lastModified
      - jcr:mimeType
}}}

is mapped to:
{{{
`- foo
   |- .content.xml
   `- example.jsp
}}}

The `.content.xml` will include the properties that are not handled by `example.jsp`.

==== Extended File aggregates ====
When `nt:file` nodes are extended, either by primary or mixin type, the primary artifact remains the generic serialization of the resource. Additional information needs to be serialized to an extra artifact.

Example:
{{{
+ sample.jpg [dam:file]
  - jcr:created
  + jcr:content [dam:resource]
    - jcr:lastModified
    + dam:thumbnails [nt:folder]
      + 90.jpg [nt:file]
      + 120.jpg [nt:file]
}}}

is mapped to:
{{{
|- sample.jpg
`- sample.jpg.dir
   |- .content.xml
   `- _jcr_content
      `- _dam_thumbnails
         |- 90.jpg
         `- 120.jpg
}}}

==== Folder aggregates ====
Pure `nt:folder` aggregates result in one directory and, in most cases, an additional `.content.xml`.

==== Binary Properties ====
There is some special handling for binary properties other than `jcr:data` in a `jcr:content` node.

Example (although this is probably very rare):
{{{
+ foo [nt:unstructured]
  + bar [nt:unstructured]
    + 0001 [nt:unstructured]
      - data1 (binary)
      - data2 (binary)
    + 0002 [nt:unstructured]
      - data1 (binary)
      - data2 (binary)
}}}

is mapped to:
{{{
`- foo
   |- .content.xml
   `- bar
      |- 0001
      |  |- data1.bin
      |  `- data2.bin
      `- 0002
         |- data1.bin
         `- data2.bin
}}}

==== Resource Nodes ====
There are some cases where `nt:resource`-like structures are used that are not held below a `nt:file` node.
{{{
+ foo [nt:unstructured]
  + cq:content [nt:resource]
    - jcr:mimeType "image/jpg"
    - jcr:data
    - jcr:lastModified
}}}

This is mapped to:
{{{
`- foo
   |- .content.xml
   `- _cq_content.jpg
}}}

Here the MIME type and modification date can be recorded in the primary artifact. Possible other properties like `jcr:uuid` would go to the parent aggregate.

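The recursive export outlined under "Content Aggregation" (match a filter, create the aggregate, descend into the excluded nodes) can be sketched roughly as follows. This is only an illustration in Python, not the !VaultFs API: the classes `Node`, `Aggregator` and `Aggregate`, the `includes` and `allows_children` attributes, and the "first matching filter wins" ordering are all assumptions made for the sketch. The sample tree is the `cq:Page` example from "Generic aggregates", with `image.jpg` assumed to be an `nt:file`.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Node:
    """Stand-in for a repository node (hypothetical, not javax.jcr.Node)."""
    name: str
    node_type: str
    children: List["Node"] = field(default_factory=list)


@dataclass
class Aggregator:
    kind: str
    matches: Callable[[Node], bool]    # the filter that selects this aggregator
    includes: Callable[[Node], bool]   # is a descendant part of the aggregate?
    allows_children: bool = True       # may the aggregate have child aggregates?


@dataclass
class Aggregate:
    path: str
    kind: str
    children: List["Aggregate"] = field(default_factory=list)


def build_aggregate(node, path, aggregators):
    # Check which filter matches (assumption: the first match wins).
    aggregator = next(a for a in aggregators if a.matches(node))
    # Execute the aggregator and create a new aggregate.
    aggregate = Aggregate(path, aggregator.kind)
    # Descend only if the aggregator allows child nodes.
    if aggregator.allows_children:
        _descend(node, path, aggregator, aggregators, aggregate)
    return aggregate


def _descend(node, path, aggregator, aggregators, aggregate):
    for child in node.children:
        child_path = path + "/" + child.name
        if aggregator.includes(child):
            # Part of the current aggregate: look for excluded nodes below it.
            _descend(child, child_path, aggregator, aggregators, aggregate)
        else:
            # Excluded node: it starts an aggregate of its own.
            aggregate.children.append(
                build_aggregate(child, child_path, aggregators))


def paths(aggregate):
    """Flatten an aggregate tree into a depth-first list of paths."""
    result = [aggregate.path]
    for child in aggregate.children:
        result.extend(paths(child))
    return result


FILE = Aggregator("file", lambda n: n.node_type == "nt:file",
                  lambda n: True, allows_children=False)
GENERIC = Aggregator("generic", lambda n: True,
                     lambda n: n.node_type not in ("cq:Page", "nt:file"))

en = Node("en", "cq:Page", [
    Node("jcr:content", "cq:Content"),
    Node("about", "cq:Page", [
        Node("jcr:content", "cq:Content", [
            Node("header", "cq:Content", [Node("image.jpg", "nt:file")]),
        ]),
    ]),
    Node("solutions", "cq:Page", [Node("jcr:content", "cq:Content")]),
])

tree = build_aggregate(en, "/en", [FILE, GENERIC])
print(paths(tree))
```

With these assumed filters the sample yields four aggregates: the three pages and the `image.jpg` file below `about`. The `jcr:content` and `header` nodes stay inside the generic aggregate of their page.
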
==== Filename escaping ====
Not all characters in a JCR name are allowed filesystem characters, so some need escaping. The normal case is to use 'URL encoding', i.e. a `%` followed by the hex number of the character. But this looks ugly, especially for the colon `:`, e.g. `cq:content` would become `cq%3acontent`. So for the namespace prefix there is a special escaping that encloses it in underscores, e.g. `cq:content` becomes `_cq_content`. Names that already have this pattern are escaped using a double underscore, e.g. `_test_image.jpg` becomes `__test_image.jpg`.

More examples:
||'''node name'''||'''file name'''||
|| `test.jpg` || `test.jpg` ||
|| `cq:content` || `_cq_content` ||
|| `test_image.jpg` || `test_image.jpg` ||
|| `_testimage.jpg` || `_testimage.jpg` ||
|| `_test_image.jpg` || `__test_image.jpg` ||
|| `cq:test:image.jpg` || `_cq_test%3aimage.jpg` ^1^ ||

'''^1^''' This is a very rare case and justifies the ugly `%3a` escaping.

== Serialization ==
The serialization of the artifacts is defined by the ''serializer'' that is provided by the aggregator. Currently only 3 kinds of serialization are used: a direct data serialization for the contents of file or binary artifacts, a ''CND'' serialization for `nt:nodeType` nodes and an enhanced ''docview'' serialization for the rest. The ''docview'' serialization that is used allows multi-value properties (and might be enhanced by better property type support).

== Deserialization ==
Although only 3 serialization types are used for exporting, this is a bit different for importing. The importer analyzes the provided input sources and determines the following serialization types:
 * generic XML
 * docview XML
 * sysview XML
 * generic data

Depending on the configuration those input sources can be handled differently.
Currently they are imported as follows:

'''generic XML''' produces a `nt:file` having a `jcr:content` of the deserialization of the XML document (if importing into CRX then the `crx:XmlDocument` node types and friends are used).

'''docview XML''' is more or less imported directly below the respective import root.

'''sysview XML''' is more or less imported directly below the respective import root.

'''generic data''' produces a `nt:file` having the data as `nt:resource` content.

== Vault File System Layer ==
The !VaultFs layer provides a mapping from the aggregate tree to a file system. The goal is to keep the number of files low and the layout as natural as possible, with a minimum of extra files.

== Terminology ==
 !VaultFs:: The File Vault Filesystem. Provides a file-like abstraction of a JCR repository.
 !VaultFile:: A !VaultFs entity that represents a file-like abstraction of a (partial) repository node tree.
 Aggregate:: Represents an addressable collection of artifacts.
 Aggregator:: Interface that defines the methods for building content aggregates.
 Artifact:: Representation of a content aggregate. An aggregator can provide several artifacts. An artifact is either mapped to a file or a directory and can be of the type:
  * primary
  * file
  * binary
  * directory
 Serializer:: Interface that defines the methods for serializing an artifact.
 Artifact handler:: Interface that defines methods for deserializing artifacts.
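
The escaping rules from the "Filename escaping" section can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the actual implementation: the helper names `escape_name` and `percent_escape` are made up, and the set of characters treated as illegal is a guess, since the page does not list them.

```python
import re

# Assumption: characters that must not appear in a file name. The page does
# not enumerate them; "%" is included so escaped names stay unambiguous.
ILLEGAL = set('/\\:*?"<>|%')

# A name that already looks like an escaped namespace prefix,
# e.g. "_test_image.jpg".
PREFIX_LIKE = re.compile(r"^_[^_]+_")


def percent_escape(text):
    """Replace every illegal character by "%" plus its two-digit hex code."""
    return "".join(
        "%{:02x}".format(ord(c)) if c in ILLEGAL else c for c in text)


def escape_name(name):
    """Map a JCR node name to a file name (hypothetical helper)."""
    prefix, colon, local = name.partition(":")
    if colon:
        # "cq:content" -> "_cq_content"; any later colon is percent-escaped.
        return "_" + prefix + "_" + percent_escape(local)
    if PREFIX_LIKE.match(name):
        # Would be mistaken for an escaped prefix: add one more underscore.
        return "_" + percent_escape(name)
    return percent_escape(name)


for node_name in ["test.jpg", "cq:content", "test_image.jpg",
                  "_testimage.jpg", "_test_image.jpg", "cq:test:image.jpg"]:
    print(node_name, "->", escape_name(node_name))
```

The loop runs the six node names from the examples table through the sketch. Only the first colon is treated as the namespace separator, which is why `cq:test:image.jpg` still needs the `%3a` escape.
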