jackrabbit-dev mailing list archives

From "Stefan Kurla" <stefan.ku...@gmail.com>
Subject Re: understanding jackrabbit datastorage
Date Fri, 27 Apr 2007 19:26:11 GMT
Thanks, it does help.


On 4/27/07, Jukka Zitting <jukka.zitting@gmail.com> wrote:
> Hi,
>
> On 4/27/07, Stefan Kurla <stefan.kurla@gmail.com> wrote:
> > I guess this is more suited for the dev list.
>
> Yep.
>
> > How is the data actually stored in Jackrabbit, for example when using
> > MySQL with just the default workspace?
>
> A good starting point in understanding the underlying storage model of
> Jackrabbit is to look at the PersistenceManager interface [1]. The
> actual physical storage model depends on the persistence manager
> implementation you are using, but the logical model is fixed by the
> interface.
>
> The PersistenceManager abstraction essentially treats all nodes and
> properties as individually addressable items that each have their own
> unique identifier. In addition to these items the interface also
> defines a mechanism to store and access all the references pointing to
> a node.
>
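
A minimal sketch of that logical model, as I understand it from the description above. This is not the PersistenceManager interface itself: the class name, the method names, and the use of plain strings as identifiers are all assumptions made for illustration.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical in-memory stand-in for the logical model; NOT the Jackrabbit API.
final class ItemStoreSketch {
    // node identifier -> serialized node state
    private final Map<String, byte[]> nodes = new HashMap<>();
    // property identifier -> serialized property state
    private final Map<String, byte[]> properties = new HashMap<>();
    // reference target node identifier -> identifiers of the referencing properties
    private final Map<String, Set<String>> references = new HashMap<>();

    void storeNode(String nodeId, byte[] state) {
        nodes.put(nodeId, state);
    }

    void storeProperty(String propertyId, byte[] state) {
        properties.put(propertyId, state);
    }

    void addReference(String targetNodeId, String propertyId) {
        references.computeIfAbsent(targetNodeId, k -> new HashSet<>()).add(propertyId);
    }

    byte[] loadNode(String nodeId) {
        return nodes.get(nodeId);
    }

    byte[] loadProperty(String propertyId) {
        return properties.get(propertyId);
    }

    // All properties that point at the given node; empty if there are none.
    Set<String> referencesTo(String targetNodeId) {
        return references.getOrDefault(targetNodeId, Collections.emptySet());
    }
}
```

The point is only that nodes, properties, and reference lists are three separate identifier->data lookups, each addressable on its own.
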
> > There is the default_binval which has binval_id and binval_data.
> > ### Is this table used to store binary data, where binval_id is the
> > UUID of the jcr:content that this is referring to and binval_data is
> > the actual byte stream (blob) data?
>
> Yes, the binval table stores binary properties when the externalBLOBs
> configuration option is set to "false".
>
> The binval_id column contains the property identifier plus value index
> (because of multivalued properties) used to identify the binary value,
> and the binval_data column contains the actual byte stream.
>
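
To see why the value index has to be part of the key: a multivalued binary property has several values under one property identifier, so each value needs its own row. A rough sketch follows. The key layout used here is made up for illustration, and the real format is whatever the persistence manager chooses.

```java
// Hypothetical key builder; the actual binval_id format may differ.
final class BinvalKeySketch {
    // Assumed layout: "<property identifier>[<value index>]".
    static String binvalId(String propertyId, int valueIndex) {
        if (valueIndex < 0) {
            throw new IllegalArgumentException("value index must be >= 0");
        }
        return propertyId + "[" + valueIndex + "]";
    }

    public static void main(String[] args) {
        // Two values of the same (made-up) property get two distinct keys.
        System.out.println(binvalId("some-property-id", 0));
        System.out.println(binvalId("some-property-id", 1));
    }
}
```
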
> > There is default_node which has node_id and node_data.
> > ###How is this used?
>
> The node_id column contains the unique node identifier and the
> node_data column contains the node state in a serialized format [2].
>
> > default_prop with prop_id and prop_data
> > ###How is this used?
>
> The prop_id column contains the property identifier, and the prop_data
> column contains the property state in a serialized format [2].
>
> > default_refs with node_id and refs_data
> > ###How is this used?
>
> The node_id contains the identifier of the reference target node, and
> the refs_data contains the list of referencing property identifiers in
> a serialized format [2].
>
> > Say the structure is
> > /
> > --folderA:nt:folder (propertyX:references fileB)
> > ----fileA:nt:file
> > --fileB:nt:file
> > [...]
> > My question then is how would the database store the uuids or nodes of
> > the structure that is defined above. Very simple structure but to
> > understand how this structure is actually translated to be stored in
> > the database would be helpful.
>
> You'd have four node rows: the root node, folderA, fileA, and fileB.
> The serialized node_data part of the root and folderA nodes would
> contain the node identifiers of the child nodes (folderA and fileB
> for the root node, and fileA for folderA).
>
> All properties would be stored in the property table. Additionally the
> reference from propertyX to fileB would be stored as a separate refs
> row with the fileB UUID as the node_id value and a serialized property
> identifier list that contains just the propertyX identifier as the
> refs_data value.
>
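
Written out, the rows for the example structure would look roughly like the sketch below. The identifiers ("uuid-root" and so on) are placeholders rather than real UUIDs, and plain lists stand in for the serialized node_data and refs_data, so this shows only what each row carries, not how it is encoded.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of the example rows; identifiers are placeholders.
final class ExampleRowsSketch {
    // node_id -> child node identifiers (stand-in for part of node_data)
    static Map<String, List<String>> nodeRows() {
        Map<String, List<String>> rows = new LinkedHashMap<>();
        rows.put("uuid-root", Arrays.asList("uuid-folderA", "uuid-fileB"));
        rows.put("uuid-folderA", Collections.singletonList("uuid-fileA"));
        // No children are listed for the two files in the example.
        rows.put("uuid-fileA", Collections.emptyList());
        rows.put("uuid-fileB", Collections.emptyList());
        return rows;
    }

    // node_id of the reference target -> referencing property identifiers
    // (stand-in for refs_data)
    static Map<String, List<String>> refsRows() {
        Map<String, List<String>> rows = new LinkedHashMap<>();
        rows.put("uuid-fileB", Collections.singletonList("uuid-folderA/propertyX"));
        return rows;
    }

    public static void main(String[] args) {
        nodeRows().forEach((id, children) ->
                System.out.println("node " + id + " children=" + children));
        refsRows().forEach((id, props) ->
                System.out.println("refs " + id + " referencedBy=" + props));
    }
}
```

The property rows (propertyX plus whatever properties the nodes carry) would sit in the property table, keyed by property identifier in the same way.
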
> I hope this description helps. Note that this only applies to the
> traditional database persistence managers. The new bundle persistence
> managers in Jackrabbit 1.3 work a bit differently, though the same
> identifier->data structure is still in use.
>
> BR,
>
> Jukka Zitting
>
> [1] http://jackrabbit.apache.org/api/1.2.1/org/apache/jackrabbit/core/persistence/PersistenceManager.html
> [2] http://jackrabbit.apache.org/api/1.2.1/org/apache/jackrabbit/core/persistence/util/Serializer.html
>
