jackrabbit-dev mailing list archives

From Thomas Müller <thomas.muel...@day.com>
Subject [jr3] Bundle format
Date Sat, 20 Feb 2010 12:16:09 GMT
I would like to define a new storage format for nodes and properties.
A few ideas:

== Name and Namespace Index ==

Currently each new property and node name is stored in the name index.
Each namespace is stored in the namespace index. Those indexes are
used to compress the data. There are several (smaller) problems with this:

- The indexes are stored in a properties file (non-transactional).
- In the past, there were a few problems when migrating data (copying
- Jackrabbit indexes *each* name and namespace. This can run out of
memory if there are many names (dynamically created names).
- This is a problem for clustering (especially when using the
eventually consistent model).

I would like to keep a name index mechanism for commonly used names
and namespaces, but would also like to support a non-indexed name / namespace
format. I think we should start with a fixed list. We could add a
mechanism to create new index entries later on.
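
To make the idea concrete, here is a minimal sketch of how a name could be
written either as an index into a fixed list or inline as a string. The class
name NameCodec, the contents of the fixed list, and the one-byte tag layout
are assumptions for illustration only, not a proposed final format.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: not the actual Jackrabbit bundle format.
final class NameCodec {
    // Fixed list of commonly used names (contents assumed for illustration).
    static final List<String> COMMON = Arrays.asList(
            "jcr:primaryType", "jcr:mixinTypes", "jcr:uuid", "jcr:content");

    // Tag 0 means "not indexed": the name follows as a string.
    static final int INLINE = 0;

    static void write(DataOutputStream out, String name) throws IOException {
        int idx = COMMON.indexOf(name);
        if (idx >= 0) {
            out.writeByte(idx + 1); // tags 1..N refer to the fixed list
        } else {
            out.writeByte(INLINE);
            out.writeUTF(name);     // 2-byte length prefix + modified UTF-8
        }
    }

    static String read(DataInputStream in) throws IOException {
        int tag = in.readUnsignedByte();
        return tag == INLINE ? in.readUTF() : COMMON.get(tag - 1);
    }

    // Convenience wrappers around in-memory streams.
    static byte[] encode(String name) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            write(out, name);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }

    static String decode(byte[] data) {
        try {
            return read(new DataInputStream(new ByteArrayInputStream(data)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With this layout an indexed name costs one byte, and a dynamically created
name never needs a new index entry, so the index cannot grow without bound.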

== Node Id ==

Currently Jackrabbit uses UUIDs to identify nodes. Even nodes that are
not referenceable have UUIDs. This allows nodes to be created
concurrently, which is good. It is not optimal for storage, however
(index cache efficiency is very bad because the numbers are random;
size overhead). Also, it's quite inflexible (it's hard to refer to
external nodes).

For node id storage, I suggest supporting multiple data types: UUID
(which is basically a fixed-length value or a string), long, and string.
The Jackrabbit implementation may not need to support all formats (at
least at first), but the (bundle) storage format should.
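
As a rough sketch of what "multiple data types" could look like on disk, the
node id can be prefixed with a one-byte type tag. The class name NodeIdCodec,
the tag values, and the length-prefixed string layout are assumptions made up
for this example.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.UUID;

// Hypothetical sketch: tag values and layout are illustrative only.
final class NodeIdCodec {
    static final byte TYPE_UUID = 0, TYPE_LONG = 1, TYPE_STRING = 2;

    static byte[] encode(Object id) {
        if (id instanceof UUID) {
            UUID u = (UUID) id;
            // 1 tag byte + 16 bytes of UUID
            return ByteBuffer.allocate(17).put(TYPE_UUID)
                    .putLong(u.getMostSignificantBits())
                    .putLong(u.getLeastSignificantBits()).array();
        } else if (id instanceof Long) {
            // 1 tag byte + 8 bytes; sequential ids cache better than random ones
            return ByteBuffer.allocate(9).put(TYPE_LONG)
                    .putLong((Long) id).array();
        } else if (id instanceof String) {
            // 1 tag byte + 4-byte length + UTF-8 bytes (e.g. an external id)
            byte[] utf8 = ((String) id).getBytes(StandardCharsets.UTF_8);
            return ByteBuffer.allocate(5 + utf8.length).put(TYPE_STRING)
                    .putInt(utf8.length).put(utf8).array();
        }
        throw new IllegalArgumentException("Unsupported id type: " + id);
    }

    static Object decode(byte[] data) {
        ByteBuffer buf = ByteBuffer.wrap(data);
        byte type = buf.get();
        switch (type) {
            case TYPE_UUID:
                // Arguments are evaluated left to right: most, then least.
                return new UUID(buf.getLong(), buf.getLong());
            case TYPE_LONG:
                return buf.getLong();
            case TYPE_STRING:
                byte[] utf8 = new byte[buf.getInt()];
                buf.get(utf8);
                return new String(utf8, StandardCharsets.UTF_8);
            default:
                throw new IllegalArgumentException("Unknown id type: " + type);
        }
    }
}
```

A reader that only understands UUIDs can still detect the other tags and fail
cleanly, which is the point of putting all three types in the format up front.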

== List of Parent Node Ids ==

I would store this list as a (hidden) multi-valued property.

== Commonly Used String ==

If we want to store node types as regular properties, we should avoid
storing the node type strings. Instead, we should store the node type
index only. This is similar to the name index and namespace index. I
suggest the storage format supports a set of indexed values (initially
a fixed list).

