jackrabbit-dev mailing list archives

From Thomas Müller <thomas.muel...@day.com>
Subject Re: [jr3] Bundle format
Date Tue, 23 Feb 2010 19:19:37 GMT
Hi,

I made a prototype for the Bundle and Value classes:
http://h2database.com/p.html#2f2ad854cbadd3b3c3676f6e02dc8058

A few remarks:

The Bundle class uses variable size integers / longs heavily. This
saves quite a lot of space. This is similar to SQLite, Protocol
Buffers, and H2. Some code is from
http://code.google.com/p/h2database/source/browse/trunk/h2/src/main/org/h2/store/Data.java
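As an illustration of the variable size integer idea, here is a minimal sketch in the style used by Protocol Buffers: 7 payload bits per byte, with the high bit set on every byte except the last. The class and method names (VarInt, writeVarInt, readVarInt, varIntLength) are assumptions made for this example and are not taken from the prototype.

```java
import java.nio.ByteBuffer;

// Hypothetical helper, not the prototype's API: base-128 variable size ints.
final class VarInt {

    // Writes x using 1 to 5 bytes; small non-negative values use fewer bytes.
    static void writeVarInt(ByteBuffer buf, int x) {
        while ((x & ~0x7f) != 0) {
            // Low 7 bits, with the high bit set to signal "more bytes follow".
            buf.put((byte) (0x80 | (x & 0x7f)));
            x >>>= 7;
        }
        buf.put((byte) x);
    }

    // Reads a value written by writeVarInt.
    static int readVarInt(ByteBuffer buf) {
        int result = 0;
        for (int shift = 0; shift < 35; shift += 7) {
            int b = buf.get();
            result |= (b & 0x7f) << shift;
            if (b >= 0) {
                // High bit clear: this was the last byte.
                return result;
            }
        }
        throw new IllegalStateException("variable size int is too long");
    }

    // Number of bytes writeVarInt would use for x.
    static int varIntLength(int x) {
        int n = 1;
        while ((x & ~0x7f) != 0) {
            n++;
            x >>>= 7;
        }
        return n;
    }
}
```

With this scheme, values up to 127 take one byte, while negative ints take the full five bytes, so it pays off mainly when most stored numbers are small.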

Strings are written almost like UTF-8 (this implementation is a bit
faster). Doubles are stored in IEEE 754 format, but with the bytes
reversed and at variable size. Short strings (0-15 characters) use one
byte more than the length. Dates, decimals, references and so on are
stored as strings. Small binaries are stored in place; for larger ones
only the reference is stored. For "indexed values", only the index is
stored (one or two bytes).
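To show why reversing a double helps, here is a rough sketch. It assumes the reversal is done with Long.reverseBytes and the result is written as a variable size long; the prototype may do this differently. The class name DoubleCodec and its methods are made up for this example. "Round" doubles such as 1.0 have mostly zero bits in the low end of the mantissa, so after reversal they become small numbers that need only a few bytes.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch, not the prototype's code: reversed, variable size doubles.
final class DoubleCodec {

    static void writeDouble(ByteBuffer buf, double d) {
        // Reverse the byte order so trailing zero bytes become leading ones.
        long reversed = Long.reverseBytes(Double.doubleToLongBits(d));
        writeVarLong(buf, reversed);
    }

    static double readDouble(ByteBuffer buf) {
        return Double.longBitsToDouble(Long.reverseBytes(readVarLong(buf)));
    }

    // Base-128 encoding: 7 payload bits per byte, high bit means "more follows".
    static void writeVarLong(ByteBuffer buf, long x) {
        while ((x & ~0x7fL) != 0) {
            buf.put((byte) (0x80 | (x & 0x7f)));
            x >>>= 7;
        }
        buf.put((byte) x);
    }

    static long readVarLong(ByteBuffer buf) {
        long result = 0;
        for (int shift = 0; shift < 64; shift += 7) {
            long b = buf.get();
            result |= (b & 0x7f) << shift;
            if (b >= 0) {
                return result;
            }
        }
        throw new IllegalStateException("variable size long is too long");
    }
}
```

In this sketch 1.0 is written in 3 bytes instead of 8, while a value that uses the full mantissa (such as Math.PI) can take more than 8 bytes, so the gain depends on the data.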

Value class: There is a hardcoded list of "indexed values". The Value
class supports multi-valued "values" ("array-of-values value").
There is a value cache for frequently used values that are not indexed
(to save memory - this idea is also from H2). Node names and property
names are also "Values" (actually two values for each name: the
namespace, plus the local name). The value class has 'deep' hashCode
and equals implementations, and is comparable.
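The value cache could look roughly like the following sketch: a small fixed-size table indexed by hash code, where a new entry simply overwrites whatever was in its slot. This is only an assumption about the general shape; the name ValueCache and the method intern are invented here, and the prototype (or H2) may use soft references or a different eviction rule.

```java
// Hypothetical sketch of a value cache; not the prototype's implementation.
final class ValueCache<T> {

    private final Object[] slots;
    private final int mask;

    // size must be a power of two so the hash can be masked into a slot index.
    ValueCache(int size) {
        if (size <= 0 || Integer.bitCount(size) != 1) {
            throw new IllegalArgumentException("size must be a power of two");
        }
        this.slots = new Object[size];
        this.mask = size - 1;
    }

    // Returns a previously cached instance equal to v, or caches and returns v.
    @SuppressWarnings("unchecked")
    T intern(T v) {
        int index = v.hashCode() & mask;
        Object cached = slots[index];
        if (v.equals(cached)) {
            // Reuse the existing instance so duplicates can be garbage collected.
            return (T) cached;
        }
        // Overwrite on collision: the cache stays bounded without bookkeeping.
        slots[index] = v;
        return v;
    }
}
```

Because lookups rely on hashCode and equals, such a cache only works if the cached type implements both "deeply", as described for the Value class.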

A node could be stored as one "Value" instance, in the form of a
nested "array-of-values value". I will propose a node serialization
later on, but it's simply an array of properties (the child node list
is also a multi-valued property; everything is a property). Actually,
you could even store a tree of nodes as one "Value".
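As a rough illustration of "a node is a nested array-of-values value", the sketch below models a value that is either a single string or an array of values, with deep equals, hashCode and compareTo. The names Val and NodeSketch, the string-only scalars, and the ordering rule (scalars sort before arrays) are all assumptions for this example; the actual node serialization has not been proposed yet.

```java
import java.util.Arrays;
import java.util.Objects;

// Hypothetical stand-in for the Value class: a scalar or an array of values.
final class Val implements Comparable<Val> {

    private final String scalar; // non-null for a single value
    private final Val[] items;   // non-null for an array-of-values value

    private Val(String scalar, Val[] items) {
        this.scalar = scalar;
        this.items = items;
    }

    static Val of(String s) {
        return new Val(Objects.requireNonNull(s), null);
    }

    static Val array(Val... items) {
        return new Val(null, items.clone());
    }

    boolean isArray() {
        return items != null;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof Val)) {
            return false;
        }
        Val other = (Val) o;
        // Arrays.equals calls equals on each element, so the comparison is deep.
        return Objects.equals(scalar, other.scalar) && Arrays.equals(items, other.items);
    }

    @Override
    public int hashCode() {
        return isArray() ? Arrays.hashCode(items) : scalar.hashCode();
    }

    @Override
    public int compareTo(Val o) {
        if (isArray() != o.isArray()) {
            return isArray() ? 1 : -1; // assumption: scalars sort before arrays
        }
        if (!isArray()) {
            return scalar.compareTo(o.scalar);
        }
        int n = Math.min(items.length, o.items.length);
        for (int i = 0; i < n; i++) {
            int c = items[i].compareTo(o.items[i]);
            if (c != 0) {
                return c;
            }
        }
        return Integer.compare(items.length, o.items.length);
    }
}

// Hypothetical helpers: a property is [namespace, local name, value],
// and a node is simply an array of such properties.
final class NodeSketch {

    static Val property(String namespace, String localName, Val value) {
        return Val.array(Val.of(namespace), Val.of(localName), value);
    }

    static Val node(Val... properties) {
        return Val.array(properties);
    }
}
```

In this model the child node list would be just another property whose value is an array, and nesting whole nodes inside that array gives a tree of nodes stored as one value.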

> We might also consider not storing each bundle in a separate record,
> instead making a single record of all the bundles included in a
> ChangeLog instance. Such super-bundles could perhaps be persisted in
> the data store.

Yes, that should be possible. Currently we keep the change log in
memory until the user calls "save". This is a problem for large
transactions (out of memory). I suggest changing this, so the system
(possibly also the user) can do some "temporary saves" that are not
committed yet. When using a database backend, this wouldn't keep a
database transaction open; instead it would store the changelog to a
new place ("new version") and then on Session.save() it would copy the
data over to the "main storage". More about that later on.
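The "temporary save" idea can be sketched as follows. Plain maps stand in for the main storage and for the "new version" area, and bundles are plain strings; the class SpillingChangeLog, its methods and the size threshold are hypothetical and only illustrate the flow, not a proposed API.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: a change log that spills to a staging area when large.
final class SpillingChangeLog {

    private final Map<String, String> mainStorage;                  // committed data
    private final Map<String, String> staged = new HashMap<>();     // "new version", not committed
    private final Map<String, String> inMemory = new HashMap<>();   // recent changes
    private final int maxInMemory;

    SpillingChangeLog(Map<String, String> mainStorage, int maxInMemory) {
        this.mainStorage = mainStorage;
        this.maxInMemory = maxInMemory;
    }

    // Records a changed bundle; spills automatically once the limit is reached.
    void put(String bundleId, String bundle) {
        inMemory.put(bundleId, bundle);
        if (inMemory.size() >= maxInMemory) {
            temporarySave();
        }
    }

    // Moves in-memory changes to the staging area without committing them.
    void temporarySave() {
        staged.putAll(inMemory);
        inMemory.clear();
    }

    // Corresponds to Session.save(): copy staged data over to the main storage.
    void save() {
        temporarySave();
        mainStorage.putAll(staged);
        staged.clear();
    }

    // Drops all uncommitted changes.
    void discard() {
        inMemory.clear();
        staged.clear();
    }

    int inMemorySize() {
        return inMemory.size();
    }
}
```

The point of the design is that memory use stays bounded by the threshold, while the main storage is only touched on save(), so no long-running database transaction is needed.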

Regards,
Thomas
