jackrabbit-dev mailing list archives

From "David Nuescheler" <david.nuesche...@gmail.com>
Subject Re: NGP: Value records
Date Tue, 24 Apr 2007 10:26:11 GMT
hi jukka,

i am very much in favor of such an approach.

> My idea is to store each value in a unique and immutable "value
> record" identified by a "value identifier". Duplicate values are only
> stored once in a single value record. This saves space especially when
> storing multiple copies of large binary documents and allows value
> equality comparisons based on just the identifiers.
this sounds great for large (binary and string) property values.

> A value record would essentially be an array of bytes as defined in
> Value.getStream(). In other words the integer value 123 and the string
> value "123" would both be stored in the same value record. More
> specific typing information would be indicated in the property record
> that refers to that value. For example an integer property and a
> string property could both point to the same value record, but have
> different property types that indicate the default interpretation of
> the value.
i think that with small values we have to keep in mind that the
"key" (value identifier) may be bigger than the actual value and of
course the additional indirection also has a performance impact.
do you think that we should consider a minimum size for values to
be stored in this manner? personally, i think that this might make
sense.
anyway, what key did you have in mind?
i would assume some sort of hash (md5) would work well, or is this
still more abstract?
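A minimal Java sketch of the content-addressed scheme discussed above, assuming the value identifier is the hex MD5 digest of the value's raw bytes. `ValueRecordStore` and its methods are hypothetical names, not Jackrabbit API. Because a record holds only bytes, the integer 123 and the string "123" (both serialized as the UTF-8 bytes of "123") end up in the same record, and the property type would live in the property record. Note that the 32-character hex key is longer than many small values, which is the size concern raised above.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch (not Jackrabbit API): immutable value records stored
// once each, keyed by the hex MD5 digest of their bytes.
final class ValueRecordStore {
    // Index from value identifier to record bytes, so an existing record is
    // found without traversing all stored values.
    private final Map<String, byte[]> records = new HashMap<>();

    /** Stores the value if it is new and returns its value identifier. */
    String put(byte[] value) {
        String id = identifierOf(value);
        byte[] existing = records.get(id);
        if (existing == null) {
            records.put(id, value.clone());
        } else if (!Arrays.equals(existing, value)) {
            // Equal identifiers are assumed to mean equal values; fail loudly if not.
            throw new IllegalStateException("MD5 collision for " + id);
        }
        return id;
    }

    /** Returns a copy of the record's bytes, or null if the identifier is unknown. */
    byte[] get(String id) {
        byte[] value = records.get(id);
        return value == null ? null : value.clone();
    }

    int size() {
        return records.size();
    }

    /** Bytes of a value serialized as a UTF-8 string, e.g. the integer 123 as "123". */
    static byte[] utf8(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    static String identifierOf(byte[] value) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5").digest(value);
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // Java SE platforms are required to provide MD5.
            throw new IllegalStateException(e);
        }
    }
}
```

With this layout, checking whether two stored values are equal reduces to comparing their identifier strings.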

> Name and path values are stored as strings using namespace prefixes
> from an internal namespace registry. Stability of such values is
> enforced by restricting this internal namespace registry to never
> remove or modify existing prefix mappings, only new namespace mappings
> can be added.
sounds good, i assume that the "internal" namespace registry gets
its initial prefix mappings from the "public" namespace registry?
i think having the same prefixes could be beneficial since remappings
and removals are very rare even in the public registry and this would
allow us to optimize the more typical case even better.
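A minimal Java sketch of the append-only internal registry, assuming it is seeded from the public registry's prefix mappings as suggested above. `InternalNamespaceRegistry`, `register` and `toStoredName` are hypothetical names, not part of the JCR `NamespaceRegistry` interface. The key property is that there is no way to remove or remap an existing entry, so any stored "prefix:local" name stays resolvable.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: an append-only prefix <-> URI registry used only
// for the stored string form of name and path values.
final class InternalNamespaceRegistry {
    private final Map<String, String> prefixToUri = new HashMap<>();
    private final Map<String, String> uriToPrefix = new HashMap<>();

    /** Seeds the internal registry with the current public prefix mappings. */
    InternalNamespaceRegistry(Map<String, String> publicPrefixToUri) {
        publicPrefixToUri.forEach(this::register);
    }

    /** Adds a new mapping; re-registering an identical mapping is a no-op. */
    synchronized void register(String prefix, String uri) {
        String existingUri = prefixToUri.get(prefix);
        if (uri.equals(existingUri)) {
            return; // already registered, nothing to do
        }
        if (existingUri != null || uriToPrefix.containsKey(uri)) {
            // Existing mappings are never changed, so stored names stay valid.
            throw new IllegalStateException("mapping is fixed: " + prefix + " -> " + uri);
        }
        prefixToUri.put(prefix, uri);
        uriToPrefix.put(uri, prefix);
    }

    /** Builds the stored "prefix:local" form of a name; the URI must be registered. */
    synchronized String toStoredName(String uri, String localName) {
        String prefix = uriToPrefix.get(uri);
        if (prefix == null) {
            throw new IllegalArgumentException("unregistered namespace: " + uri);
        }
        return prefix + ":" + localName;
    }

    synchronized String uriOf(String prefix) {
        return prefixToUri.get(prefix);
    }
    // Deliberately no unregister() or remap() method.
}
```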

> Achieving uniqueness of the value records requires a way to determine
> whether an instance of a given value already exists. Some indexing is
> needed to avoid having to traverse the entire set of existing value
> records for each new value being created.
i agree and i think we have to make sure that the overhead
of calculating the key (value identifier) is reasonable, so
"insert performance" doesn't suffer too much.
i could even see an asynchronous model that "inlines" values
of all sizes initially and then leaves it up to some sort of garbage
collection job to "extract" the large values and store them as
immutable value records...
this could preserve "insert performance" and still let us benefit from
efficient operations for things like copy, clone, etc. and of course from
the space savings.
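A rough Java sketch of this asynchronous model, assuming inserts always store bytes inline and a periodic background pass moves values at or above a size threshold into shared records. `StoredProperty`, `InlineFirstStore` and the threshold parameter are hypothetical. The identifier here is a name-based (MD5) UUID from `UUID.nameUUIDFromBytes`, and hash collisions are ignored for brevity. All store methods synchronize on the store, so the background pass and writers do not race.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical property slot: holds either inline bytes or a record identifier.
final class StoredProperty {
    byte[] inline;    // set at insert time
    String recordId;  // set once the background pass has extracted the value
}

// Hypothetical sketch: inserts are cheap inline writes; a later pass moves
// large values into shared, immutable value records.
final class InlineFirstStore {
    private final int extractThreshold;
    private final List<StoredProperty> properties = new ArrayList<>();
    private final Map<String, byte[]> records = new HashMap<>();

    InlineFirstStore(int extractThreshold) {
        this.extractThreshold = extractThreshold;
    }

    /** Fast path: no hashing or index lookup, the value is simply kept inline. */
    synchronized StoredProperty write(byte[] value) {
        StoredProperty p = new StoredProperty();
        p.inline = value.clone();
        properties.add(p);
        return p;
    }

    /** Reads a value regardless of whether it is still inline or already extracted. */
    synchronized byte[] read(StoredProperty p) {
        byte[] data = p.inline != null ? p.inline : records.get(p.recordId);
        return data.clone();
    }

    /** One "garbage collection" pass: extract large inline values into records. */
    synchronized void extractLargeValues() {
        for (StoredProperty p : properties) {
            if (p.inline != null && p.inline.length >= extractThreshold) {
                // Equal bytes give equal identifiers, so duplicates share one record.
                String id = UUID.nameUUIDFromBytes(p.inline).toString();
                records.putIfAbsent(id, p.inline);
                p.recordId = id;
                p.inline = null;
            }
        }
    }

    synchronized int recordCount() {
        return records.size();
    }

    /** Runs the extraction pass periodically on the given executor. */
    void start(ScheduledExecutorService executor, long periodSeconds) {
        executor.scheduleWithFixedDelay(this::extractLargeValues,
                periodSeconds, periodSeconds, TimeUnit.SECONDS);
    }
}
```

The write path never pays for hashing; that cost is moved into the background pass, which is the trade-off described above.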

so i guess in short i would be in favor of a value mechanism that can
transparently handle both (a) "inline" values stored without extra
indirection (for small values or quickly inserted ones) and
(b) immutable value records.
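A minimal Java sketch of such a transparent mechanism, assuming a fixed size cutoff applied at write time: short values stay inline, longer ones go into shared records, and callers only ever see a `ValueHandle`. `ValueHandle`, `ValueStore` and `MIN_RECORD_SIZE = 64` are hypothetical, and the cutoff is only an example value. As before, the identifier is a name-based (MD5) UUID and collisions are ignored.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// Hypothetical handle that hides whether the bytes are inline or in a record.
abstract class ValueHandle {
    abstract byte[] bytes();

    // Default equality check: compare the bytes themselves.
    boolean sameValue(ValueHandle other) {
        return Arrays.equals(bytes(), other.bytes());
    }
}

final class ValueStore {
    // Hypothetical cutoff: below it, an identifier would cost about as much
    // as the value itself, so the value stays inline.
    static final int MIN_RECORD_SIZE = 64;

    private final Map<String, byte[]> records = new HashMap<>();

    ValueHandle create(byte[] data) {
        if (data.length < MIN_RECORD_SIZE) {
            return new Inline(data.clone());
        }
        String id = UUID.nameUUIDFromBytes(data).toString();
        records.putIfAbsent(id, data.clone());
        return new Reference(id);
    }

    int recordCount() {
        return records.size();
    }

    private static final class Inline extends ValueHandle {
        private final byte[] data;
        Inline(byte[] data) { this.data = data; }
        @Override byte[] bytes() { return data.clone(); }
    }

    private final class Reference extends ValueHandle {
        private final String id;
        Reference(String id) { this.id = id; }
        @Override byte[] bytes() { return records.get(id).clone(); }

        @Override boolean sameValue(ValueHandle other) {
            // Two references are equal exactly when their identifiers match.
            if (other instanceof Reference) {
                return id.equals(((Reference) other).id);
            }
            return super.sameValue(other);
        }
    }
}
```

Because equal bytes always fall on the same side of the cutoff, two handles for the same value are always of the same kind, so reference-to-reference comparisons never need to read the record bytes.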

just my two cents.

regards,
david
