jackrabbit-dev mailing list archives

From "Tobias Bocanegra" <tobias.bocane...@day.com>
Subject Re: NGP: Value records
Date Sat, 28 Apr 2007 08:54:55 GMT
> > > A value record would essentially be an array of bytes as defined in
> > > Value.getStream(). In other words the integer value 123 and the string
> > > value "123" would both be stored in the same value record. More
> > > specific typing information would be indicated in the property record
> > > that refers to that value. For example an integer property and a
> > > string property could both point to the same value record, but have
> > > different property types that indicate the default interpretation of
> > > the value.
> > i think that with small values we have to keep in mind that the
> > "key" (value identifier) may be bigger than the actual value and of
> > course the additional indirection also has a performance impact.
> > do you think that we should consider a minimum size for values to be
> > stored in this manner? personally, i think that this might make
> > sense.
>
> For consistency I would use such value records for all values,
> regardless of the value size. I'd like to keep the value identifiers
> as short as possible, optimally just 64 bits, to avoid too much
> storage and bandwidth overhead. The indirection costs could probably
> best be avoided by storing copies of short value contents along with
> the value identifiers where the values are referenced.
>
> > anyway, what key did you have in mind?
> > i would assume some sort of a hash (md5) could be great or is this
> > still more abstract?
>
> I was thinking about something more concrete, like a direct disk
> offset. The value identifier could for example be a 64 bit integer
> with the first 32 bits identifying the revision that contains the
> value and the last 32 bits being the offset of the value record within
> a "value file". I haven't yet calculated whether such a scheme gives
> us a large enough identifier space.
>
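
The 64-bit identifier layout quoted above (upper 32 bits for the revision, lower 32 bits for the offset within a "value file") could be sketched roughly as follows. The class and method names (ValueIds, pack, revision, offset) are hypothetical and not part of Jackrabbit; this only illustrates the bit layout, assuming both halves are treated as unsigned 32-bit quantities.

```java
// Hypothetical helper, not Jackrabbit API: packs a (revision, offset)
// pair into a single 64-bit value identifier and unpacks it again.
final class ValueIds {
    private ValueIds() {}

    /** Upper 32 bits hold the revision, lower 32 bits hold the offset. */
    static long pack(int revision, int offset) {
        // Mask the offset so a negative int does not sign-extend
        // into the revision half of the identifier.
        return ((long) revision << 32) | (offset & 0xFFFFFFFFL);
    }

    /** Revision that contains the value record. */
    static int revision(long id) {
        return (int) (id >>> 32);
    }

    /** Offset of the value record within the revision's value file. */
    static int offset(long id) {
        return (int) id;
    }
}
```

If the offset is a plain byte offset, 32 bits can address at most 4 GiB per value file, which is one input to the identifier-space question left open above.
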
i would use the MD5 of the contents as keys... that way your search for
duplicates is very cheap. and i would not use a value record for small
values, e.g. the overhead of storing a 'boolean' is just too big.
consider that you have 1 million nodes, with every node having an
'isCheckedOut' property... or a lastModified, which is never the
same.
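
A minimal sketch of the idea, assuming an in-memory map in place of real storage: the key is the MD5 digest of the value bytes, so storing the same contents twice yields the same key and a single record. The names Md5ValueStore, put, shouldInline and INLINE_LIMIT are made up for illustration, and the 16-byte inline threshold (the size of an MD5 digest) is only one possible choice.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical content-addressed value store, not Jackrabbit API.
final class Md5ValueStore {
    /** Assumed threshold: an MD5 digest is 16 bytes, so values this
     *  small are no bigger than their own key. */
    static final int INLINE_LIMIT = 16;

    private final Map<String, byte[]> records = new HashMap<>();

    /** Hex-encoded MD5 digest of the given bytes. */
    static String md5Hex(byte[] content) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5").digest(content);
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    /** True if the value is small enough to keep with the property
     *  instead of in a separate value record. */
    static boolean shouldInline(byte[] content) {
        return content.length <= INLINE_LIMIT;
    }

    /** Stores the value once per distinct content and returns its key. */
    String put(byte[] content) {
        String key = md5Hex(content);
        records.putIfAbsent(key, content.clone());
        return key;
    }

    /** Number of distinct value records held. */
    int size() {
        return records.size();
    }
}
```

With this sketch, a one-byte boolean or an eight-byte lastModified timestamp falls under the threshold and would stay inline, while a duplicate lookup for larger values is a single map access on the digest.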

regards, toby
-- 
-----------------------------------------< tobias.bocanegra@day.com >---
Tobias Bocanegra, Day Management AG, Barfuesserplatz 6, CH - 4001 Basel
T +41 61 226 98 98, F +41 61 226 98 97
-----------------------------------------------< http://www.day.com >---
