jackrabbit-dev mailing list archives

From "Alexandru Popescu ☀" <the.mindstorm.mailingl...@gmail.com>
Subject Re: NGP: Value records
Date Wed, 25 Apr 2007 13:00:34 GMT
On 4/25/07, Jukka Zitting <jukka.zitting@gmail.com> wrote:
> Hi,
> On 4/25/07, Alexandru Popescu ☀ <the.mindstorm.mailinglist@gmail.com> wrote:
> > On 4/23/07, Jukka Zitting <jukka.zitting@gmail.com> wrote:
> > > My idea is to store each value in a unique and immutable "value
> > > record" identified by a "value identifier". Duplicate values are only
> > > stored once in a single value record. This saves space especially when
> > > storing multiple copies of large binary documents and allows value
> > > equality comparisons based on just the identifiers.
> > > [...]
> >
> > I may be misreading something, but my main concern with this approach
> > is that while minimizing the size of the storage (which is very cheap
> > right now and almost infinite) it has a penalty on the access
> > performance: needing 2 "I/O" operations for reading a value. The
> > caching strategy may address this problem, but even if memory is also
> > cheap it is still limited. So, while I see this solution fit for
> > cases where huge amounts of duplicate data would be stored, for all
> > the other cases I see it as suboptimal.
> Good point. Apart from the space savings my main goal was to have
> short constant-length identifiers that could be used for equality
> comparisons instead of comparing the value contents. This would be
> especially beneficial for things like names and paths and probably
> also other medium-length strings, but I agree that the
> locality-of-access issue should be resolved somehow.

Do you mean something like RDBMS IDs?

Another possible problem with the shared values approach is that in a
concurrent environment, access to the shared value records may become
a bottleneck. Since reading is now a 2-step operation (resolve the
identifier, then fetch the record), you will almost always need to
synchronize on that access, which leads to serialized access, and that
does not fit a concurrent environment.
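
A minimal sketch of the concern in Java, assuming a naive shared store
guarded by a single lock. The names ValueStore and SketchNode, and the
use of a SHA-1 digest as the value identifier, are hypothetical and
made up for illustration; they are not taken from the NGP proposal:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical shared store: one immutable record per distinct value. */
final class ValueStore {
    private final Map<String, byte[]> records = new HashMap<>();

    /** Stores the value once and returns its identifier. */
    synchronized String put(byte[] value) {
        String id = identify(value);
        records.putIfAbsent(id, value.clone()); // duplicates share one record
        return id;
    }

    /** Resolves an identifier to the value content (null if unknown). */
    synchronized byte[] get(String id) {
        byte[] value = records.get(id);
        return value == null ? null : value.clone();
    }

    synchronized int size() {
        return records.size();
    }

    /** Assumption: the identifier is the SHA-1 digest of the content, in hex. */
    private static String identify(byte[] value) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-1");
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest(value)) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }
}

/** Hypothetical node that keeps only value identifiers for its properties. */
final class SketchNode {
    private final Map<String, String> propertyIds = new HashMap<>();
    private final ValueStore store;

    SketchNode(ValueStore store) {
        this.store = store;
    }

    void setProperty(String name, String value) {
        propertyIds.put(name, store.put(value.getBytes(StandardCharsets.UTF_8)));
    }

    String getProperty(String name) {
        String id = propertyIds.get(name);      // step 1: look up the identifier
        if (id == null) {
            return null;
        }
        byte[] content = store.get(id);         // step 2: fetch the shared record
        return new String(content, StandardCharsets.UTF_8);
    }

    String getValueId(String name) {
        return propertyIds.get(name);
    }
}
```

In this sketch every getProperty call from every node goes through the
same synchronized ValueStore.get, which is where the serialized access
would show up. Equality checks, on the other hand, only need the
identifiers from getValueId and never touch the store.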

.w( the_mindstorm )p.
  Alexandru Popescu, OSS Evangelist
  Information Queue ~ www.InfoQ.com

> BR,
> Jukka Zitting