jackrabbit-dev mailing list archives

From "Stefan Guggisberg" <stefan.guggisb...@gmail.com>
Subject Re: NGP: Value records
Date Wed, 06 Jun 2007 09:20:49 GMT
On 6/6/07, Stefan Guggisberg <stefan.guggisberg@gmail.com> wrote:
> hi jukka,
>
> On 6/5/07, Jukka Zitting <jukka.zitting@gmail.com> wrote:
> > Hi,
> >
> > On 5/16/07, Jukka Zitting <jukka.zitting@gmail.com> wrote:
> > > On 5/12/07, Jukka Zitting <jukka.zitting@gmail.com> wrote:
> > > > Based on the feedback I agree that it probably doesn't make sense to
> > > > keep track of unique copies of all values. However, avoiding extra
> > > > copies of large binaries is still a very nice feature, so I'd still
> > > > like to keep the single copy idea for those values. This is in fact
> > > > something that we might want to consider already for Jackrabbit 1.4
> > > > regardless of what we'll do with the NGP proposal.
> > >
> > > See JCR-926 for a practical application of this idea to current Jackrabbit.
> >
> > I just did a quick prototype where I made the InternalValue class turn
> > all incoming binary streams into data records using a global data
> > store. Internally the value would just be represented by the data
> > identifier.
> >
> > This allowed me to simplify quite a few things (for example to drop
> > all BLOBStore classes and custom handling of binary properties) and to
> > achieve *major* performance improvements for cases where large
> > (> 100kB) binaries are handled. For example the time to save a large file
> > was essentially cut in half and things like versioning or cloning
> > trees with large binaries would easily become faster by an order of
> > magnitude. With this change it is possible for example to copy a DVD
> > image file in milliseconds. What's even better, not only did this
> > change remove extra copying of binary values, it also pushed all
> > binaries out of the persistence or item state managers so that no
> > binary read or write operation would ever lock the repository!
>
> awesome, that's great news!
>
> is there a way to purge the binary store, i.e. remove unreferenced data?
> i am a bit concerned that doing a lot of add/remove operations would
> quickly exhaust available storage space. at least we need a concept
> how to deal with this kind of situation.

something that just crossed my mind: i know a number of people
want to store everything (config, meta data, binaries and content)
in the same db in order to allow easy backup/restore of an entire
repository. currently they can do so by using DatabaseFileSystem
and the externalBLOBs=false option of DatabasePersistenceManager.

do you plan to support db persistence for the binary store as well?

cheers
stefan

>
> >
> > The downside of the change is that it requires backwards-incompatible
> > changes in jackrabbit-core, most notably pulling all blob handling out
> > of the existing persistence managers. Adopting the data store concept
> > would thus require migration of all existing repositories. Luckily
> > such migration would likely be relatively straightforward and we could
> > write tools to simplify the upgrade, but it would still be a major
> > undertaking.
> >
> > I would very much like to go forward with this approach, but I'm not
> > sure when the right time for that would be. Should we already target
> > the 1.4 release in September/October, or would it be better to wait
> > for Jackrabbit 2.0 sometime next year? Alternatively, should we go for
> > a 2.0 release already this year with this and some other structural
> > changes, and have Jackrabbit 3.0 be the JSR 283 reference
> > implementation?
>
> since the jsr-283 public review is just around the corner, we'll have to
> start work on the ri pretty soon. therefore i think the ri should target
> v2.0.
>
> wrt integrating JCR-926, both 1.4 and 2.0 would be fine with me.
>
> cheers
> stefan
>
> >
> > BR,
> >
> > Jukka Zitting
> >
>
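
The single-copy data store idea discussed in this thread, together with the question of purging unreferenced data, can be illustrated with a minimal sketch. It is written in Python for brevity rather than in Jackrabbit's Java, and it assumes a content-addressed layout where the data identifier is the SHA-1 digest of the binary's bytes. The `DataStore` class and its method names are hypothetical and are not Jackrabbit's API. Because identical content always maps to the same record, copying a value only means copying its identifier. The `collect_garbage` method shows one possible answer to the purge question: a mark-and-sweep pass that deletes every record not referenced by any binary property value.

```python
import hashlib
import io
import tempfile
from pathlib import Path


class DataStore:
    """Hypothetical content-addressed binary store (illustration only).

    Each distinct binary is stored once under its SHA-1 digest, which serves
    as the data identifier an internal value would hold instead of the bytes.
    """

    def __init__(self, root):
        self.records = Path(root) / "records"
        self.tmp = Path(root) / "tmp"
        self.records.mkdir(parents=True, exist_ok=True)
        self.tmp.mkdir(parents=True, exist_ok=True)

    def add_record(self, stream, chunk_size=64 * 1024):
        """Spool a binary stream into the store and return its identifier."""
        digest = hashlib.sha1()
        # Write to a temporary file first: the final name is only known
        # once the whole stream has been hashed.
        with tempfile.NamedTemporaryFile(dir=str(self.tmp), delete=False) as out:
            for chunk in iter(lambda: stream.read(chunk_size), b""):
                digest.update(chunk)
                out.write(chunk)
        identifier = digest.hexdigest()
        target = self.records / identifier
        spooled = Path(out.name)
        if target.exists():
            spooled.unlink()  # identical content already stored: keep one copy
        else:
            spooled.replace(target)  # rename within the same filesystem
        return identifier

    def open_record(self, identifier):
        """Open a stored binary for reading by its identifier."""
        return open(self.records / identifier, "rb")

    def identifiers(self):
        """Return the identifiers of all records currently in the store."""
        return {p.name for p in self.records.iterdir() if p.is_file()}

    def collect_garbage(self, referenced):
        """Mark-and-sweep purge. `referenced` is the set of identifiers found
        by scanning every binary property value (mark phase); all other
        records are deleted (sweep phase). Returns the removed identifiers."""
        unreferenced = self.identifiers() - set(referenced)
        for identifier in unreferenced:
            (self.records / identifier).unlink()
        return unreferenced


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        store = DataStore(root)
        first = store.add_record(io.BytesIO(b"large binary"))
        copy = store.add_record(io.BytesIO(b"large binary"))
        other = store.add_record(io.BytesIO(b"another binary"))
        print(first == copy, len(store.identifiers()))  # True 2
        print(len(store.collect_garbage({first})))       # 1
```

The sketch ignores concurrency. A real purge would also need to protect records written by sessions that have not saved yet, for example by skipping records created after the scan started.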
