jackrabbit-dev mailing list archives

From "Thomas Mueller" <thomas.tom.muel...@gmail.com>
Subject Re: NGP: Value records
Date Fri, 15 Jun 2007 08:25:54 GMT
Hi,

The main problem seems to be:
- Avoid blocking the engine when streaming large objects

I could write a test in which multiple sessions concurrently read and
write large and small objects. That way we would have something to test
against. Or does such a test already exist?

Afterwards, I would be interested in helping to implement the
Global Data Store.

Thomas


On 6/13/07, Thomas Mueller <thomas.tom.mueller@gmail.com> wrote:
> Hi,
>
> I just read about the Global Data Store proposal
> (https://issues.apache.org/jira/browse/JCR-926) and I think it's a
> great idea:
>
> - Avoid blocking the engine when streaming large objects
> - Avoid copying twice (first transient, then persistent store)
> - Versioning: avoid multiple copies of the same object
>
> However I am not sure if mark-and-sweep garbage collection (GC) is the
> best solution:
>
> - Very slow for large repositories
> - Need to stop everything to
> - Frees up space very late
>
> To avoid problems with mark-and-sweep GC, I would use reference
> counting and file renaming. See also
> http://en.wikipedia.org/wiki/Reference_counting. Algorithm:
>
> - While the value is still transient, the file name ends with '.0'
> - When persisted, rename the file ('.1')
> - When adding a reference to an existing object (link), rename to
> '.2', '.3', and so on
> - When a reference is deleted (unlink), decrement the counter; delete
> the file if '.0'
> - At repository startup, delete '.0' files (transient objects after
> the repository was killed)
>
> There are some issues to solve: should the file be renamed when the
> value is added/deleted, or when the add/delete is committed?
>
> Files should be read-only; in theory they could be changed while the
> reference count is below 2.
>
> I would store small items (up to 1 KB or so) like regular values,
> to avoid lots of tiny files.
>
> I wrote 'files' above; I know this could be something else (database,
> Amazon S3). Reference counts could be kept somewhere else if you don't
> like renaming files (I like it), or if renaming is not possible.
>
> In the future, why not store large Strings in the global data store as
> well? Some applications store large XML documents as Strings. However,
> this is not urgent: large Strings can be stored as binary values.
>
> Thomas
>
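
The renaming scheme from the quoted message (the reference count kept
as the file name suffix) could look roughly like the sketch below. It
is only an illustration under assumptions: the class name
RefCountFiles and all method names are made up here, it uses plain
java.nio.file calls rather than any Jackrabbit API, and it leaves open
the question of whether to rename when the value is added or when the
change is committed.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Hypothetical helper: the reference count is the numeric file name suffix. */
public class RefCountFiles {

    /** Reads the count from a name such as "blob.3". */
    static int count(Path file) {
        String name = file.getFileName().toString();
        return Integer.parseInt(name.substring(name.lastIndexOf('.') + 1));
    }

    /** Renames the file so that its suffix becomes the new count. */
    static Path rename(Path file, int newCount) throws IOException {
        String name = file.getFileName().toString();
        String base = name.substring(0, name.lastIndexOf('.'));
        Path target = file.resolveSibling(base + "." + newCount);
        // Assumes the file system supports atomic renames within a directory.
        return Files.move(file, target, StandardCopyOption.ATOMIC_MOVE);
    }

    /** A transient value is written with the suffix '.0'. */
    static Path createTransient(Path dir, String id, byte[] data) throws IOException {
        return Files.write(dir.resolve(id + ".0"), data);
    }

    /** Persisting ('.0' to '.1') and adding a reference both increment the count. */
    static Path link(Path file) throws IOException {
        return rename(file, count(file) + 1);
    }

    /** Decrements the count; deletes the file and returns null when it reaches 0. */
    static Path unlink(Path file) throws IOException {
        int n = count(file) - 1;
        if (n <= 0) {
            Files.delete(file);
            return null;
        }
        return rename(file, n);
    }

    /** At startup: deletes leftover '.0' files and returns how many were removed. */
    static int deleteTransient(Path dir) throws IOException {
        int deleted = 0;
        try (DirectoryStream<Path> files = Files.newDirectoryStream(dir, "*.0")) {
            for (Path f : files) {
                Files.delete(f);
                deleted++;
            }
        }
        return deleted;
    }
}
```

Each step is a single rename, so after a crash the file name still
carries a usable count. The sketch does not handle concurrent
link/unlink calls on the same file; that would need locking or a retry
on a failed rename.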
