jackrabbit-dev mailing list archives

From "Jukka Zitting (JIRA)" <j...@apache.org>
Subject [jira] Commented: (JCR-926) Global data store for binaries
Date Thu, 21 Jun 2007 11:28:26 GMT

    [ https://issues.apache.org/jira/browse/JCR-926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12506853 ]

Jukka Zitting commented on JCR-926:
-----------------------------------

The numbers in the above comment are probably not comparable with the previous ones. The
setProperty() time should not change considerably with the DataStore patch (in fact it should
take a bit longer due to the SHA-1 calculation). To keep disk caches from interfering with the
test, I increased the size of the test file to 3GB (I only have 1GB of RAM).

With DataStore2.patch and FineGrainedISMLocking the result is:

    Thu Jun 21 13:51:09 EEST 2007 - setProperty() - 1
    Thu Jun 21 13:55:17 EEST 2007 - begin save() - 2338
    Thu Jun 21 13:55:18 EEST 2007 - end save() - 2352
    numReads: 2353

setProperty() = 248 seconds, save() = 1 second

Without DataStore2.patch but with FineGrainedISMLocking the result is:

    Thu Jun 21 14:08:33 EEST 2007 - setProperty() - 0
    Thu Jun 21 14:12:58 EEST 2007 - begin save() - 2419
    Thu Jun 21 14:17:03 EEST 2007 - end save() - 4766
    numReads: 4816

setProperty() = 265 seconds, save() = 245 seconds

I guess the stream copy algorithm in FileDataStore is slightly faster than the one in BLOBFileValue;
otherwise the numbers are pretty much as expected.
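
To illustrate why setProperty() picks up the SHA-1 cost, here is a minimal sketch of a stream
copy that computes the digest while the bytes pass through. This is not the code from
DataStore2.patch; the class name DigestCopySketch, the method copyWithSha1() and the 8 KB
buffer size are assumptions made for illustration only.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.security.DigestInputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical sketch, not the FileDataStore implementation.
public class DigestCopySketch {

    /** Copies all bytes from in to out and returns their SHA-1 as lowercase hex. */
    public static String copyWithSha1(InputStream in, OutputStream out)
            throws IOException, NoSuchAlgorithmException {
        MessageDigest sha1 = MessageDigest.getInstance("SHA-1");
        // Every read through this wrapper also updates the digest.
        DigestInputStream digestIn = new DigestInputStream(in, sha1);
        byte[] buffer = new byte[8192]; // buffer size is an arbitrary choice
        int n;
        while ((n = digestIn.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : sha1.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    public static void main(String[] args) throws Exception {
        byte[] data = "abc".getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream copy = new ByteArrayOutputStream();
        String id = copyWithSha1(new ByteArrayInputStream(data), copy);
        // "abc" is the standard SHA-1 test vector:
        // a9993e364706816aba3e25717850c26c9cd0d89d
        System.out.println(id + " (" + copy.size() + " bytes copied)");
    }
}
```

The digest is computed in the same pass as the copy, so the extra cost is CPU time per
buffer rather than a second read of the file.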

> Global data store for binaries
> ------------------------------
>
>                 Key: JCR-926
>                 URL: https://issues.apache.org/jira/browse/JCR-926
>             Project: Jackrabbit
>          Issue Type: New Feature
>          Components: core
>            Reporter: Jukka Zitting
>         Attachments: DataStore.patch, DataStore2.patch, ReadWhileSaveTest.patch
>
>
> There are three main problems with the way Jackrabbit currently handles large binary values:
> 1) Persisting a large binary value blocks access to the persistence layer for extended
> amounts of time (see JCR-314)
> 2) At least two copies of binary streams are made when saving them through the JCR API:
> one in the transient space, and one when persisting the value
> 3) Versioning and copy operations on nodes or subtrees that contain large binary values
> can quickly end up consuming excessive amounts of storage space.
> To solve these issues (and to get other nice benefits), I propose that we implement a
> global "data store" concept in the repository. A data store is an append-only set of binary
> values that uses short identifiers to identify and access the stored binary values. The data
> store would trivially fit the requirements of transient space and transaction handling due
> to the append-only nature. An explicit mark-and-sweep garbage collection process could be
> added to avoid concerns about storing garbage values.
> See the recent NGP value record discussion, especially [1], for more background on this
> idea.
> [1] http://mail-archives.apache.org/mod_mbox/jackrabbit-dev/200705.mbox/%3c510143ac0705120919k37d48dc1jc7474b23c9f02cbd@mail.gmail.com%3e
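
The data store idea described in the issue can be sketched roughly as follows. This is an
in-memory toy, not the proposed Jackrabbit API: the class AppendOnlyStoreSketch and its
methods are hypothetical, and using the SHA-1 of the content as the "short identifier" is
an assumption made here for illustration. Records are only ever added, and sweep() stands
in for the mark-and-sweep garbage collection by dropping every record whose identifier is
not in the marked set.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Collections;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical in-memory sketch of an append-only, content-addressed store.
public class AppendOnlyStoreSketch {

    private final Map<String, byte[]> records = new ConcurrentHashMap<>();

    /** Adds a value and returns its identifier (assumed here: SHA-1 hex of the content). */
    public String add(byte[] value) throws NoSuchAlgorithmException {
        MessageDigest sha1 = MessageDigest.getInstance("SHA-1");
        StringBuilder id = new StringBuilder();
        for (byte b : sha1.digest(value)) {
            id.append(String.format("%02x", b));
        }
        // Existing records are never overwritten; identical content is stored once.
        records.putIfAbsent(id.toString(), value.clone());
        return id.toString();
    }

    /** Returns a copy of the stored value, or null if the identifier is unknown. */
    public byte[] get(String id) {
        byte[] value = records.get(id);
        return value == null ? null : value.clone();
    }

    public int size() {
        return records.size();
    }

    /** Sweep phase: removes all records not in the marked set, returns how many were removed. */
    public int sweep(Set<String> marked) {
        int before = records.size();
        records.keySet().retainAll(marked);
        return before - records.size();
    }

    public static void main(String[] args) throws Exception {
        AppendOnlyStoreSketch store = new AppendOnlyStoreSketch();
        String first = store.add("large binary".getBytes(StandardCharsets.UTF_8));
        String copy = store.add("large binary".getBytes(StandardCharsets.UTF_8));
        String other = store.add("unreferenced".getBytes(StandardCharsets.UTF_8));
        System.out.println("same id for identical content: " + first.equals(copy));
        System.out.println("records before sweep: " + store.size());
        store.sweep(Collections.singleton(first));
        System.out.println("records after sweep: " + store.size());
        System.out.println("unreferenced value still present: " + (store.get(other) != null));
    }
}
```

With content-derived identifiers, a versioning or copy operation only needs to copy the
identifier, which is how a store like this would address problem 3 above.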

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

