jackrabbit-dev mailing list archives

From "Thomas Mueller (JIRA)" <j...@apache.org>
Subject [jira] Commented: (JCR-926) Global data store for binaries
Date Tue, 19 Jun 2007 12:41:26 GMT

    [ https://issues.apache.org/jira/browse/JCR-926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12506141
] 

Thomas Mueller commented on JCR-926:
------------------------------------

Hi,

I wrote a small benchmark application to better understand the problem
'Upload of a large file will block other concurrent actions' as described here:

http://issues.apache.org/jira/browse/JCR-314
http://www.mail-archive.com/users@jackrabbit.apache.org/msg02503.html

However, in the current version of Jackrabbit this problem appears to be solved.
Here is what I tried:
- For 10 seconds, a new thread is added each second
- 2 threads write large files, the others just write simple nodes
- I ran tests with files ranging from small (8 KB) to large (16 MB)

To compare the results, my application has a mode
where the file is not sent to Jackrabbit but is instead written directly
to disk (RandomAccessFile). I wanted to find out how long the 'simple'
threads are blocked by one thread writing a large file. The results are:

- Storing the file outside the repository is about 30% faster
  (though the reason might simply be the write buffer size).
- When a thread writes a large object, the other threads are _not_ badly blocked,
  at least not more than when the file is stored on the same disk.
  
If you want to avoid blocking others when writing large objects, the only
solution I found is to store the large objects on a separate hard drive. I have
tested this as well, and it completely solves the problem.
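A minimal sketch of such a benchmark, using plain Java I/O only (the thread
counts and file sizes match the setup above; class and file names are my own,
and the Jackrabbit mode, which would save binary properties through a JCR
session, is omitted):

```java
import java.io.File;
import java.io.RandomAccessFile;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

public class WriteBenchmark {
    static final int SMALL = 8 * 1024;         // 8 KB "simple node" writes
    static final int LARGE = 16 * 1024 * 1024; // 16 MB large binary

    // Write 'size' bytes to the file in 8 KB chunks, as a session might.
    static void write(File f, int size) throws Exception {
        byte[] buf = new byte[8 * 1024];
        try (RandomAccessFile raf = new RandomAccessFile(f, "rw")) {
            raf.setLength(0); // start fresh even if the file already exists
            for (int written = 0; written < size; written += buf.length) {
                raf.write(buf, 0, Math.min(buf.length, size - written));
            }
        }
    }

    public static void main(String[] args) {
        try {
            File dir = new File(System.getProperty("java.io.tmpdir"), "bench");
            dir.mkdirs();
            ExecutorService pool = Executors.newFixedThreadPool(10);
            AtomicLong maxSmallMillis = new AtomicLong();
            // 2 threads write large files, the other 8 write small ones.
            for (int i = 0; i < 10; i++) {
                final int id = i;
                pool.submit(() -> {
                    try {
                        int size = id < 2 ? LARGE : SMALL;
                        long start = System.nanoTime();
                        write(new File(dir, "f" + id), size);
                        long millis = (System.nanoTime() - start) / 1_000_000;
                        if (size == SMALL) {
                            // track the worst delay a "simple" writer saw
                            maxSmallMillis.accumulateAndGet(millis, Math::max);
                        }
                    } catch (Exception e) {
                        throw new RuntimeException(e);
                    }
                });
            }
            pool.shutdown();
            pool.awaitTermination(1, TimeUnit.MINUTES);
            System.out.println("worst small-write latency: "
                    + maxSmallMillis.get() + " ms");
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
}
```

Pointing `dir` at a second disk reproduces the "separate hard drive" case.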

Thomas


> Global data store for binaries
> ------------------------------
>
>                 Key: JCR-926
>                 URL: https://issues.apache.org/jira/browse/JCR-926
>             Project: Jackrabbit
>          Issue Type: New Feature
>          Components: core
>            Reporter: Jukka Zitting
>         Attachments: DataStore.patch, DataStore2.patch
>
>
> There are three main problems with the way Jackrabbit currently handles large binary values:
> 1) Persisting a large binary value blocks access to the persistence layer for extended amounts of time (see JCR-314)
> 2) At least two copies of binary streams are made when saving them through the JCR API: one in the transient space, and one when persisting the value
> 3) Versioning and copy operations on nodes or subtrees that contain large binary values can quickly end up consuming excessive amounts of storage space.
> To solve these issues (and to get other nice benefits), I propose that we implement a global "data store" concept in the repository. A data store is an append-only set of binary values that uses short identifiers to identify and access the stored binary values. The data store would trivially fit the requirements of transient space and transaction handling due to the append-only nature. An explicit mark-and-sweep garbage collection process could be added to avoid concerns about storing garbage values.
> See the recent NGP value record discussion, especially [1], for more background on this idea.
> [1] http://mail-archives.apache.org/mod_mbox/jackrabbit-dev/200705.mbox/%3c510143ac0705120919k37d48dc1jc7474b23c9f02cbd@mail.gmail.com%3e
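The quoted proposal can be illustrated with a small in-memory sketch (my own
illustration, not the attached DataStore.patch): records are keyed by a content
hash, so saving the same binary twice stores one copy, and unreferenced records
are reclaimed by an explicit mark-and-sweep pass.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class DataStoreSketch {
    private final Map<String, byte[]> records = new HashMap<>();
    private final Set<String> marked = new HashSet<>();

    // Append-only: adding the same bytes twice yields the same identifier
    // and stores only one copy, so versioning and copy operations are cheap.
    public String addRecord(byte[] data) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-1");
            StringBuilder sb = new StringBuilder();
            for (byte b : md.digest(data)) {
                sb.append(String.format("%02x", b));
            }
            String id = sb.toString();
            records.putIfAbsent(id, data.clone());
            return id;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    public byte[] getRecord(String id) {
        return records.get(id);
    }

    // Mark phase: the repository reports each identifier still referenced.
    public void mark(String id) {
        marked.add(id);
    }

    // Sweep phase: drop everything unmarked; returns how many were removed.
    public int sweep() {
        int before = records.size();
        records.keySet().retainAll(marked);
        marked.clear();
        return before - records.size();
    }
}
```

Because records are never modified in place, readers need no locking against
writers, which is the property that makes transient space and transaction
handling trivial.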

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

