jackrabbit-oak-commits mailing list archives

From thom...@apache.org
Subject svn commit: r1525528 - in /jackrabbit/oak/trunk/oak-doc/src/site: markdown/blobstore.md site.xml
Date Mon, 23 Sep 2013 08:15:47 GMT
Author: thomasm
Date: Mon Sep 23 08:15:47 2013
New Revision: 1525528

URL: http://svn.apache.org/r1525528
OAK-301 Document Oak - BlobStore


Added: jackrabbit/oak/trunk/oak-doc/src/site/markdown/blobstore.md
URL: http://svn.apache.org/viewvc/jackrabbit/oak/trunk/oak-doc/src/site/markdown/blobstore.md?rev=1525528&view=auto
--- jackrabbit/oak/trunk/oak-doc/src/site/markdown/blobstore.md (added)
+++ jackrabbit/oak/trunk/oak-doc/src/site/markdown/blobstore.md Mon Sep 23 08:15:47 2013
@@ -0,0 +1,61 @@
+<!--
+   Licensed to the Apache Software Foundation (ASF) under one or more
+   contributor license agreements.  See the NOTICE file distributed with
+   this work for additional information regarding copyright ownership.
+   The ASF licenses this file to You under the Apache License, Version 2.0
+   (the "License"); you may not use this file except in compliance with
+   the License.  You may obtain a copy of the License at
+       http://www.apache.org/licenses/LICENSE-2.0
+   Unless required by applicable law or agreed to in writing, software
+   distributed under the License is distributed on an "AS IS" BASIS,
+   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+   See the License for the specific language governing permissions and
+   limitations under the License.
+  -->
+## The Blob Store
+
+The Oak BlobStore is similar to the Jackrabbit 2.x DataStore. However, there
+are a few minor problems with the Jackrabbit DataStore that the BlobStore
+tries to address:
+
+* a temporary file is created when adding a large binary,
+  even if the binary already exists
+* sharding is slow and complicated because the hash needs to be calculated
+  first, before the binary is stored in the target shard (the FileDataStore
+  currently does not support sharding the directory)
+* file handles are kept open until the consumer is done reading, which
+  complicates the code and can lead to "too many open files" errors
+  when the consumer doesn't close the stream
+* for database-based data stores, there is a similar (even worse) problem:
+  streams are kept open, which means connection pooling is needed, and if
+  the user doesn't close the stream, we could run out of connections
+* for database-based data stores, some databases (for example MySQL) read
+  binaries fully into memory, which can cause out-of-memory errors
+* binaries that are similar are always stored separately no matter what
+
+Those problems are solved in Oak BlobStores, because binaries are split
+into blocks of 2 MB. This is similar to how Dropbox works internally (see
+http://serverfault.com/questions/52861/how-does-dropbox-version-upload-large-files).
+Blocks are processed in memory, so temporary files are never needed, and
+blocks are cached. File handles don't need to be kept open. Sharding is
+trivial because each block is processed separately.
+
+As for similar binaries: the BlobStore currently stores them separately
+unless some of their 2 MB blocks match. However, the algorithm in the
+BlobStore would allow reusing all matching parts, because in the
+BlobStore, concatenating blob ids means concatenating the data.
+
+Another change is that most DataStore implementations use SHA-1, whereas
+the BlobStore uses SHA-256. Using SHA-256 will be a requirement at some
+point; see also http://en.wikipedia.org/wiki/SHA-2 ("Federal agencies ...
+must use the SHA-2 family of hash functions for these applications
+after 2010"). This might affect some potential users.
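The block-based design described in `blobstore.md` can be illustrated with a minimal, in-memory Java sketch. It is not Oak's BlobStore API. The class `InMemoryBlockStore`, its methods, and the id format (each block id is the 64-character hex SHA-256 digest of the block, and a blob id is simply those ids concatenated) are hypothetical, and a `HashMap` stands in for real persistent, cached storage. Under those assumptions, the sketch shows three points from the text: binaries are split into fixed-size blocks in memory, identical blocks are stored once, and concatenating two blob ids yields an id whose data is the concatenation of both binaries.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical in-memory block store (not Oak's API): binaries are split into
 * fixed-size blocks, each block is keyed by its SHA-256 digest, and a blob id
 * is the concatenation of the hex-encoded ids of its blocks.
 */
final class InMemoryBlockStore {
    // A SHA-256 digest is 32 bytes, i.e. 64 hex characters per block id.
    static final int ID_LENGTH = 64;
    // "2 MB" from the text, assumed here to mean 2 * 1024 * 1024 bytes.
    static final int DEFAULT_BLOCK_SIZE = 2 * 1024 * 1024;

    private final int blockSize;
    // Content-addressed storage: a block that already exists is not stored again.
    private final Map<String, byte[]> blocks = new HashMap<>();

    InMemoryBlockStore() { this(DEFAULT_BLOCK_SIZE); }

    InMemoryBlockStore(int blockSize) { this.blockSize = blockSize; }

    /** Reads the stream one block at a time in memory; no temporary file is used. */
    String write(InputStream in) {
        StringBuilder blobId = new StringBuilder();
        byte[] buffer = new byte[blockSize];
        try {
            while (true) {
                // readNBytes waits until the buffer is full or the stream ends.
                int n = in.readNBytes(buffer, 0, blockSize);
                if (n == 0) {
                    break; // end of stream
                }
                byte[] block = Arrays.copyOf(buffer, n);
                String blockId = sha256Hex(block);
                blocks.putIfAbsent(blockId, block);
                blobId.append(blockId);
                if (n < blockSize) {
                    break; // a short block means the stream has ended
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return blobId.toString();
    }

    /** Returns the data of a blob id by concatenating its blocks in order. */
    byte[] read(String blobId) {
        if (blobId.length() % ID_LENGTH != 0) {
            throw new IllegalArgumentException("malformed blob id");
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < blobId.length(); i += ID_LENGTH) {
            byte[] block = blocks.get(blobId.substring(i, i + ID_LENGTH));
            if (block == null) {
                throw new IllegalArgumentException("unknown block at offset " + i);
            }
            out.write(block, 0, block.length);
        }
        return out.toByteArray();
    }

    /** Number of distinct blocks stored. */
    int blockCount() { return blocks.size(); }

    static String sha256Hex(byte[] data) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256").digest(data);
            StringBuilder hex = new StringBuilder(ID_LENGTH);
            for (byte b : digest) {
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 not available", e);
        }
    }

    public static void main(String[] args) {
        // A tiny block size keeps the example readable; the text uses 2 MB.
        InMemoryBlockStore store = new InMemoryBlockStore(4);
        String a = store.write(new ByteArrayInputStream("abcdefgh".getBytes(StandardCharsets.UTF_8)));
        String b = store.write(new ByteArrayInputStream("abcdxyz".getBytes(StandardCharsets.UTF_8)));
        // "abcd" is shared, so only three blocks are stored: abcd, efgh, xyz.
        System.out.println(store.blockCount()); // 3
        // Concatenating blob ids concatenates the data.
        System.out.println(new String(store.read(a + b), StandardCharsets.UTF_8)); // abcdefghabcdxyz
    }
}
```

Because a block id depends only on the block's content, a block that is written twice is stored once, and a blob id is just an ordered list of block ids. In this sketch, reusing matching parts and concatenating binaries therefore only requires manipulating ids. `InputStream.readNBytes(byte[], int, int)` requires Java 9 or later.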

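The remark in `blobstore.md` that sharding is trivial can be made concrete. Because each block is already in memory, its SHA-256 digest is known before the block is stored, so the target shard can be derived from the digest alone. The sketch below is hypothetical: `BlockSharding`, the "first four digest bytes modulo shard count" rule, and the `ab/cd/<digest>` directory layout are illustrative choices and are not taken from Oak.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical helper (not Oak's API): derives a block's shard from its SHA-256 digest. */
final class BlockSharding {

    static byte[] sha256(byte[] block) {
        try {
            return MessageDigest.getInstance("SHA-256").digest(block);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 not available", e);
        }
    }

    static String toHex(byte[] bytes) {
        StringBuilder hex = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            hex.append(String.format("%02x", b & 0xff));
        }
        return hex.toString();
    }

    /** Shard index in [0, shardCount), taken from the first 4 digest bytes. */
    static int shardFor(byte[] block, int shardCount) {
        byte[] d = sha256(block);
        int prefix = ((d[0] & 0xff) << 24) | ((d[1] & 0xff) << 16)
                | ((d[2] & 0xff) << 8) | (d[3] & 0xff);
        // Treat the prefix as unsigned so the shard index is never negative.
        return Integer.remainderUnsigned(prefix, shardCount);
    }

    /** Directory layout "ab/cd/<full hex digest>", derived from the digest alone. */
    static String pathFor(byte[] block) {
        String hex = toHex(sha256(block));
        return hex.substring(0, 2) + "/" + hex.substring(2, 4) + "/" + hex;
    }

    public static void main(String[] args) {
        byte[] block = "example block".getBytes(StandardCharsets.UTF_8);
        // The same block always maps to the same shard and path, on any node.
        System.out.println(shardFor(block, 16));
        System.out.println(pathFor(block));
    }
}
```

No lookup table or coordination is needed: any node that holds a block can compute where it belongs, which is what makes per-block sharding simple compared with hashing a whole binary before it can be placed.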
Modified: jackrabbit/oak/trunk/oak-doc/src/site/site.xml
URL: http://svn.apache.org/viewvc/jackrabbit/oak/trunk/oak-doc/src/site/site.xml?rev=1525528&r1=1525527&r2=1525528&view=diff
--- jackrabbit/oak/trunk/oak-doc/src/site/site.xml (original)
+++ jackrabbit/oak/trunk/oak-doc/src/site/site.xml Mon Sep 23 08:15:47 2013
@@ -34,7 +34,8 @@ under the License.
       <item href="overview.html" name="Overview" />
       <item href="nodestate.html" name="Understanding the node state model" />
       <item href="microkernel.html" name="Microkernel" />
-      <item href="query.html" name="The query engine" />
+      <item href="query.html" name="Query" />
+      <item href="blobstore.html" name="BlobStore" />
     <menu name="Using Oak">
       <item href="use_getting_started.html" name="Getting Started" />
