jackrabbit-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Jackrabbit Wiki] Update of "JCR Binary Usecase" by ChetanMehrotra
Date Fri, 27 May 2016 06:08:12 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Jackrabbit Wiki" for change notification.

The "JCR Binary Usecase" page has been changed by ChetanMehrotra:
https://wiki.apache.org/jackrabbit/JCR%20Binary%20Usecase?action=diff&rev1=2&rev2=3

Comment:
Add anchors

  
  Without this change we currently need to first spool the file content into a temporary location and then pass that to the other program. This adds unnecessary overhead which could be avoided when a FileDataStore is used, since it can provide direct access to the underlying file.
  
+ <<Anchor(UC2)>>
- === Efficient replication across regions in S3 ===
+ === UC2 - Efficient replication across regions in S3 ===
  
  ''For binary-less replication with a non-shared DataStore across multiple regions we need access to the S3 object ID backing the blob, so that it can be efficiently copied to a bucket in a different region via the S3 Copy command''
  
@@ -36, +37 @@

  
  Instead, the plan is to replicate the specific assets via the S3 copy operation. This would ensure that big assets can be copied efficiently at the S3 level.
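
  A rough sketch of the kind of copy this use case wants to enable, using the AWS SDK for Java; the bucket names, regions and object key are illustrative, and obtaining the object key from the blob reference is exactly what the use case asks the Binary API to support:

{{{#!java
import com.amazonaws.regions.Regions;
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import com.amazonaws.services.s3.model.CopyObjectRequest;

public class S3ReplicationSketch {
    public static void main(String[] args) {
        // Hypothetical object key resolved from the blob reference.
        String objectKey = "datastore/1234abcd";

        // Client configured for the destination region; S3 performs the copy
        // server side, so the binary content never flows through the JVM.
        AmazonS3 s3 = AmazonS3ClientBuilder.standard()
                .withRegion(Regions.EU_WEST_1)
                .build();
        s3.copyObject(new CopyObjectRequest(
                "source-bucket-us-east-1", objectKey,
                "replica-bucket-eu-west-1", objectKey));
        // Note: objects larger than 5 GB need a multipart copy instead.
    }
}
}}}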
  
+ <<Anchor(UC3)>>
  === UC3 - Text Extraction without temporary File with Tika ===
  
  ''Avoid creation of temporary file where possible''
@@ -44, +46 @@

  
  Going forward, if we need to make use of [[https://issues.apache.org/jira/browse/TIKA-416|Out of Process Text Extraction]], then this aspect would be useful there as well.
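
  A minimal sketch of stream-based extraction, feeding the JCR Binary's stream straight into Tika instead of writing it out to a temporary file first (some Tika parsers may still buffer to disk internally when they need random access):

{{{#!java
import java.io.InputStream;
import javax.jcr.Binary;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.sax.BodyContentHandler;

public class TextExtractionSketch {
    // Extract text directly from the binary's stream, without an explicit temp file.
    static String extractText(Binary binary) throws Exception {
        try (InputStream in = binary.getStream()) {
            BodyContentHandler handler = new BodyContentHandler(-1); // no write limit
            new AutoDetectParser().parse(in, handler, new Metadata(), new ParseContext());
            return handler.toString();
        }
    }
}
}}}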
  
+ <<Anchor(UC4)>>
  === UC4 - Spooling the binary content to socket output via NIO ===
  
  ''Enable use of NIO based zero copy file transfers''
@@ -54, +57 @@

  
  The key aspect here is that, where possible, we should be able to avoid IO. Also have a look at the [[https://kafka.apache.org/08/design.html#maximizingefficiency|Kafka design]], which tries to make use of the OS cache as much as possible and avoids IO through the JVM where it can, thus providing much better throughput.
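
  A minimal sketch of the zero-copy transfer this use case has in mind, assuming the data store can hand out the backing File (which is what the use case asks for); FileChannel.transferTo lets the OS move the bytes (sendfile) without copying them through JVM heap buffers:

{{{#!java
import java.io.File;
import java.nio.channels.FileChannel;
import java.nio.channels.SocketChannel;
import java.nio.file.StandardOpenOption;

public class ZeroCopySketch {
    // Stream the file backing a binary to a socket without buffering it in the JVM.
    static void send(File blobFile, SocketChannel socket) throws Exception {
        try (FileChannel channel = FileChannel.open(blobFile.toPath(), StandardOpenOption.READ)) {
            long position = 0;
            long size = channel.size();
            while (position < size) {
                position += channel.transferTo(position, size - position, socket);
            }
        }
    }
}
}}}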
  
+ <<Anchor(UC5)>>
  === UC5 - Transferring the file to FileDataStore with minimal overhead ===
  
  ''Need a way to construct a JCR Binary via a File reference where the File instance's "ownership is transferred", say via rename, without spooling its content again''
@@ -62, +66 @@

  
  In some deployments a customer would typically upload lots of files into an FTP folder, and from there the files are transferred to Oak. As mentioned in 2b above, with NAS based storage this would result in the file being copied twice. So to avoid the extra overhead it would be helpful if one could create a file directly on the NFS following the FileDataStore structure (content hash -> 3-level directory split) and then add the Binary via the ReferenceBinary approach.
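
  A rough sketch of the import-by-rename idea, assuming the SHA-1 based, three-level directory layout of the FileDataStore; the exact layout, and how the resulting identifier is then turned into a ReferenceBinary, are assumptions to verify against the target data store:

{{{#!java
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.security.DigestInputStream;
import java.security.MessageDigest;

public class DataStoreImportSketch {
    // Move an uploaded file into the assumed data store layout
    // <dataStoreRoot>/<aa>/<bb>/<cc>/<full content hash> via a rename,
    // instead of streaming its content through the repository again.
    static String importFile(Path uploaded, Path dataStoreRoot) throws Exception {
        MessageDigest digest = MessageDigest.getInstance("SHA-1");
        try (InputStream in = new DigestInputStream(Files.newInputStream(uploaded), digest)) {
            byte[] buffer = new byte[8192];
            while (in.read(buffer) != -1) {
                // reading only to feed the digest
            }
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : digest.digest()) {
            hex.append(String.format("%02x", b));
        }
        String id = hex.toString();
        Path target = dataStoreRoot
                .resolve(id.substring(0, 2))
                .resolve(id.substring(2, 4))
                .resolve(id.substring(4, 6))
                .resolve(id);
        Files.createDirectories(target.getParent());
        // Same filesystem, so this is a cheap rename rather than a second copy.
        Files.move(uploaded, target, StandardCopyOption.ATOMIC_MOVE);
        return id; // identifier a ReferenceBinary style lookup would use
    }
}
}}}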
  
+ <<Anchor(UC6)>>
  === UC6 - S3 import ===
  
  This is somewhat similar to the previous case, but more focused on S3 support.
@@ -70, +75 @@

  
  The problem, though, is how to efficiently get them into the S3DS, ideally without moving them.
  
+ <<Anchor(UC7)>>
  === UC7 - Editing large files ===
  
  Think: a video file exposed to the desktop via WebDAV. Desktop tools would do random writes in that file. How can we cover this use case without up/downloading the large file? (Essentially: random write access in binaries.) A sketch of what that access pattern looks like follows below.
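
  For illustration, this is what "random write access" means at the file level, shown as a minimal NIO sketch against a hypothetical local copy; the open question in this use case is how to offer the same semantics on a JCR Binary without transferring the whole file:

{{{#!java
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public class RandomWriteSketch {
    public static void main(String[] args) throws Exception {
        // Hypothetical local path of the large binary.
        Path video = Paths.get("/mnt/dav/video.mp4");
        try (FileChannel channel = FileChannel.open(video, StandardOpenOption.WRITE)) {
            ByteBuffer patch = ByteBuffer.wrap(new byte[]{0x00, 0x01, 0x02, 0x03});
            channel.write(patch, 1_048_576L); // overwrite 4 bytes at the 1 MiB offset
        }
    }
}
}}}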
