jackrabbit-dev mailing list archives

From "Thomas Mueller (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (JCR-3735) Efficient copying of binaries in Jackrabbit DataStores
Date Mon, 24 Feb 2014 13:39:19 GMT

    [ https://issues.apache.org/jira/browse/JCR-3735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13910300#comment-13910300 ]

Thomas Mueller commented on JCR-3735:

Hm, you mean we create a new variant of FileDataStore that doesn't do de-duplication? OK,
that's an idea.

And of course the user might decide to pass a wrapped input stream (BufferedInputStream or
similar). I don't know of a good solution for this.
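
To illustrate the wrapped-stream problem, here is a minimal sketch. The class name
WrappedStreamCheck and the helper canUseFileChannel are hypothetical names made up for
this example, not part of Jackrabbit; the sketch only shows that a plain instanceof
check stops matching as soon as the caller wraps the stream:

```java
import java.io.BufferedInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;

public class WrappedStreamCheck {

    // Hypothetical helper: true only when the stream object itself is a
    // FileInputStream, i.e. when getChannel() would be available.
    static boolean canUseFileChannel(InputStream in) {
        return in instanceof FileInputStream;
    }

    public static void main(String[] args) throws IOException {
        File f = File.createTempFile("wrapped", ".txt");
        f.deleteOnExit();
        Files.write(f.toPath(), "Hello World".getBytes("UTF-8"));
        try (InputStream raw = new FileInputStream(f);
             InputStream wrapped = new BufferedInputStream(new FileInputStream(f))) {
            // The raw stream qualifies for the FileChannel shortcut.
            System.out.println("raw: " + canUseFileChannel(raw));         // raw: true
            // The wrapper hides the FileInputStream, so the check fails.
            System.out.println("wrapped: " + canUseFileChannel(wrapped)); // wrapped: false
        }
    }
}
```

In the wrapped case the data store would have to fall back to the regular stream copy.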

One thing not to forget is that the input stream might not be positioned at the very beginning.
But this can be supported. I wrote some proof-of-concept code; maybe it is helpful:

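import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.InputStream;
import java.io.RandomAccessFile;
import java.math.BigInteger;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.security.MessageDigest;

public static void main(String... args) throws Exception {
    String fileName = System.getProperty("user.home") + "/temp/test.txt";
    try (FileOutputStream out = new FileOutputStream(fileName)) {
        out.write("Hello World".getBytes("UTF-8"));
    }
    try (InputStream in = new FileInputStream(fileName)) {
        // skip the first byte
        in.read();
        process(in);
    }
}

static void process(InputStream in) throws Exception {
    if (!(in instanceof FileInputStream)) {
        // use default
        return;
    }
    FileInputStream fin = (FileInputStream) in;
    FileChannel c = fin.getChannel();
    // the stream may already have been read from, so start at its current position
    long start = c.position();
    System.out.println("start: " + start);
    long length = c.size() - start;
    MessageDigest digest = MessageDigest.getInstance("SHA-1");
    ByteBuffer buff = ByteBuffer.allocate(64 * 1024);
    long pos = start;
    while (true) {
        // reset the buffer so each positional read fills it from index 0
        buff.clear();
        int len = c.read(buff, pos);
        if (len < 0) {
            break;
        }
        pos += len;
        // hash only the bytes that were just read
        digest.update(buff.array(), 0, len);
    }
    byte[] sha1 = digest.digest();
    // signum 1 keeps the hex string non-negative
    String outFileName = System.getProperty("user.home") +
            "/temp/" + new BigInteger(1, sha1).toString(16) + ".txt";
    try (FileChannel out = new RandomAccessFile(outFileName, "rw").getChannel()) {
        while (length > 0) {
            long len = c.transferTo(start, length, out);
            // advance the source offset; transferTo may copy fewer bytes than requested
            start += len;
            length -= len;
        }
    }
}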

> Efficient copying of binaries in Jackrabbit DataStores
> ------------------------------------------------------
>                 Key: JCR-3735
>                 URL: https://issues.apache.org/jira/browse/JCR-3735
>             Project: Jackrabbit Content Repository
>          Issue Type: Improvement
>          Components: jackrabbit-core
>    Affects Versions: 2.7.4
>            Reporter: Amit Jain
> In the DataStore implementations an additional temporary file is created for every binary
> uploaded. This step is an additional overhead when the upload process itself creates a temporary
> file.
> So, the solution proposed is to check if the input stream passed is a FileInputStream
> and then use the FileChannel object associated with the input stream to copy the file.

This message was sent by Atlassian JIRA
