jackrabbit-dev mailing list archives

From "Thomas Mueller (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (JCR-3735) Efficient copying of binaries in Jackrabbit DataStores
Date Mon, 24 Feb 2014 13:39:19 GMT

    [ https://issues.apache.org/jira/browse/JCR-3735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13910300#comment-13910300 ]

Thomas Mueller commented on JCR-3735:
-------------------------------------

Hm, you mean we create a new variant of FileDataStore that doesn't do de-duplication? OK,
that's an idea.

And of course the user might decide to pass a wrapped input stream (a BufferedInputStream or
similar), in which case the instanceof FileInputStream check fails. I don't know of a good
solution for this.
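The small demo below illustrates why wrapping is a problem. It is a sketch, not part of the issue, and the names `WrappedStreamDemo` and `channelPositionAfterOneByte` are hypothetical. The wrapper fails the `instanceof FileInputStream` check, and even the wrapped stream (reachable only through the protected `in` field of `FilterInputStream`) no longer reports the caller's logical position, because `BufferedInputStream` reads ahead to fill its buffer.

```java
import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class WrappedStreamDemo {

    /**
     * Reads a single byte through a BufferedInputStream and returns the
     * position of the underlying FileChannel afterwards (hypothetical helper).
     */
    static long channelPositionAfterOneByte(Path file) throws IOException {
        FileInputStream fin = new FileInputStream(file.toFile());
        try (InputStream in = new BufferedInputStream(fin)) {
            in.read(); // the caller has logically consumed one byte
            // a data store only sees 'in', which is not a FileInputStream
            System.out.println("instanceof FileInputStream: " + (in instanceof FileInputStream));
            // the buffer has already pulled more bytes from the file
            return fin.getChannel().position();
        }
    }

    public static void main(String... args) throws IOException {
        Path file = Files.createTempFile("wrapped", ".txt");
        Files.write(file, "Hello World".getBytes(StandardCharsets.UTF_8));
        System.out.println("channel position: " + channelPositionAfterOneByte(file));
        Files.delete(file);
    }
}
```

For a small file the reported channel position is typically the whole file length rather than 1, so unwrapping the stream alone would not tell the data store where the caller's data actually starts.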

Another thing not to forget is that the input stream might not be positioned at its very beginning
(the caller may already have read some bytes). This can be supported by starting at the channel's
current position. I wrote some proof-of-concept code; maybe it is helpful:

{noformat}
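import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.InputStream;
import java.math.BigInteger;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.security.MessageDigest;

public class CopyTest {

    public static void main(String... args) throws Exception {
        String fileName = System.getProperty("user.home") + "/temp/test.txt";
        try (FileOutputStream out = new FileOutputStream(fileName)) {
            out.write("Hello World".getBytes("UTF-8"));
        }
        try (InputStream in = new FileInputStream(fileName)) {
            // skip the first byte, so the stream is not at the beginning
            in.read();
            process(in);
        }
    }

    static void process(InputStream in) throws Exception {
        if (!(in instanceof FileInputStream)) {
            // use the default (existing) copy path
            return;
        }
        FileChannel c = ((FileInputStream) in).getChannel();
        // the stream may already have been read from: start at its current position
        long start = c.position();
        System.out.println("start: " + start);
        long length = c.size() - start;

        // pass 1: compute the SHA-1 digest; positional reads do not move the channel
        MessageDigest digest = MessageDigest.getInstance("SHA-1");
        ByteBuffer buff = ByteBuffer.allocate(64 * 1024);
        long pos = start;
        while (true) {
            int len = c.read(buff, pos);
            if (len < 0) {
                break;
            }
            pos += len;
            // only the first len bytes of the buffer hold data
            digest.update(buff.array(), 0, len);
            buff.clear();
        }
        byte[] sha1 = digest.digest();
        // signum 1 keeps the value positive; %040x keeps leading zeros
        String outFileName = System.getProperty("user.home") + "/temp/"
                + String.format("%040x", new BigInteger(1, sha1)) + ".txt";

        // pass 2: copy the same byte range to the target file
        try (FileChannel out = new FileOutputStream(outFileName).getChannel()) {
            long done = 0;
            while (done < length) {
                // transferTo may copy fewer bytes than requested: advance the offset
                done += c.transferTo(start + done, length - done, out);
            }
        }
    }
}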
{noformat}

> Efficient copying of binaries in Jackrabbit DataStores
> ------------------------------------------------------
>
>                 Key: JCR-3735
>                 URL: https://issues.apache.org/jira/browse/JCR-3735
>             Project: Jackrabbit Content Repository
>          Issue Type: Improvement
>          Components: jackrabbit-core
>    Affects Versions: 2.7.4
>            Reporter: Amit Jain
>
> In the DataStore implementations, an additional temporary file is created for every binary
> uploaded. This extra step is unnecessary overhead when the upload process itself has already
> created a temporary file.
> So, the proposed solution is to check whether the input stream passed in is a FileInputStream
> and, if so, use the FileChannel associated with it to copy the file.
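The proposal above, including a fallback for the "use default" case, could look roughly like the helper below. This is a minimal sketch under stated assumptions, not Jackrabbit code: `ChannelCopySketch` and `copyToFile` are hypothetical names, and the fallback simply uses `java.nio.file.Files.copy` in place of whatever the real DataStore does today.

```java
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class ChannelCopySketch {

    /**
     * Copies the remaining bytes of {@code in} to {@code target} and returns
     * the number of bytes copied. Hypothetical helper, for illustration only.
     */
    static long copyToFile(InputStream in, Path target) throws IOException {
        if (in instanceof FileInputStream) {
            FileChannel src = ((FileInputStream) in).getChannel();
            long start = src.position(); // honour bytes the caller already read
            long length = src.size() - start;
            long done = 0;
            try (FileChannel dst = new FileOutputStream(target.toFile()).getChannel()) {
                while (done < length) {
                    // transferTo may copy fewer bytes than requested
                    long n = src.transferTo(start + done, length - done, dst);
                    if (n <= 0) {
                        break; // nothing more to transfer, e.g. the source was truncated
                    }
                    done += n;
                }
            }
            // transferTo does not modify the source position, so move it explicitly
            src.position(start + done);
            return done;
        }
        // assumed fallback: an ordinary stream copy (e.g. for a BufferedInputStream)
        return Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
    }
}
```

Advancing the source position at the end makes the stream look fully consumed, as it would after an ordinary stream copy.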



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
