jackrabbit-users mailing list archives

From "Laird, Nicholas J." <Nicholas.La...@gd-ais.com>
Subject RE: Problems storing/accessing very large files via WebDAV
Date Tue, 19 May 2009 16:33:05 GMT
Stefan, thanks for the response.

So I'm curious whether there is a way to optimize this process for large files, or whether anyone
is really using Jackrabbit in this type of configuration (accessing very large files via a mapped
network drive).  Given the state of the Windows Explorer WebDAV implementation in XP, it seems
like native Windows isn't viable, and while a product like WebDrive helps in that regard, the
setup still feels tentative.

My end goal is a content workspace supporting approximately 50-100 concurrent users who
can access the repository through a custom application (via JCR) or via Windows Explorer (i.e.,
a mapped drive), allowing them to read and write files up to around 4GB in size.  Ideally, they
would also be able to FTP to the repository (the easiest solution would seem to be
FTP to the mapped drive, but at that point we're layering FTP on top of WebDAV on top of JCR,
and I'm wondering if we're pushing our luck).  Once I started using the DataStore, files above
a few hundred MB became manageable (whereas before they were not), but I can't seem to reach
that upper file limit, perhaps because it's not feasible.  Is anyone else using JCR/Jackrabbit
for this type of setup with success?
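
For reference, the core idea behind the FileDataStore (spooling the incoming stream to disk while computing a content hash, then keeping the binary under a hash-derived name so identical content is stored once) can be sketched roughly as below. This is a simplified stdlib illustration, not Jackrabbit's actual code; the class and method names are made up:

```java
import java.io.*;
import java.security.DigestInputStream;
import java.security.MessageDigest;

// Hypothetical sketch of content-addressed binary storage, in the
// spirit of Jackrabbit's FileDataStore (not its real implementation).
public class DataStoreSketch {

    /** Spools 'in' into 'storeDir' and returns the hex SHA-1 identifier. */
    public static String store(InputStream in, File storeDir) throws Exception {
        MessageDigest sha1 = MessageDigest.getInstance("SHA-1");
        File temp = File.createTempFile("upload", ".tmp", storeDir);
        // DigestInputStream updates the hash as the bytes stream through,
        // so hashing and spooling happen in a single pass over the data.
        InputStream digestIn = new DigestInputStream(in, sha1);
        OutputStream out = new FileOutputStream(temp);
        try {
            byte[] buf = new byte[4096]; // 4k buffer
            int n;
            while ((n = digestIn.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
        } finally {
            out.close();
            digestIn.close();
        }
        String id = toHex(sha1.digest());
        File target = new File(storeDir, id);
        if (!temp.renameTo(target)) {
            // Identical content was already stored; discard the duplicate.
            temp.delete();
        }
        return id;
    }

    private static String toHex(byte[] digest) {
        StringBuilder sb = new StringBuilder();
        for (byte b : digest) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }
}
```

Since the property ends up holding only the identifier, copying around large binaries inside the repository stays cheap; the cost is concentrated in this one spool-and-hash pass at upload time.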

Thanks.

Nicholas Laird

-----Original Message-----
From: Stefan Guggisberg [mailto:stefan.guggisberg@gmail.com] 
Sent: Wednesday, May 06, 2009 4:38 AM
To: users@jackrabbit.apache.org
Subject: Re: Problems storing/accessing very large files via WebDAV

On Wed, May 6, 2009 at 12:00 PM, Stefan Guggisberg
<stefan.guggisberg@gmail.com> wrote:
> hi nicolas,
>
> On Thu, Apr 30, 2009 at 1:22 AM, Laird, Nicholas J.
> <Nicholas.Laird@gd-ais.com> wrote:
>> I am having an issue when storing and retrieving very large files (
>> 400MB -> 2GB ) to my Jackrabbit repository via WebDAV.  I am using the
>> FileDataStore to store resources of size greater than 100 bytes (i.e.,
>> the default configuration for FileDataStore on the Jackrabbit wiki).
>>
>> I need to support these large files and have the repository presented to
>> the end user as a mapped network drive in Windows Explorer.  I have
>> tried using both Windows Explorer's built-in WebDAV client (by mapping
>> my repository as a network drive) and a product called WebDrive, which
>> also does network drive mapping.  Performance with WebDrive is better
>> (Explorer seems to have known WebDAV issues, at least in Windows XP),
>> but for large enough files even it gets bogged down and confused.
>>
>> When uploading a file by drag-and-drop to the mapped network drive in
>> Explorer, the upload seems to proceed and finish normally from the
>> WebDAV client's perspective; however, Jackrabbit appears to be
>> performing some sort of internal caching while the WebDAV session is
>> still "open", with the client believing that the transfer is still in
>> progress.
>>
>> A snippet of the Jackrabbit log file (with trace logging enabled) during
>> the transfer is below.  Notice the timestamp difference between lines 3
>> and 4:
>>
>> 1) 29.04.2009 16:04:56 *DEBUG* ImportContextImpl: Starting IOHandler
>> (org.apache.jackrabbit.server.io.DefaultHandler) (DefaultIOListener.java, line 43)
>> 2) 29.04.2009 16:04:56 *DEBUG* ItemManager: caching item
>> e7ab8f92-d6a5-4bbf-bb9c-fb7e0ab9042e (ItemManager.java, line 787)
>> 3) 29.04.2009 16:04:56 *DEBUG* ItemManager: caching item
>> e7ab8f92-d6a5-4bbf-bb9c-fb7e0ab9042e/{http://www.jcp.org/jcr/1.0}data (ItemManager.java, line 787)
>> 4) 29.04.2009 16:05:12 *DEBUG* ItemManager: caching item
>> e7ab8f92-d6a5-4bbf-bb9c-fb7e0ab9042e/{http://www.jcp.org/jcr/1.0}mimeType (ItemManager.java, line 787)
>> 5) 29.04.2009 16:05:12 *DEBUG* ItemManager: caching item
>> e7ab8f92-d6a5-4bbf-bb9c-fb7e0ab9042e/{http://www.jcp.org/jcr/1.0}encoding (ItemManager.java, line 787)
>> 6) 29.04.2009 16:05:12 *DEBUG* ItemManager: destroyed item
>> e7ab8f92-d6a5-4bbf-bb9c-fb7e0ab9042e/{http://www.jcp.org/jcr/1.0}encoding (ItemManager.java, line 884)
>> 7) 29.04.2009 16:05:12 *DEBUG* ItemManager: removing items
>> e7ab8f92-d6a5-4bbf-bb9c-fb7e0ab9042e/{http://www.jcp.org/jcr/1.0}encoding from cache (ItemManager.java, line 801)
>> 8) 29.04.2009 16:05:12 *DEBUG* ItemManager: caching item
>> e7ab8f92-d6a5-4bbf-bb9c-fb7e0ab9042e/{http://www.jcp.org/jcr/1.0}lastModified (ItemManager.java, line 787)
>> 9) 29.04.2009 16:05:12 *DEBUG* ImportContextImpl: Result for IOHandler
>> (org.apache.jackrabbit.server.io.DefaultHandler): OK (DefaultIOListener.java, line 50)
>>
>> 16 seconds isn't an eternity, but the time increases as the size of the
>> file increases.  At a large enough file size, Explorer gives up on the
>> transfer with a "Write Delay Failed" error and WebDrive thinks the
>> server has taken too long to respond and times out.  WebDrive can be
>> configured to wait longer, but I have had to increase the time to 2
>> minutes to try to manage files nearing 2GB.
>>
>> During downloads, the same situation occurs, except "caching item" of
>> the jcr:data property occurs before the download can start.
>>
>> I am not sure exactly what Jackrabbit is doing or if there is a way to
>> speed up the process (or prevent the caching altogether, if that is
>> indeed what is happening).  It could be that some other operation is
>> occurring that is not being revealed by the logging (though the trace
>> logging seems pretty thorough and verbose).
>
> the trace msgs sent you down the wrong track. the "ItemManager: caching ..."
> msgs are irrelevant here. ItemManager caches implementations
> of the javax.jcr.Item interface; the real 'data' is managed separately.
>
> AFAIK the time spent on storing a large binary mainly accounts for
> building the hash (used by the datastore) and spooling the data to
> the datastore. depending on the type of binary and your configuration,
> text extractors might also be involved.

i ran a quick test on my machine (os-x 10.5, 2.8ghz core duo, 7200rpm hdd,
256mb jvm heap, FileDataStore):

- storing a 700mb video file in a local repo: ~25s
- spooling a local 700mb file using a 4k buffer: ~16s

the difference probably accounts for computing the hash of the file content.
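
A comparison in that spirit can be sketched with the stdlib alone (a rough illustration, not Jackrabbit code; the in-memory byte array and buffer size are stand-ins for the file and spool buffer):

```java
import java.io.*;
import java.security.DigestInputStream;
import java.security.MessageDigest;

// Rough sketch comparing a plain spool pass against a spool pass
// that also feeds a SHA-1 digest, to show where the extra time goes.
public class SpoolVsHash {

    /** Reads the whole stream with a 4k buffer, discarding the bytes. */
    static long drain(InputStream in) throws IOException {
        byte[] buf = new byte[4096];
        long total = 0;
        int n;
        while ((n = in.read(buf)) != -1) {
            total += n;
        }
        return total;
    }

    public static void main(String[] args) throws Exception {
        byte[] data = new byte[64 * 1024 * 1024]; // stand-in for a large file

        long t0 = System.nanoTime();
        drain(new ByteArrayInputStream(data)); // plain spool
        long spool = System.nanoTime() - t0;

        long t1 = System.nanoTime();
        drain(new DigestInputStream(new ByteArrayInputStream(data),
                MessageDigest.getInstance("SHA-1"))); // spool + hash
        long hashed = System.nanoTime() - t1;

        System.out.println("spool only: " + spool / 1000000L + " ms");
        System.out.println("spool+hash: " + hashed / 1000000L + " ms");
    }
}
```

Against a real disk the spool side would also pay I/O costs, so absolute numbers will differ; the point is only that the hash adds a roughly size-proportional CPU cost on top of the copy.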

cheers
stefan

>
> maybe thomas can provide some more input...
>
> cheers
> stefan
>
>>
>> Any suggestions on how to configure or optimize for this situation is
>> greatly appreciated.
>>
>> Sincerely,
>> Nicholas Laird
>>
>>
>
