jackrabbit-users mailing list archives

From Santiago Gala <santiago.g...@gmail.com>
Subject Re: Evaluating jackrabbit use for media content
Date Tue, 22 Oct 2013 15:31:36 GMT
On Tue, Oct 15, 2013 at 12:10 PM, Julian Reschke <julian.reschke@gmx.de> wrote:

> On 2013-10-15 11:33, Santiago Gala wrote:
>
>> On Mon, Oct 14, 2013 at 9:33 AM, Marcel Reutegger <mreutegg@adobe.com>
>> wrote:
>>
>>> Hi,
>>>
>>>  So I guess there are at least three components involved:
>>>>
>>>> * The backend JCR repository
>>>> * The webdav access to it via jetty
>>>> * The client library in my program side.
>>>>
>>>> I was thinking that webdav would enable some kind of seekable access,
>>>> but I think one of those components is breaking the chain. I don't
>>>> understand how is one supposed to get seekable access when the
>>>> repository
>>>> is accessed via network.
>>>>
>>>
>>> you are using the built-in WebDAV support of Jackrabbit. I don't know
>>>
>>
>> I have also tried the RMI connection. The problem there is that when I
>> use RMI via http://jackrabbit-server/rmi URIs, Jackrabbit copies binaries
>> to java.io.tmpdir every time they are used in a session, and doesn't
>> delete the copy when the session is disposed.
>> org.apache.jackrabbit.rmi.value.SerializableBinaryValue is where the
>> temporary file is created and never deleted.
>>
>> The snippet is:
>>
>>   if (n.hasProperty("jcr:data")) {
>>       Value v = n.getProperty("jcr:data").getValue();
>>       if (v.getType() == PropertyType.BINARY) {
>>           Binary b = v.getBinary();
>>           System.out.println("Binary class: " + b.getClass().toString());
>>           byte[] buff = new byte[100];
>>           // InputStream ios = b.getStream();
>>           // ios.skip(b.getSize() - buff.length);
>>           // System.out
>>           // .println("Stream class: " + ios.getClass().toString());
>>           // int len = ios.read(buff);
>>           // ios.close();
>>           int len = b.read(buff, b.getSize() - buff.length);
>>           System.out.println("Binary(" + len + "): "
>>               + new String(buff, "UTF-8"));
>>           b.dispose();
>>       }
>>       v = null;
>>   }
>>
>> The alternative getStream() implementation shown in comments leaves the
>> same temporary files in java.io.tmpdir.
>>
>> This is a no-no for me, as having one non-deleted copy of a 1G file per
>> session will fill any temporary space I could design when we use it for
>> media, and we have no simple way to know when the file is no longer in
>> use.
>> The dispose implementation is supposed to delete the file, but this is not
>> happening in my tests, and I am not sure why. It is supposed to be called
>> by me, as you can see, and also automatically on finalize... I guess
>> something related to the transient nature of the file or the stubs is
>> preventing finalization from being called for RMI objects... This is
>> probably a bug and rules out RMI for us.
>>
>> Using a /server URL does not leave the temporary files on exit.
>>
>>
>>>  the implementation that well, but it may well be that it doesn't support
>>>> the seekable access you need.
>>>>
>>>>
>> It answers any request that includes the header "Range: bytes=0-200"
>> with a 200 status and the whole file (700 MB in my test). Further,
>> nothing I have seen in the implementations of Jackrabbit stable,
>> unstable or Oak hints at support for byte ranges on the DAV server
>> side.
>> ...
>>
>
> Well, if this (Range request support) is indeed missing, we really really
> should add it.
>

Adding support for single range requests for GET (i.e., requests without a
comma in the Range header) seems easy enough. This is what OpenStack Swift
object storage supports (one of the alternatives to JCR/Jackrabbit or
ModeShape that we are considering).
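
To make the "easy enough" part concrete, here is a minimal Java sketch of the
parsing a single-range GET would need. The class SingleRange and its methods
are hypothetical names, not part of Jackrabbit. It handles only the "bytes"
unit and returns null for anything it would not answer with a 206, including
multi-range requests; a real server would also have to distinguish the 416
case, which this sketch does not.

```java
/** Hypothetical helper: parses a Range header that names exactly one byte range. */
final class SingleRange {
    final long first; // first byte position, inclusive
    final long last;  // last byte position, inclusive

    private SingleRange(long first, long last) {
        this.first = first;
        this.last = last;
    }

    /** Number of bytes covered by the range. */
    long length() {
        return last - first + 1;
    }

    /** Value for the Content-Range response header. */
    String contentRange(long total) {
        return "bytes " + first + "-" + last + "/" + total;
    }

    /**
     * Returns the range to serve, or null if the header is missing, malformed,
     * names several ranges, or cannot be satisfied for an entity of this size.
     */
    static SingleRange parse(String header, long total) {
        if (header == null || total <= 0) {
            return null;
        }
        String h = header.trim();
        if (!h.startsWith("bytes=") || h.indexOf(',') >= 0) {
            return null; // other unit, or a multi-range request
        }
        String spec = h.substring("bytes=".length()).trim();
        int dash = spec.indexOf('-');
        if (dash < 0) {
            return null;
        }
        String a = spec.substring(0, dash).trim();
        String b = spec.substring(dash + 1).trim();
        try {
            if (a.isEmpty()) {
                // Suffix form "-N": the last N bytes of the entity.
                if (b.isEmpty()) {
                    return null;
                }
                long n = Long.parseLong(b);
                if (n <= 0) {
                    return null;
                }
                return new SingleRange(Math.max(0, total - n), total - 1);
            }
            long first = Long.parseLong(a);
            // An open-ended or oversized last position is clamped to the end.
            long last = b.isEmpty() ? total - 1 : Math.min(Long.parseLong(b), total - 1);
            if (first > last) {
                return null;
            }
            return new SingleRange(first, last);
        } catch (NumberFormatException e) {
            return null;
        }
    }
}
```

For the 2651-byte README.txt in the 206 example in this message,
SingleRange.parse("bytes=0-20", 2651) gives a 21-byte range whose
contentRange(2651) is "bytes 0-20/2651".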

In an additional experiment, I noticed that RFC 2616 says: "If a proxy that
supports ranges receives a Range request, forwards the request to an
inbound server, and receives an entire entity in reply, it SHOULD only
return the requested range to its client. It SHOULD store the entire
received response in its cache if that is consistent with its cache
allocation policies."
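
What the RFC asks of a proxy here amounts to discarding everything outside the
requested range when the upstream server sends the whole entity. A minimal
Java sketch of that step, assuming the full body is available as an
InputStream; RangeSlicer and slice are hypothetical names, and the caching
part is left out.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical helper: cuts one byte range out of a full entity stream. */
final class RangeSlicer {

    /** Returns bytes first..last (inclusive) of the entity; shorter if the entity ends early. */
    static byte[] slice(InputStream entity, long first, long last) throws IOException {
        if (first < 0 || last < first) {
            throw new IllegalArgumentException("bad range: " + first + "-" + last);
        }
        // InputStream.skip may skip fewer bytes than requested, so loop.
        long toSkip = first;
        while (toSkip > 0) {
            long skipped = entity.skip(toSkip);
            if (skipped <= 0) {
                // Fall back to a single read to detect end of stream.
                if (entity.read() < 0) {
                    break;
                }
                skipped = 1;
            }
            toSkip -= skipped;
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        long remaining = last - first + 1;
        while (remaining > 0) {
            int n = entity.read(buf, 0, (int) Math.min(buf.length, remaining));
            if (n < 0) {
                break; // entity shorter than the requested range
            }
            out.write(buf, 0, n);
            remaining -= n;
        }
        return out.toByteArray();
    }
}
```

Note that this still transfers the whole prefix of the entity from the
upstream server, so it saves bandwidth only on the client side of the proxy.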

I tested Apache 2.2 mod_dav as a reverse proxy to Jackrabbit and found that
Apache does add range support for responses that carry a Content-Length
(say, http://localhost/server/default or
http://localhost/repository/default/README.txt, where README.txt is a text
file added to the root of the default workspace) but not for the rest of
the resources, including things like
http://localhost/server/default/jcr:root/README.txt/jcr:content/jcr:data ,
for which it returns a 200 and the whole resource.

An example of a 206:
$ curl -v -u admin:admin -H "Range: bytes=0-20" \
    http://localhost/repository/default/README.txt
* About to connect() to localhost port 80 (#0)
*   Trying 127.0.0.1...
* Connected to localhost (127.0.0.1) port 80 (#0)
* Server auth using Basic with user 'admin'
> GET /repository/default/README.txt HTTP/1.1
> Authorization: Basic YWRtaW46YWRtaW4=
> User-Agent: curl/7.29.0
> Host: localhost
> Accept: */*
> Range: bytes=0-20
>
< HTTP/1.1 206 Partial Content
< Date: Tue, 22 Oct 2013 14:46:59 GMT
< Server: Jetty(6.1.x)
< ETag: "2651-1381326102000"
< Content-Length: 21
< Last-Modified: Wed, 09 Oct 2013 13:41:42 GMT
< Content-Type: text/plain
< Vary: Accept-Encoding
< Content-Range: bytes 0-20/2651
<
* Connection #0 to host localhost left intact
=====================

An example of a 200 response:

$ curl -v -u admin:admin -H "Range: bytes=0-20" \
    http://localhost/server/default/jcr:root/README.txt
* About to connect() to localhost port 80 (#0)
*   Trying 127.0.0.1...
* Connected to localhost (127.0.0.1) port 80 (#0)
* Server auth using Basic with user 'admin'
> GET /server/default/jcr:root/README.txt HTTP/1.1
> Authorization: Basic YWRtaW46YWRtaW4=
> User-Agent: curl/7.29.0
> Host: localhost
> Accept: */*
> Range: bytes=0-20
>
< HTTP/1.1 200 OK
< Date: Tue, 22 Oct 2013 14:48:19 GMT
< Server: Jetty(6.1.x)
< Last-Modified: Tue, 22 Oct 2013 14:48:19 GMT
< Content-Type: text/xml
< Vary: Accept-Encoding
< Transfer-Encoding: chunked
<
* Connection #0 to host localhost left intact
<?xml version="1.0" encoding="UTF-8"?><sv:node sv:name="README.txt"
xmlns:fn="http://www.w3.org/2005/xpath-functions"
xmlns:fn_old="http://www.w3.org/2004/10/xpath-functions"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:jcr="http://www.jcp.org/jcr/1.0"
xmlns:mix="http://www.jcp.org/jcr/mix/1.0"
xmlns:sv="http://www.jcp.org/jcr/sv/1.0"
xmlns:rep="internal"
xmlns:nt="http://www.jcp.org/jcr/nt/1.0"><sv:property sv:name="jcr:primaryType"
sv:type="Name"><sv:value>nt:file</sv:value></sv:property><sv:property
sv:name="jcr:created"
sv:type="Date"><sv:value>2013-10-22T15:35:13.163+02:00</sv:value></sv:property><sv:property
sv:name="jcr:createdBy"
sv:type="String"><sv:value>admin</sv:value></sv:property></sv:node>


Unfortunately Apache mod_disk_cache will NEVER store authenticated
resources, or so the documentation says...

Regards
Santiago



> Best regards, Julian
>
