jackrabbit-oak-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Alexander Klimetschek (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (OAK-7569) Direct Binary Access
Date Mon, 06 Aug 2018 06:47:00 GMT

    [ https://issues.apache.org/jira/browse/OAK-7569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16569802#comment-16569802
] 

Alexander Klimetschek edited comment on OAK-7569 at 8/6/18 6:46 AM:
--------------------------------------------------------------------

h4. Problem

Unfortunately, we could not commit the final agreed change this week, because [~mattvryan]
found a blocking issue: {{BinaryDownload.getURI()}} always returns null because the regular
code path {{node.getProperty().getBinary()}} does not pass the {{BlobAccessProvider}} instance
through to the {{ValueFactoryImpl}} because it uses a [static method of ValueFactoryImpl in
PropertyImpl.getValue()|https://github.com/apache/jackrabbit-oak/blob/905c9bd0f716778a035a708bbdb8de634f464e66/oak-jcr/src/main/java/org/apache/jackrabbit/oak/jcr/session/PropertyImpl.java#L252-L253].
h4. Details

The unit tests so far had a shortcut where it reused the {{Binary}} used for writing, hence
we didn't see the issue earlier. This has been fixed in the latest changes for the integration
tests: [https://github.com/mattvryan/jackrabbit-oak/pull/12] (also includes a lot of improvements
around the test framework itself).

This bug was introduced as a result of the refactorings during the review - which moved the
API to a ValueFactory extension. At the same time the unit tests changed a bit (accidentally).

It is possible to fix this particular Value access case, since {{PropertyImpl}} already has
access to the per-session {{ValueFactoryImpl}} instance, and it just needs to replace all
the static usages left by leveraging {{getValueFactory()}}. Same for some other places in
oak-jcr and oak-core that currently employ a mix of instance and static method access. [~mattvryan]
did an attempt at this here: [https://github.com/mattvryan/jackrabbit-oak/commit/e022f846c0ff113c95910d26584568004beacb66]

Ideally, to ensure the {{blobAccessProvider}} reference in the {{ValueFactoryImpl}} is always
set to the reference from the whiteboard that the {{SessionContext}} has, one would have to
eliminate the static methods on {{ValueFactoryImpl}} and replace by instance methods. The
class isn't doing proper encapsulation at this point, which is visible by the fact that {{NamePathMapper}},
an implementation detail the ValueFactoryImpl relies on, is actually all over the Oak code
base.

*However, there are two static {{ValueFactoryImpl.createValue()}} methods retrieving a PropertyState
and PropertyValue argument respectively (+ NamePathMapper) that are used in different places
in oak, including oak-security-spi.* These are completely unrelated to the binary change.
They don't have any immediate access to the {{SessionContext}} (which provides both the {{ValueFactoryImpl}}
and {{BlobAcccessProvider}} instance). They all get a {{NamePathMapper}} passed in, which
is essentially the same situation that a component reference must be available for the {{ValueFactoryImpl}}
methods to do their work, but if you follow the call hierarchy it literally explodes. Meaning
a lot of files all over Oak would have to be changed to actually pass through the {{ValueFactoryImpl}}
instance instead of just {{NamePathMapper}}.

But even then, in many places it does not even get any access to the real {{NamePathMapper}}
which {{SessionContext}} implements, but a {{NamePathMapper.DEFAULT}} is used (which apparently
works for some of these cases?). It might be that some components were never designed to get
access to the {{SessionContext}}, at least it seems it would be a massive refactoring to ensure
that everywhere.
h4. Unsolved Binary cases

How does that affect the change? One example is {{AuthorizablePropertiesImpl}}. This corresponds
to the code path of {{Authorizable.getProperty("foo").getValue().getBinary()}}, which would
never be able to return anything from {{getURI()}}. Note that it will implement {{BinaryDownload}},
and the API contract is that {{getURI()}} shall return null if the feature is not available,
but not if the client used the "wrong" way of retrieving a {{Binary}}.

Other cases are hard to reason about: it's unclear where exactly they could end up being used
by a JCR client. We tried to analyze it but I am not sure one can make guarantees... and since
the static methods aren't property-type specific, one cannot exclude binary properties. Furthermore
we have to consider future usage of these static methods (unless they become deprecated or
removed).

FWIW, this is the reason in our original patch we did not extended the {{ValueFactory}} at
all but used an extension interface ({{BinaryAccessProvider}} on {{Session}}, as then access
to the {{SessionContext}} and anything else is trivial. The idea of the {{ValueFactory}} as
creator of values sounds good in theory, but the implementation in Oak using static methods
creates values "out of thin air" with no access to other components.

Also note that this only applies to downloads (BinaryDownload.getURI()). The upload ({{JackrabbitValueFactory.completeBinaryUpload()}})
is never affected, because the JCR client code will retrieve the {{JackrabbitValueFactory}}
instance from {{Session.getValueFactory()}}, which will always be the "good" instance.
h4. Options

[~mattvryan] and me identified these two options at this point. Given the huge effort of getting
rid of the static {{ValueFactoryImpl.createValue()}} methods completely (a patch touching
all of Oak), we excluded that one as a viable option - feedback welcome.
 # *Fix {{ValueFactoryImpl}} as much as possible* and ignore the cases where the static methods
are still used and "hope" they never play a role for client code wanting to access {{BinaryDownload}}.
 ** Pro: no API change, can work short term.
 ** Con: unknown edge cases are not covered.
 # *Change the API back to the {{BinaryAccessProvider}} design*, as an extension of {{Session}}.
 ** Pro: no implementation refactoring needed, changes limited to a few places
 ** Con: Oak team preferred the other API approach; the API was already released as jackrabbit-api
2.17.5. while there are no downstream clients yet, reverting it would be difficult (major
package version update or removing release artifact from maven central, ... see below)

There are two variants of 2:
 * 2a) Go fully back to a {{BinaryAccessProvider}} with both upload and download in one interface.
But that would mean removing the new interface {{JackrabbitValueFactory}} again (or it's methods).
Just deprecating them and have them linger around would be pretty awful for jackrabbit-api,
which should be "polished". That would result in a major version change of {{org.apache.jackrabbit.api}}
([2.5.0 to 3.0.0|https://github.com/apache/jackrabbit/blob/trunk/jackrabbit-api/src/main/java/org/apache/jackrabbit/api/package-info.java]),
which would break all existing JCR client code (in an OSGi environment). Otherwise, given
there can be no real usage of this API as nothing released implements it yet, one could think
of reverting the 2.17.5 release, removing it from maven-central and just do a clean new start
from 2.17.4 with whatever API we want to add.

 * 2b) Introduce {{BinaryAccessProvider}} or {{BinaryDownloadProvider}} only for download,
i.e. having a method {{getDownloadURI(Binary, BinaryDownloadOptions)}}. Then we would only
have to remove {{BinaryDownload}}, but that's not a problem because it's in the new package
{{org.apache.jackrabbit.api.binary}} for which we could freely jump from 1.0.0 to 2.0.0 export
version (unlike {{org.apache.jackrabbit.api}} in 2a., no one is using that package yet).

h4. My take
 * Option 1: I would feel a bit uncomfortable with such a "fingers crossed" approach, where
we don't know the edge cases ;)
 * Option 2a: a single interface for upload (write) and download (read) would arguably the
more intuitive approach, if we can't have both ValueFactory for write and Binary extension
for read. But I don't know if the API revert is possible.
 * Option 2b: Very straightforward, but somewhat funky API in the end. Maybe it doesn't matter
with good documentation.

Thoughts?


was (Author: alexander.klimetschek):
h4. Problem

Unfortunately, we could not commit the final agreed change this week, because [~mattvryan]
found a blocking issue: {{BinaryDownload.getURI()}} always returns null because the regular
code path {{node.getProperty().getBinary()}} does not pass the {{BlobAccessProvider}} instance
through to the {{ValueFactoryImpl}} because it uses a [static method of ValueFactoryImpl in
PropertyImpl.getValue()|https://github.com/apache/jackrabbit-oak/blob/905c9bd0f716778a035a708bbdb8de634f464e66/oak-jcr/src/main/java/org/apache/jackrabbit/oak/jcr/session/PropertyImpl.java#L252-L253].
h4. Details

The unit tests so far had a shortcut where it reused the {{Binary}} used for writing, hence
we didn't see it earlier. This has been fixed in the latest changes for the integration tests:
[https://github.com/mattvryan/jackrabbit-oak/pull/12] (also includes a lot of improvements
around the test framework itself).

This bug was introduced as a result of the refactorings during the review - which moved the
API to a ValueFactory extension. At the same time the unit tests changed a bit (accidentally).

It is possible to fix this particular Value access case, since {{PropertyImpl}} already has
access to the per-session {{ValueFactoryImpl}} instance, and it just needs to replace all
the static usages left by leveraging {{getValueFactory()}}. Same for some other places in
oak-jcr and oak-core that currently employ a mix of instance and static method access. [~mattvryan]
did an attempt at this here: [https://github.com/mattvryan/jackrabbit-oak/commit/e022f846c0ff113c95910d26584568004beacb66]

Ideally, to ensure the {{blobAccessProvider}} reference in the {{ValueFactoryImpl}} is always
set to the reference from the whiteboard that the {{SessionContext}} has, one would have to
eliminate the static methods on {{ValueFactoryImpl}} and replace by instance methods. The
class isn't doing proper encapsulation at this point, which is visible by the fact that {{NamePathMapper}},
an implementation detail the ValueFactoryImpl relies on, is actually all over the Oak code
base.

*However, there are two static {{ValueFactoryImpl.createValue()}} methods retrieving a PropertyState
and PropertyValue argument respectively (+ NamePathMapper) that are used in different places
in oak, including oak-security-spi.* These are completely unrelated to the binary change.
They don't have any immediate access to the {{SessionContext}} (which provides both the {{ValueFactoryImpl}}
and {{BlobAcccessProvider}} instance). They all get a {{NamePathMapper}} passed in, which
is essentially the same situation that a component reference must be available for the {{ValueFactoryImpl}}
methods to do their work, but if you follow the call hierarchy it literally explodes. Meaning
a lot of files all over Oak would have to be changed to actually pass through the {{ValueFactoryImpl}}
instance instead of just {{NamePathMapper}}.

But even then, in many places it does not even get any access to the real {{NamePathMapper}}
which {{SessionContext}} implements, but a {{NamePathMapper.DEFAULT}} is used (which apparently
works for some of these cases?). It might be that some components were never designed to get
access to the {{SessionContext}}, at least it seems it would be a massive refactoring to ensure
that everywhere.
h4. Unsolved Binary cases

How does that affect the change? One example is {{AuthorizablePropertiesImpl}}. This corresponds
to the code path of {{Authorizable.getProperty("foo").getValue().getBinary()}}, which would
never be able to return anything from {{getURI()}}. Note that it will implement {{BinaryDownload}},
and the API contract is that {{getURI()}} shall return null if the feature is not available,
but not if the client used the "wrong" way of retrieving a {{Binary}}.

Other cases are hard to reason about: it's unclear where exactly they could end up being used
by a JCR client. We tried to analyze it but I am not sure one can make guarantees... and since
the static methods aren't property-type specific, one cannot exclude binary properties. Furthermore
we have to consider future usage of these static methods (unless they become deprecated or
removed).

FWIW, this is the reason in our original patch we did not extended the {{ValueFactory}} at
all but used an extension interface ({{BinaryAccessProvider}} on {{Session}}, as then access
to the {{SessionContext}} and anything else is trivial. The idea of the {{ValueFactory}} as
creator of values sounds good in theory, but the implementation in Oak using static methods
creates values "out of thin air" with no access to other components.

Also note that this only applies to downloads (BinaryDownload.getURI()). The upload ({{JackrabbitValueFactory.completeBinaryUpload()}})
is never affected, because the JCR client code will retrieve the {{JackrabbitValueFactory}}
instance from {{Session.getValueFactory()}}, which will always be the "good" instance.
h4. Options

[~mattvryan] and me identified these two options at this point. Given the huge effort of getting
rid of the static {{ValueFactoryImpl.createValue()}} methods completely (a patch touching
all of Oak), we excluded that one as a viable option - feedback welcome.
 # *Fix {{ValueFactoryImpl}} as much as possible* and ignore the cases where the static methods
are still used and "hope" they never play a role for client code wanting to access {{BinaryDownload}}.
 ** Pro: no API change, can work short term.
 ** Con: unknown edge cases are not covered.
 # *Change the API back to the {{BinaryAccessProvider}} design*, as an extension of {{Session}}.
 ** Pro: no implementation refactoring needed, changes limited to a few places
 ** Con: Oak team preferred the other API approach; the API was already released as jackrabbit-api
2.17.5. while there are no downstream clients yet, reverting it would be difficult (major
package version update or removing release artifact from maven central, ... see below)

There are two variants of 2:
 * 2a) Go fully back to a {{BinaryAccessProvider}} with both upload and download in one interface.
But that would mean removing the new interface {{JackrabbitValueFactory}} again (or it's methods).
Just deprecating them and have them linger around would be pretty awful for jackrabbit-api,
which should be "polished". That would result in a major version change of {{org.apache.jackrabbit.api}}
([2.5.0 to 3.0.0|https://github.com/apache/jackrabbit/blob/trunk/jackrabbit-api/src/main/java/org/apache/jackrabbit/api/package-info.java]),
which would break all existing JCR client code (in an OSGi environment). Otherwise, given
there can be no real usage of this API as nothing released implements it yet, one could think
of reverting the 2.17.5 release, removing it from maven-central and just do a clean new start
from 2.17.4 with whatever API we want to add.

 * 2b) Introduce {{BinaryAccessProvider}} or {{BinaryDownloadProvider}} only for download,
i.e. having a method {{getDownloadURI(Binary, BinaryDownloadOptions)}}. Then we would only
have to remove {{BinaryDownload}}, but that's not a problem because it's in the new package
{{org.apache.jackrabbit.api.binary}} for which we could freely jump from 1.0.0 to 2.0.0 export
version (unlike {{org.apache.jackrabbit.api}} in 2a., no one is using that package yet).

h4. My take
 * Option 1: I would feel a bit uncomfortable with such a "fingers crossed" approach, where
we don't know the edge cases ;)
 * Option 2a: a single interface for upload (write) and download (read) would arguably the
more intuitive approach, if we can't have both ValueFactory for write and Binary extension
for read. But I don't know if the API revert is possible.
 * Option 2b: Very straightforward, but somewhat funky API in the end. Maybe it doesn't matter
with good documentation.

Thoughts?

> Direct Binary Access
> --------------------
>
>                 Key: OAK-7569
>                 URL: https://issues.apache.org/jira/browse/OAK-7569
>             Project: Jackrabbit Oak
>          Issue Type: New Feature
>          Components: api, blob-cloud, blob-cloud-azure, blob-plugins
>            Reporter: Matt Ryan
>            Assignee: Matt Ryan
>            Priority: Major
>         Attachments: OAK-7569-api-javadoc-improvements.patch
>
>
> Provide a direct binary access feature to Oak which allows an authenticated client to
create or download blobs directly to/from the blob store, assuming the authenticated user
has appropriate permission to do so. The primary value of this feature is that the I/O of
transferring large binary files to or from the blob store can be offloaded entirely from Oak
and performed directly between a client application and the blob store.
> This feature is described in more detail [on the Oak wiki|https://wiki.apache.org/jackrabbit/Direct%20Binary%20Access].
> This feature is similar in functionality to OAK-6575.  It adds the capability to also
upload directly to storage via preauthorized URLs in addition to downloading directly from
storage via preauthorized URLs.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Mime
View raw message