incubator-clerezza-dev mailing list archives

From Henry Story <henry.st...@bblfish.net>
Subject Re: [VOTE] Accept the proposed patch of CLEREZZA-540
Date Sun, 29 May 2011 17:06:26 GMT

On 26 May 2011, at 20:31, Reto Bachmann-Gmuer wrote:

> With CLEREZZA-540 I suggest a GraphNodeProvider-Service that returns a
> GraphNode given a named resource. This is mainly code that used to be in
> DiscoBitsTypeHandler and has been generalized.
> 
> The issue is described as:
> "Implement a platform service that returns GraphNodes for URIs. The
> GraphNode is the resource identified by that uri with as BaseGraph sources
> considered authoritative for that resource. "
> 
> Of course "considered authoritative" is not a very sharp description. The
> issue is labeled with "platform" which implies it is not a generic utility
> of clerezza.rdf but that it relies on platform default graphs.
> 
> The solution proposed in commits
> #1125477 <http://svn.apache.org/viewvc?view=rev&rev=1125477> and
> #1125652 <http://svn.apache.org/viewvc?view=rev&rev=1125652> sets the
> baseGraph as follows:
> - always trust the content graph
> - for remote resources trust the graph you get by dereferencing the uri
> - for resources in the user-uri space trust that user

Of course what one thinks of this patch depends entirely on how this provider
gets used. As currently implemented it seems to me to have quite a few
limitations. That is not a fatal failing in a developing project, but a good
discussion would help us narrow in on better solutions or fixes, which is part
of the reason I thought the issue should not yet be closed.

The failings I mentioned in CLEREZZA-540, and develop below, are:

1. It relies on more and more Services that each require TcProviders. This
   feels very ad hoc. The ones I mentioned are

    - UserManager: requires a TcManager
    - WebIdGraphsService: also requires a TcManager
    - PlatformConfig: requires TcManager
    - ContentGraphProvider: requires TcManager
    - TcManager

   Each of these is used and makes a call to the database, and the TcManager
   itself iterates through a number of TcProviders each time.

2. When asking for an external URI, you get the whole content graph too.
   On my fresh install of ZZ that is 20 times more information than the initial graph.
   How big is that going to become as one's content graph grows over time? Is this not
   going to create a huge bottleneck very quickly? I thought I had heard people mention
   issues with speed on this list.

   To verify this, do the following:

zz> import org.apache.clerezza.platform.graphnodeprovider._
zz> val gnp = $[GraphNodeProvider]
zz> val tbl = gnp.get(new UriRef("http://www.w3.org/People/Berners-Lee/card#i"))
zz> tbl.getGraph.size
res0: Int = 1878

If I get Tim Berners Lee's Graph on the command line I find

$ rapper http://www.w3.org/People/Berners-Lee/card | wc
rapper: Parsing URI http://www.w3.org/People/Berners-Lee/card with parser rdfxml
rapper: Serializing with serializer ntriples
rapper: Parsing returned 78 triples
      78     380    9978

So here we have 78 triples, but the resulting answer schlepped around is 20 times
bigger - on a new installation!

I wondered what was going on, because to my surprise on a new installation the
Content Graph contains only 1 triple. So I looked into a running instance of
ContentGraphProvider and found that the additions array contained the following
graphs in addition to the content graph:

  - <urn:x-localinstance:/documentation.graph>   1002 triples
  - <urn:x-localinstance:/config.graph>           176 triples
  - <urn:x-localinstance:/web-resources.graph>    621 triples
  - <urn:x-localinstance:/enrichment.graph>         0 triples

So that does then indeed add up to the number.
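
To make that arithmetic explicit, here is a small Scala sketch using only the
sizes reported above. The object GraphSizes and its member names are made up for
illustration and are not part of Clerezza. It also assumes the graphs share no
triples, so that their sizes simply add, which the matching total suggests.

```scala
// Sizes are the figures quoted in this message; all names are illustrative only.
object GraphSizes {
  val remoteProfile = 78 // triples rapper parsed from the remote card document
  val contentGraph  = 1  // content graph on a fresh install
  val additions = Map(
    "urn:x-localinstance:/documentation.graph" -> 1002,
    "urn:x-localinstance:/config.graph"        -> 176,
    "urn:x-localinstance:/web-resources.graph" -> 621,
    "urn:x-localinstance:/enrichment.graph"    -> 0
  )

  // Size of the union handed back for the remote URI (assuming no shared triples)
  val total: Int = remoteProfile + contentGraph + additions.values.sum

  // How many times larger the union is than the remote document alone
  val ratio: Double = total.toDouble / remoteProfile
}
```

GraphSizes.total comes to 1878, the same as tbl.getGraph.size above, which is
roughly 24 times the 78 triples of the remote document.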

What I am wondering is in what cases this is needed. It seems like this may
indeed be what a particular application requires, but does it have to be
a general service? The name certainly suggests a very general service, not
one required for a particular application.

Perhaps changing the name from GraphNodeProvider to ContentGraphPlusOtherProvider
would make more sense.

> This might not match an intuitive understanding of "authoritative" and I'm
> happy to redefine the issue so that no confusion arises.

One thing I am not quite clear about yet is who writes to the content graph.
I see a lot of modules use it.

> 
> What I do strongly believe is that the proposed patch offers a major and
> very useful new functionality. Especially as it allows the following
> features to be implemented:
> - Thanks to CLEREZZA-544 one can call the render-method to delegate the
> rendering of resources with a UriRef instead of a resource,

I think you mean a "UriRef instead of a Graph".

Yes, that makes sense. But why does the GraphNodeProvider have to cast
such a wide net to catch so many triples? It seems to me that if one
is to use a URI then it would be better that the URI refer precisely to
that named graph (or to a node in it). One could use other tools to create
virtual graphs, like Simon Schenk's Networked Graphs, which I mentioned here:

http://blogs.oracle.com/bblfish/entry/opening_sesame_with_networked_graphs

These allow one to have virtual graphs depending on a SPARQL query pattern.
There it would be easy for different services to specify different ones. 
And I think something like that would be really good to have.
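
As a rough illustration of what "different services specifying different
virtual graphs" could look like, here is a minimal Scala sketch. It is not
Networked Graphs and not Clerezza API: a plain function stands in for the
SPARQL pattern, and every name in it (VirtualGraphSketch, View, exact,
withContent, materialize) is hypothetical.

```scala
// Hypothetical sketch; a plain function stands in for a SPARQL query pattern.
object VirtualGraphSketch {
  // A triple reduced to three strings to keep the sketch self-contained
  final case class Triple(subject: String, predicate: String, obj: String)

  // A view names the graphs a given service wants merged for a requested URI
  type View = String => Seq[String]

  // Only the graph named by the URI itself (fragment identifier stripped)
  val exact: View = uri => Seq(uri.takeWhile(_ != '#'))

  // The named graph plus a local content graph, for services that want both
  def withContent(contentGraph: String): View = uri => exact(uri) :+ contentGraph

  // Merge the triples of every graph the view selects; unknown names add nothing
  def materialize(view: View, store: Map[String, Set[Triple]])(uri: String): Set[Triple] =
    view(uri).flatMap(name => store.getOrElse(name, Set.empty[Triple])).toSet
}
```

A comment-author infobox could then ask for materialize(exact, store)(uri) and
get just the profile document, while a service that really needs the content
graph would opt in with withContent(...).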

> in this case the
> resource is rendered using its own baseGraph rather than the one of the
> calling template. An example usecase for this is rendering the author of a
> comment, the whole profile of the (possibly remote) commenter isn't and
> shall not be part of the baseGraph of the GraphNode returned by the jax-rs
> resource method, yet for rendering the comment-author infobox it might be
> beneficial to render a GraphNode with a baseGraph containing all of the
> information in the users profile-document.

But why also all the information from the documentation and the config graphs?
It may be useful in some very limited cases, but mostly it will not be. Some
concrete use cases would help describe this in more detail.

> - With CLEREZZA-541 the GraphNodeService is accessed from TypeHandler, I
> posted a resolution to this issue because it was already quite there on my
> local machine when Henry reopened CLEREZZA-540, to respect the reopening of
> the issue I didn't mark the dependent issue as resolved. I will of course
> revert the changes if requested to do so by a qualifying -1.
> 
> I'm not arguing that my patches solve all issues one might have around
> getting resource descriptions but I do think it is very valuable and to
> allow to base other stuff on this service I would like the issue to be
> closed. As Henry reopened the issue twice and I don't want to close the
> issue again without a broader discussion. Yet as many thing depend on the
> issue leaving it open doesn't seem an option to me.

What depends on it seems to me to be something you want to do in your own
projects, and that is not clearly laid out. It is not obvious to me why a
service should make the decisions this one does about what is authoritative.

> 
> Future enhancement might include:

> - manually force refresh of caches for graphs related to a requested
> resource

Yes, indeed. But why here, when it is not in the WebProxy? You would think cache
update functionality should go in the WebProxy, right?



> - force an alternative set of baseGraphs to be used (e.g. Only local or only
> remote sources)

What I am wondering is why all this is done like this. If I go over the history of
refactorings of the past few weeks that led us here, this is what I see:

1. You did not like the initial WebProxy I wrote by refactoring your WebIdGraphsService.
   Neither did I in fact, but it did work at least and added minimal change. Being new
   to ZZ, I did not want to play around too much in the internals.
2. You moved the old WebProxy to what seemed like a nicer interface: the TcProvider
   interface. And indeed that does look a lot better. But this interface is really
   meant for direct, no-interpretation access to the database, and so it
   - lacks key notions of caching (well, I suppose they could make sense even for
     other Sesame or Jena graphs?)
   - does not provide a method for returning the final name of the graph (for
     redirected resources, or foaf:knows) when the WebProxy gets called
     (since the TcProvider assumes you give it exactly the correct name of the graph)
   => So really it is quite uncomfortable there somehow.
3. This then led you to move to this GraphNodeProvider in order to get a graph from a
   URI, which is very similar to the TcProvider in many ways, right? It even uses the
   code of the original WebProxy to do a HEAD on a remote resource to find the graph
   name (which one would assume would be part of the WebProxy code, since it will be
   making the real HTTP connection and so can follow the changes of the graph names).
   But because the TcManager interface is really a database layer interface, that
   cannot be placed there, and so it is now placed into something outside: this class
   you have now written.
4. But instead of having a GraphNodeProvider that just returns the graph, you have
   added some twists to it and return more than just the named graph. There is nothing
   to say that a named graph cannot be the union of many other graphs, but it seems
   really arbitrary to me to get the documentation of Clerezza along with the triples
   of Tim Berners-Lee's graph.

   Somehow things have gone a bit haywire at the end here. And I think this is due to
   a bit of confusion of the needs of your application with trying to keep the general
   architecture clean.

   Now on the whole I have learnt a lot about Clerezza by following this, but I just
   can't say that this looks like a good long term solution. We are constantly moving
   around and around something.

   Would any of the following work?

   - TcProvider extended to specify caching options?
   - Graph extended so that it can contain its name (so that one can ask for a
     resource in a TcProvider, and find out what its name really was by inspecting
     the resulting graph)
     -> if not, should WebProxy really be a TcProvider? There is no way of knowing
        ahead of time what the name of a graph for a resource is, given that
        redirects can occur at any time.

   The WebProxy as TcProvider mostly makes sense otherwise, so it does feel like the
   above two things would help.
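
To make those two suggestions a bit more concrete, here is a minimal Scala
sketch of a provider that accepts a caching option and returns a graph that
carries its final name. None of these types exist in Clerezza under these
names: CachePolicy, NamedGraph and ToyProvider are hypothetical, the redirects
map merely stands in for HTTP redirects, and the store map stands in for the
fetched documents.

```scala
// Hypothetical sketch: none of these types exist in Clerezza under these names.
object NamedGraphSketch {
  // Caching hint a caller could pass along with a request
  sealed trait CachePolicy
  case object UseCache extends CachePolicy
  case object ForceRefresh extends CachePolicy

  // A triple reduced to three strings to keep the sketch self-contained
  final case class Triple(subject: String, predicate: String, obj: String)

  // A graph that remembers the name it was finally resolved to
  final case class NamedGraph(name: String, triples: Set[Triple])

  // `redirects` stands in for HTTP redirects, `store` for fetched documents
  class ToyProvider(redirects: Map[String, String], store: Map[String, Set[Triple]]) {
    var fetches = 0 // counts simulated network fetches
    private val cache = scala.collection.mutable.Map.empty[String, NamedGraph]

    // Follow redirects until no further redirect applies; stop on a cycle
    @annotation.tailrec
    private def resolve(name: String, seen: Set[String] = Set.empty): String =
      redirects.get(name) match {
        case Some(next) if !seen(next) => resolve(next, seen + name)
        case _                         => name
      }

    def getGraph(requested: String, policy: CachePolicy = UseCache): NamedGraph = {
      val finalName = resolve(requested)
      def fetch(): NamedGraph = {
        fetches += 1
        NamedGraph(finalName, store.getOrElse(finalName, Set.empty[Triple]))
      }
      policy match {
        case ForceRefresh =>
          val fresh = fetch()
          cache(finalName) = fresh
          fresh
        case UseCache =>
          cache.getOrElseUpdate(finalName, fetch())
      }
    }
  }
}
```

The caller asks for the URI it knows, then reads the real graph name off the
result instead of needing a separate HEAD request outside the provider.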

> 
> So I'm asking you to kindly review the proposed code and vote about closing
> CLEREZZA-540
> 
> [ ] +1, I agree with accepting the proposed code into trunk
> [ ] 0, I don't care
> [ ] -1, I don't want this code in trunk (must specify a technical
> explanation, please also specify what would have to be changed for the patch
> to be acceptable to you).

-1 for the moment on closing the issue. (not on removing the code)
   Please answer the above points carefully.

Henry


> 
> Cheers,
> Reto

Social Web Architect
http://bblfish.net/

