incubator-clerezza-dev mailing list archives

From Henry Story <henry.st...@bblfish.net>
Subject Re: How to name things with URIs
Date Sun, 15 May 2011 17:38:05 GMT

On 15 May 2011, at 12:11, Reto Bachmann-Gmuer wrote:

> On Sun, May 15, 2011 at 11:28 AM, Henry Story <henry.story@bblfish.net> wrote:
>> And we did boil things down to the question of why we need
>> 
>> urn:x-localinstance:/cache/<remote-uri>
>> 
>> rather than just
>> 
>> <remote-uri>
>> 
>> We both agree that either is an improvement over <remote-uri>.cache
> 
> no, we just both agree that <remote-uri>.cache is ugly.

You don't agree that this can lead to name clashes? 

I presented the possibility of a foreign url being named

   http://amazon.com/books/on.cache 

For example. 

Is that perhaps not the most important role of URIs, to allow a global
distributed naming system that does not have a problem with name clashes?

So my point against .cache is that it is worse than ugly: it is broken.

Anyway urn:x-localinstance:/cache/<remote-uri> does not have that problem.
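
To make the clash concrete, here is a small Python sketch. It is only an
illustration of the two naming schemes discussed in this thread, not Clerezza
code: the function names are hypothetical, and only the .cache suffix and the
urn:x-localinstance:/cache/ prefix come from the discussion.

```python
CACHE_PREFIX = "urn:x-localinstance:/cache/"


def suffix_cache_name(remote_uri: str) -> str:
    """Old scheme: name the cache graph <remote-uri>.cache."""
    return remote_uri + ".cache"


def prefix_cache_name(remote_uri: str) -> str:
    """Proposed scheme: urn:x-localinstance:/cache/<remote-uri>."""
    return CACHE_PREFIX + remote_uri


def remote_uri_of(cache_name: str) -> str:
    """Map a prefixed cache name back to the remote URI it was built from."""
    if not cache_name.startswith(CACHE_PREFIX):
        raise ValueError("not a cache graph name: " + cache_name)
    return cache_name[len(CACHE_PREFIX):]


book = "http://amazon.com/books/on"
# A resource that somebody else is free to publish under their own domain.
foreign = "http://amazon.com/books/on.cache"

# Suffix scheme: the cache of `book` is given the name of the foreign resource.
suffix_clash = suffix_cache_name(book) == foreign

# Prefix scheme: the cache name is a urn:, so it cannot equal an http URI,
# and the remote URI can be recovered from it.
prefix_clash = prefix_cache_name(book) == foreign
round_trip = remote_uri_of(prefix_cache_name(foreign)) == foreign
```

With the suffix scheme one URI ends up naming two different things; with the
prefix scheme it does not, and the mapping back to the remote URI is trivial.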

The issue is what it brings that <remote-uri> alone does not.

> But imho <remote-uri> isn't practicable (at least as long as we want to keep
> this in the rdf.*-bundle set and not have this be part of platform.*).

That is something you have not explained yet.

What this reveals is that your reasons for opposing what seems like the
intuitively reasonable naming choice have to do with implementation decisions
made in Clerezza and ones you were about to make (turning the cache into a
TcProvider). Clarifying this does indeed help advance the debate, because
I can now understand more clearly where you are coming from.

> 
>> Ok, so this issue is occurring because you are refactoring WebProxy to be a
>> TcProvider, which it was not originally. One can of course see the pull to
>> making  it a TcProvider, though perhaps the delete methods are not so useful
>> there.
> Virtual TcProviders typically don't implement these. I think we should
> have good reasons to extend the public api. With my suggestion we
> extend it by just one class, one public member and one enumeration.
> With your proposal - after I reduced visibility of some members - there
> are still 2 new public classes with 6 new public members and one
> enumeration. For the public API changes and extensions we accept into
> trunk we should be more conservative than for mere implementation,
> especially for a core component like this Proxy.

As I said, I see the pull towards making it a TcProvider, i.e. towards using
the interface you use for accessing all graphs.

But there are some things I was looking at that may be important to consider.

In the WebProxy I return a Resource object that comes with information about
the graph. So it is a bit more than a graph node. It is a graph containing a
graph node (a GraphNodeGraph, perhaps): metadata about a resource, including
the graph. I was thinking this would be useful for working out what the issue
with a remote graph was, or whether one should force a fresh fetch.

It is something that can be very useful to know. A WebID in your foaf file is
no longer accessible: should it be removed from the foaf? A resource was last
fetched a week ago, but it is important in the current transaction: one should
perhaps fetch it again, since the information may be out of date. Two graphs'
contents clash: which one is freshest? Who said what? What was the previous
version of this graph? (Oh, it's completely different?! Mhh...) Which
application created it? (We deleted that, did we not? Perhaps its graphs
should go too.)

I was hoping to explore that a bit further to see where it leads. But it will
of course also make sense for a TcProvider. What do you think?

Compared to many other implementations of TcProvider, this one has relatively
few methods that throw a NotImplementedException: createMGraph and
deleteTripleCollection. And here it even makes sense to implement
deleteTripleCollection. It may even be possible to envisage creating a graph
making sense in the longer term...

> 
>>> So you're welcome to make suggestion on how it
>>> should be different,
>> 
>> I have not really had time to study all your local TcProviders, nor
>> work out how a number can help anything distinguish between one TcProvider
>> and another. I have to go right now - my sister is calling...
>> 
>> But here is a quick question: why not simply make the ProxyTcProvider a
>> higher priority than the pure local one?
> We wouldn't know when to try to dereference a graph and when not.

Any resource that is not local, whose URI scheme can be dereferenced (http,
https, ftp, ftps, ...) and that is not on some blacklist, can be dereferenced.

Any resource that is local (the localhost domain, or the name of the machine
the instance is running on) should be passed to the next TcProvider.

So my feeling here is that TcProviders should be arranged a bit like Servlet
Filters: each one does something and, if it cannot, passes the request on to
the next one in the stack. Then one can build stacks of providers. (I have
only looked at three or four implementations of TcProvider, and perhaps a
simple stack is too simple...)
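
Here is a rough Python sketch of the stack I have in mind. It is not the
Clerezza TcProvider API: the class names, the get method, the fetch callback
and the use of plain strings as stand-ins for graphs are all assumptions made
for illustration.

```python
from typing import Callable, Dict, List, Optional, Set
from urllib.parse import urlsplit

DEREFERENCEABLE_SCHEMES = {"http", "https", "ftp", "ftps"}


class WebProxy:
    """Fetches remote graphs; declines anything local or not dereferenceable."""

    def __init__(self, fetch: Callable[[str], str], local_hosts: Set[str],
                 blacklist: Optional[Set[str]] = None) -> None:
        self.fetch = fetch              # hypothetical fetcher, injected
        self.local_hosts = local_hosts  # e.g. {"localhost", "<machine name>"}
        self.blacklist = blacklist or set()

    def get(self, name: str) -> Optional[str]:
        parts = urlsplit(name)
        if parts.scheme not in DEREFERENCEABLE_SCHEMES:
            return None                 # cannot be dereferenced: pass it on
        if parts.hostname in self.local_hosts:
            return None                 # local resource: pass it on
        if name in self.blacklist:
            return None                 # blacklisted: pass it on
        return self.fetch(name)


class LocalStore:
    """Purely local provider backed by a dictionary of named graphs."""

    def __init__(self, graphs: Dict[str, str]) -> None:
        self.graphs = graphs

    def get(self, name: str) -> Optional[str]:
        return self.graphs.get(name)


def lookup(stack: List, name: str) -> str:
    """Ask each provider in turn; the first one that answers wins."""
    for provider in stack:
        graph = provider.get(name)
        if graph is not None:
            return graph
    raise KeyError(name)
```

Putting the proxy first and the local store second gives the priority order I
was asking about: remote names get fetched, local names fall through.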

> I propose to name our default graphs with localinstance-uris but it
> shall still be possible to store a triplecollection with any name in
> TcManager (again doing the least possible change to the behaviour of
> the public api). Thus we wouldn't know for a graph named with a
> dereferenceable uri if we should try to update it or if the local copy
> is authoritative (and trying to update would generate a circle).

Ok, so what you want is in fact something you had not explained before.
You want remote graphs that should not be fetched, say foaf or owl, to
be stored under their namespace in the TcManager. So the name of the foaf
graph should be

   http://xmlns.com/foaf/0.1/ [ mhh foaf is a difficult one ]

the cert graph should be named

   http://www.w3.org/ns/auth/cert#

the owl graph should be named 

    http://www.w3.org/2002/07/owl

Anyway, these should be looked at carefully, because we can get good feedback from the semweb
community on how those graphs should be called.

Then all graphs that are fetched through the cache should be called

  urn:x-localinstance:/cache/<remote-uri>

Because then you can have a simple fall-through logic of TcManagers: if a graph
is not found in the first TcManager, the next one, the WebProxy TcManager, can
fetch it.

Is that it?
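
In case it helps to check that I have understood you, my reading of the
fall-through would be roughly the following Python sketch. The function name,
the dictionary standing in for the first TcManager and the fetch callback are
all hypothetical; only the urn:x-localinstance:/cache/ prefix comes from your
proposal.

```python
from typing import Callable, Dict

CACHE_PREFIX = "urn:x-localinstance:/cache/"


def get_graph(name: str, local_graphs: Dict[str, str],
              fetch: Callable[[str], str]) -> str:
    """Return the graph called `name`, fetching it only if it is a cache name."""
    # 1. Anything stored locally is authoritative, even under an http name
    #    (for example a bundled copy of the foaf ontology).
    if name in local_graphs:
        return local_graphs[name]
    # 2. Only names in the cache namespace fall through to the web proxy.
    if name.startswith(CACHE_PREFIX):
        remote_uri = name[len(CACHE_PREFIX):]
        graph = fetch(remote_uri)
        local_graphs[name] = graph  # keep the fetched copy under the cache name
        return graph
    # 3. Everything else is simply unknown.
    raise KeyError(name)
```

On this reading a graph named http://xmlns.com/foaf/0.1/ is never fetched,
while urn:x-localinstance:/cache/http://xmlns.com/foaf/0.1/ would be.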

>> 
>>> but without other proposal and without you
>>> withdrawing the -1 I have to change it to
>>> name.getUnicodeString+".cache" which was the last (silently) accepted
>>> name. I think we both agree that localinstance is better than the
>>> .cache proposal, so I urge you to revoke your vote.
> Above you confirmed that you agree that
> urn:x-localinstance:/cache/<remote-uri> is better than
> <remote-uri>.cache so I don't understand why you're still vetoing this
> change.

This is a very important discussion I think. We are progressing.

Henry


> 
> Reto
> 
> 
>>> 
>>> Cheers,
>>> Reto
>>> 
>>> On Sun, May 15, 2011 at 12:17 AM, Henry Story <henry.story@bblfish.net> wrote:
>>>> I would like you first to read through the extensive mail I wrote, which took
>>>> me some time to write, and think things through.
>>>> 
>>>> 
>>>> Henry
>>>> 
>>>> On 14 May 2011, at 22:37, Reto Bachmann-Gmuer wrote:
>>>> 
>>>>> On Sat, May 14, 2011 at 7:54 PM, Henry Story <henry.story@bblfish.net> wrote:
>>>>>> Btw, I suppose I should say that I am not massively against the suggestion
>>>>>> you started this thread with. It is more that I am trying to explore this
>>>>>> more carefully, because it is an important discussion that deserves careful
>>>>>> thought.
>>>>> The careful procedure is to have tiny little issues which when
>>>>> resolved bring a tiny but undisputed improvement. Now with your
>>>>> resolution of CLEREZZA-463 I'm having massive problems and even if you
>>>>> think the status quo ante was fundamentally wrong I believe the
>>>>> graph-renaming you did makes things worse.
>>>>> 
>>>>> I know that CLEREZZA-463 contains many real improvements. But it also
>>>>> introduces problems. And not just what you might consider a
>>>>> philosophical problem that names denote extensionally different things
>>>>> but also very practical ones.
>>>>> 
>>>>> One major problem is permissions. We introduced
>>>>> WebIdBasedPermissionProvider and one implementation
>>>>> (UserGraphAcessPermissionProvider) used to provide read/write access to
>>>>> the profile graph. Now this no longer works because you changed the
>>>>> names of graphs. It is because of this, and not because of a fundamentally
>>>>> broken architecture before your patch, that applications that used to
>>>>> work no longer do.
>>>>> 
>>>>> Your -1 was against urn:x-localinstance:/cache/<remote-uri>
>>>>> 
>>>>> The status quo ante was
>>>>> 
>>>>> cache graph: <web-profile-uri>.cache
>>>>> profile graph: <web-profile-uri>
>>>>> 
>>>>> with the resolution of  CLEREZZA-463 we have
>>>>> 
>>>>> cache graph <web-profile-uri>
>>>>> profile graphs for local users: <web-profile-uri>
>>>>> profile graphs for remote users: <default-base-uri>/<web-profile-uri>
>>>>> 
>>>>> you did change some names, probably just because of inconsistent
>>>>> changes things broke (UserGraphAcessPermissionProvider seems pointless
>>>>> right now). I don't want to
>>>>> 
>>>>> and  such that because of the renaming of graphs the
>>>>> UserGraphAcessPermissionProvider
>>>>> 
>>>>> - The user has no longer the right to write to its own graph
>>>>> - Because the user graphs that is now (with your resolution of
>>>>> CLEREZZA-463) named like
>>>>> <http://localhost:8080/user/https://farewellutopia.com/user/me/profile>
>>>>> 
>>>>> In my opinion you exchanged a suboptimal solution for quite a mess;
>>>>> now you argue against my solution to tidy things up because you are
>>>>> afraid of having a mess in one year.
>>>>> 
>>>>> So please either accept my proposal which started this thread as
>>>>> something that is better than the status quo (i.e. retract your -1 so
>>>>> I can finally go back coding) or make a concrete proposal on how to
>>>>> name the different entities I've been suggesting names for or else
>>>>> revert the changes for CLEREZZA-463 (so that applications that used to
>>>>> work work again and we can start a proper development with little
>>>>> issues and patches that represent undisputed improvements).
>>>>> 
>>>>> 
>>>>> ==== what I consider important and relevant to current development
>>>>> ends here ====
>>>>> 
>>>>>> 
>>>>>> On 14 May 2011, at 17:09, Reto Bachmann-Gmuer wrote:
>>>>>> 
>>>>>>> On Fri, May 13, 2011 at 5:46 PM, Henry Story <henry.story@bblfish.net> wrote:
>>>>>>>> Reto wrote:
>>>>>>>>> Clerezza-489 and you also quote my statement of 463. okay, you might say
>>>>>>>>> that I'm stating rather than arguing.
>>>>>>>> :-)
>>>>>>>>> The argument: they are different thing, both intensionally (cache and
>>>>>>>>> source) as in many case extensionally (triples may differ).
>>>>>>>> 
>>>>>>>> in that sense I agree.
>>>>>>>> But then the other point I made is also true, and that is that different
>>>>>>>> users may get different
>>>>>>>> graphs back for the same remote resource. In fact those users may be the
>>>>>>>> same user at different times.  Since those are all different graphs by your
>>>>>>>> definition above one should also give them different names.
>>>>>>> We do not have support for this yet and I think it's a feature
>>>>>>> increasing complexity massively.
>>>>>> 
>>>>>> You are dealing with an architectural problem which cannot just be dealt
>>>>>> with in stages. You need to look at the problem as a whole, or you will
>>>>>> just end up with the problem we are having right now. It is better to get this
>>>>>> issue cleared up now, than have a mess of graph names in one year, when a lot of
>>>>>> applications depend on this.
>>>>> This is kind of against agile mantras and it seems to contrast very
>>>>> strongly with what you just did: you changed the names and now want a
>>>>> scientific study to change them again to solve the problems your
>>>>> name change introduced.
>>>>> 
>>>>>> 
>>>>>> In any case it's not increasing anything massively, it is the logical
>>>>>> continuation of your point above.
>>>>> If you propose a patch which changes names and deliver good arguments
>>>>> why the new names are massively better and support future usecases
>>>>> without any disadvantage for addressing the current usecases then I'm
>>>>> sure this gets accepted. What you did is mix this name change into a
>>>>> whole bunch of patches.
>>>>> 
>>>>>> 
>>>>>> Your argument was:
>>>>>> 
>>>>>> "they [the remote and the locally fetched graph] are different thing, both
>>>>>> intensionally (cache and source) as in many case extensionally (triples may differ)."
>>>>>> 
>>>>>> And so it follows that graphs sent at different times may also differ
>>>>>> extensionally and should have different names too.
>>>>> 
>>>>> No, we are talking about MGraphs here. I know transtemporal identity
>>>>> is a hard problem philosophically yet in practice we have quite strong
>>>>> intuition on what we consider to be the same thing over time. the
>>>>> google website remains the google website even if they change the
>>>>> design, same goes for the wikipedia page about google it remains the
>>>>> wikipedia site about google (with the same URI) even after it was
>>>>> changed, one never becomes the other.
>>>>> 
>>>>>> 
>>>>>> You can't have it both ways, argue on intentionality for different names and then
>>>>>> refuse to see that temporally different graphs would also then need different names.
>>>>> I was talking about intensionality. Two terms have the same intension
>>>>> only if in the same universe of evaluation and at the same point in
>>>>> time they have the same extension.
>>>>> 
>>>>>> 
>>>>>> ( Btw. there are good arguments that intentionally the local graph if it is a cache
>>>>>> does not differ from the remote one. In any case if you pursue this too far you will
>>>>>> find that you can never name any remote thing. )
>>>>>> 
>>>>>>> I don't think that clerezza-490 needs to be resolved urgently, but anyway we
>>>>>>> should proceed issue by issue, and the best resolution of an issue is a minimal
>>>>>>> resolution, not one that tries to foresee any future issues.
>>>>>> 
>>>>>> I tend to see logical consequences of an argument as being contained in the argument,
>>>>>> and not being future issues that can be looked at later as somehow being distinct.
>>>>> yes, but:
>>>>> 1. analysing till the very bottom inevitably leads to paralysis.
>>>>> 2. this is inconsistent with your intuition-based name change without discussion
>>>>> 3. We have problems needing a fix (only to be as good as before your
>>>>> patches) and you're not making a concrete proposal
>>>>> 
>>>>>> 
>>>>>> Clerezza-490 that deals with different ways the server can present itself to other
>>>>>> servers, is not of course something that needs to be implemented immediately. But it
>>>>>> would be good that the naming solution we come up with can be extended to that case
>>>>>> and to the temporal case.
>>>>>> 
>>>>>> So I am invoking Clerezza-490 as something to help test the naming ideas being put
>>>>>> forward here. This is a logical test if you will.
>>>>> see above
>>>>> 
>>>>>> 
>>>>>>>> So local graph naming schemes should take that into account, which is why I
>>>>>>>> suggest that we have an API that can allow for extensibility here.
>>>>>>> We have currently things and we are naming them badly.
>>>>>>> 
>>>>>>> Prior to your webproxy we had:
>>>>>>> <webid-profile-url>.cache as name for the cache of the webprofile
>>>>>>> and
>>>>>>> <webid-profile-url> as uri for triples the user generated locally,
>>>>>>> this can be seen as extensions to the remote profile with information
>>>>>>> (like preferred language) that happen not to be in the remote profile
>>>>>>> 
>>>>>>> which was consistent with local users who only had
>>>>>>> <webid-profile-url> for the triples they control which include both
>>>>>>> the regular profile as well
>>>>>> 
>>>>>> yes, and both of those were not good solutions.
>>>>>> The .cache solution is bound to create a problem if someone remotely has
>>>>>> a URI named http://some.example/resource.cache
>>>>>> 
>>>>>> It is bound to lead to nasty name clashes, with the same URI naming two different things.
>>>>> right, I'm admitting it wasn't ideal - but I prefer the rare
>>>>> clashes to the ambiguity by design.
>>>>> 
>>>>>> 
>>>>>> Remote URIs are named by remote resources, so it makes more sense to use the URI of the
>>>>>> remote resource to name the graph of the remote resource. The remote resource was named
>>>>>> by the owner of the resource. We should respect that.
>>>>> <sarcasm>so we should not do caching, as the uri prefix http implies
>>>>> a preferred method for retrieving the resource which is definitely
>>>>> different than getting it out of a local tdb store</sarcasm>
>>>>> 
>>>>>> 
>>>>>> If there are local additions to a remote graph, they should be given a local
>>>>>> URI. There is nothing simpler than this solution it seems to me.
>>>>>> 
>>>>>>> 
>>>>>>> Now <webid-profile-url> is the cache,
>>>>>> 
>>>>>> You can look at it that way, or you can think of it as the name of the remote
>>>>>> graph, with the contents being the cache of the remote graph.
>>>>>> 
>>>>>> If you were to make the local graph available publicly, it would then of
>>>>>> course need to have a local url tied into your namespace. Perhaps this is a good
>>>>>> way to think of the distinction.
>>>>> 
>>>>> I'm not saying your proposal is absurd, but you introduced it in a way
>>>>> that breaks things and without discussion. Now that I want to clean up the
>>>>> mess you start writing socio-philosophical essays
>>>>> 
>>>>>> 
>>>>>> 
>>>>>>> not sure where additional
>>>>>>> triples added locally get stored, i.e. where triples added to
>>>>>>> webIdGraphsService.getWebIDInfo(webId).publicUserGraph are stored.
>>>>>> 
>>>>>> 
>>>>>> They should be stored in graph names with a local URL clearly since
>>>>>> these are being stored by a local agent. And I think it will be application
>>>>>> specific what the names of those graphs should be.
>>>>>> 
>>>>>> So currently as an initial proposal I put them in
>>>>> as a proposal ok, but you changed something that was working without
>>>>> discussing the consequences of this, e.g. for permissions.
>>>>> 
>>>>> <snip/>
>>>>>> Now imagine there are 2 or 3 applications on a clerezza instance,
>>>>>> that a remote user with his WebID uses. There is no reason these applications
>>>>>> should be putting all the information they generate for that user in the same
>>>>>> local graph.
>>>>>> 
>>>>>> A banking graph should put banking info in its graph and a blogging
>>>>>> graph into its graph. The way to do this is to give applications - like users -
>>>>>> access to namespaces. Perhaps the bank application that was given control of the
>>>>>> /bank namespace could coin graphs for remote users in that space, eg
>>>>>> /bank/id/{remoteWebID} and the blogging one in /blog/id?{remoteWebID} .
>>>>>> 
>>>>>> By giving apps access to name spaces you can also make sure that
>>>>>> there won't be any clashes.
>>>>> there is nothing that prevents applications from making their own graphs
>>>>> for user information.
>>>>> 
>>>>>> 
>>>>>> now, that could be a reason for having URIs like
>>>>>> 
>>>>>> mvn:/dev.net/application1/?user=webid...
>>>>>> 
>>>>>> But then you see that applications on different servers will have name clashes
>>>>>> too if they ever merge their databases.
>>>>>> 
>>>>>> The advantage of using the local published name is that this then would allow
>>>>>> simple dumps of databases and their merging in remote databases without clashes.
>>>>>> 
>>>>>>> I'm not saying the old naming was perfect but it worked in a somehow
>>>>>>> consistent fashion for local and remote users.
>>>>>> 
>>>>>> It was very confusing to me at least, as I point out in CLEREZZA-489.
>>>>>> 
>>>>>> And it furthermore is inconsistent with your point above that remote graphs are
>>>>>> intentionally different from the local version.
>>>>>> 
>>>>>>> Now my application that used this feature is no longer working.
>>>>>> 
>>>>>> Well that is the problem of having an initial system that is broken.
>>>>>> It will be easy to fix this, and we should fix it well, not do a half job of it,
>>>>>> because this is a distributed naming problem.
>>>>> I'm tired. I've nothing against a concrete counter proposal against
>>>>> the one that started the thread, e.g. saying: "we must give every
>>>>> instance a unique-id and this should be part of the
>>>>> x-localinstance-uri"
>>>>> 
>>>>> 
>>>>>> 
>>>>>>> 
>>>>>>>> in Clerezza-489 I wrote that one could describe each graph like this in a
>>>>>>>> special Cache graph perhaps.
>>>>>>>> :g202323 a :Graph;
>>>>>>>>     = { ... };
>>>>>>>>     :fetchedFrom <https://remote.com/>;
>>>>>>>>     :fetchedBy <http://bblfish.net/people/henry/card#me>;
>>>>>>>>     :representation <file:/tmp/repr/202323>;
>>>>>>>>     :httpMeta [ etag "sdfsdfsddfs";
>>>>>>>>                      validTo "2012...."^^xsd:dateTime;
>>>>>>>>                     ... redirected info?
>>>>>>>>                     ] .
>>>>>>>> 
>>>>>>>> :g202324 a :Graph;
>>>>>>>>     = { ... };
>>>>>>>>     :fetchedFrom <https://remote.com/>;
>>>>>>>>     :fetchedBy <http://farewellutopia.com/reto#me>;
>>>>>>>>     :representation <file:/tmp/repr/202324>;
>>>>>>>>     :httpMeta [ etag "ddfsdfsddfd";
>>>>>>>>                      validTo "2012...."^^xsd:dateTime;
>>>>>>>>                     ... redirected info?
>>>>>>>>                     ] .
>>>>>>> 
>>>>>>> If we had bracketing in RDF and our tooling supported it, the
>>>>>>> above might be somehow topical: the answer to the question "how to name
>>>>>>> this?" would be "don't name it".
>>>>>> 
>>>>>> The above is just a way of writing the contents of the graph and the metadata
>>>>>> in the same file.  That is what the
>>>>>> 
>>>>>>  :g202323 = { ... }
>>>>>> 
>>>>>> is about. You don't need any special tools for that. If you use Jena to get the graph
>>>>>> named above you would get the content of the brackets. The point is that the content
>>>>>> from
>>>>> Also in jena the graphs have a name, the very profane sequence of
>>>>> characters this discussion was about. So in clerezza or in jena in the
>>>>> metadata graph you have a name instead of {...} and for this name you
>>>>> will get {...} from the named graph store.
>>>>> 
>>>>>> 
>>>>>>  :fetchedFrom ..
>>>>>>  :fetchedBy ...
>>>>>> 
>>>>>> is not in the g202323 graph, but in a graph metadata graph.
>>>>> obviously
>>>>>> 
>>>>>>> Please lets proceed issue by issue and make
>>>>>>> sure every brick we place is really solid and separate this from
>>>>>>> visionary long term stuff.
>>>>>> 
>>>>>> Ok, I hope you see that I introduced nothing new there. It's just an
>>>>>> n3 notation that makes it easy to write things out in an e-mail.
>>>>> an n3 notation that omits exactly what this discussion is about,
>>>>> namely my naming proposal and your -1 against it.
>>>>> 
>>>>>> 
>>>>>> So please consider that point again in that light.
>>>>>> 
>>>>>>>> 
>>>>>>>> Then this API could use information from this graph and information from
>>>>>>>> the user's request
>>>>>>>> to find the correct local graph he wants.
>>>>>>> Still the local graph would have a name, probably - but as I said it's
>>>>>>> irrelevant. Let's deal with the issues at hand, you changed the names
>>>>>>> of graphs (which I agree didn't have the best possible names) to
>>>>>>> names that I think are worse, let's find something we can agree upon.
>>>>>>> (otherwise, please roll back to the version with the original names
>>>>>>> till we find a consensus).
>>>>>> 
>>>>>> Well I don't think rolling back would improve anything. I think clearly
>>>>>> this was an improvement. But I do think we can do better.
>>>>> It is a mixture of improvements and deterioration. Following the
>>>>> right process avoids the deteriorations
>>>>> 
>>>>> 
>>>>>> 
>>>>>> So my thinking is that to reach consensus we can do this with an API, without
>>>>>> deciding what precisely the names should be.
>>>>> Stop: I disagree with your new names and we have problems because of
>>>>> your name changes and now you don't want to decide about names?!
>>>>> 
>>>>>> The best is just to lay out the
>>>>>> requirements:
>>>>>> 
>>>>>>  1. mapping from a remote URI to the URI understood by the local triple store
>>>>>>   and back. There should be no name clashes. It should be possible to easily extend
>>>>>>   to have agent views and temporal views.
>>>>>> 
>>>>>>  2. method for applications to take hold of legitimate namespaces in such a way that
>>>>>>    a clash of names is not possible.
>>>>> 
>>>>> If any proposal for changing names satisfies one of your criteria less
>>>>> than the status before the proposal, your applying the argument to the
>>>>> concrete proposal is welcome.
>>>>> 
>>>>> Reto
>>>>> 
>>>>> 
>>>>>> 
>>>>>> 
>>>>>> Henry
>>>>>> 
>>>>>> 
>>>>>>> 
>>>>>>> Reto
>>>>>>> 
>>>>>>>> Henry
>>>>>>>> PS. Having said that one then may just wonder why local graphs should ever
>>>>>>>> have anything other than
>>>>>>>> local URLs, since every time someone made a copy of a local graph it would
>>>>>>>> be different.
>>>>>> 
>>>>>> Social Web Architect
>>>>>> http://bblfish.net/
>>>>>> 
>>>>>> 
>>>> 
>>>> Social Web Architect
>>>> http://bblfish.net/
>>>> 
>>>> 
>> 
>> Social Web Architect
>> http://bblfish.net/
>> 
>> 

Social Web Architect
http://bblfish.net/

