clerezza-dev mailing list archives

From Reto Bachmann-Gmür <>
Subject Re: Might I be using Clerezza in the wrong way.
Date Fri, 08 Mar 2013 07:51:30 GMT
On Tue, Mar 5, 2013 at 12:14 PM, Minto van der Sluis <> wrote:

> On 22-2-2013 19:32, Reto Bachmann-Gmür wrote:
> > Hi Minto,
> >
> > I think making the support for a large number of named graphs more
> > efficient should be feasible quite easily, so if you want to stick to the
> > current design let's fix it.
> How is this best fixed? I can think of 2 approaches to get rid of the
> sets with graph names:
> 1) Rely on Jena Dataset.containsNamedModel(). This however does not
> discriminate between read-only/readwrite graphs. So something additional
> is required here.
> 2) Instead of the 2 sets use a single index graph to store the name and
> types (read-only/readwrite) of named graphs. Something like:
>     <graphName>  <graph.type> <read-only|readwrite>
Not sure why this would significantly increase performance. Access to a set is
supposed to be fast.
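
To make option 2 concrete, here is a minimal sketch in plain Java of a single
index that records each graph name together with its type. All names here
(GraphIndex, GraphType, register, isWritable) are hypothetical and not part of
the Clerezza TcProvider API, and an in-memory map stands in for the index
graph. Both this and the two-set design come down to hash lookups, so the
sketch shows the shape of the idea rather than a proven speed-up.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical types for illustration only; not Clerezza or Jena classes.
enum GraphType { READ_ONLY, READ_WRITE }

class GraphIndex {
    // One entry per named graph: graph name -> graph type.
    private final Map<String, GraphType> types = new LinkedHashMap<>();

    void register(String graphName, GraphType type) {
        types.put(graphName, type);
    }

    boolean contains(String graphName) {
        return types.containsKey(graphName);
    }

    // Unknown graphs are reported as not writable.
    boolean isWritable(String graphName) {
        return types.get(graphName) == GraphType.READ_WRITE;
    }

    // Read-only view of the registered names; no copy is made.
    Set<String> names() {
        return Collections.unmodifiableSet(types.keySet());
    }
}

class GraphIndexDemo {
    public static void main(String[] args) {
        GraphIndex index = new GraphIndex();
        index.register("urn:x-example:tree/1", GraphType.READ_WRITE);
        index.register("urn:x-example:index", GraphType.READ_ONLY);
        System.out.println(index.names());
        System.out.println(index.isWritable("urn:x-example:tree/1"));
    }
}
```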

> Probably also TcProvider methods like getNames(), listGraphs(), ... need
> to return intelligent (lazy loading?) sets.
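
One possible reading of "lazy loading sets", sketched in plain Java: a set view
that never copies the graph names, answers contains() through a direct lookup,
and only scans the store when it is iterated or sized. LazyNameSet and its two
constructor arguments are hypothetical; in a real provider the lookup could be
something like Jena's Dataset.containsNamedModel(), which is an assumption and
not shown here.

```java
import java.util.AbstractSet;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Predicate;
import java.util.function.Supplier;

// Hypothetical helper for illustration; not part of Clerezza or Jena.
class LazyNameSet extends AbstractSet<String> {
    private final Supplier<Iterator<String>> source; // opens a fresh scan of the store
    private final Predicate<String> membership;      // direct lookup, no scan

    LazyNameSet(Supplier<Iterator<String>> source, Predicate<String> membership) {
        this.source = source;
        this.membership = membership;
    }

    @Override
    public Iterator<String> iterator() {
        return source.get();
    }

    // Delegates to the store instead of iterating over all names.
    @Override
    public boolean contains(Object o) {
        return o instanceof String && membership.test((String) o);
    }

    // Counting requires a full scan; callers should prefer contains().
    @Override
    public int size() {
        int n = 0;
        for (Iterator<String> it = source.get(); it.hasNext(); it.next()) {
            n++;
        }
        return n;
    }
}

class LazyNameSetDemo {
    public static void main(String[] args) {
        // A list stands in for the underlying store.
        List<String> stored = Arrays.asList("urn:x-example:tree/1", "urn:x-example:tree/2");
        LazyNameSet names = new LazyNameSet(stored::iterator, stored::contains);
        System.out.println(names.contains("urn:x-example:tree/2"));
        System.out.println(names.size());
    }
}
```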


> Any thoughts or alternatives?
> > But I'm not really sure why you need one graph per tree. Why can't you just
> > follow the relations to extract a graph from a single graph?
> I prefer multiple named graphs for a number of reasons:
> 1) Isolation: We use ontologies derived from OpenAnnotation (OA). We
> not only store the annotation but also process related information along
> with the annotation. All this information together we call an annotation
> tree. One of our requirements is to be able to retrieve a full isolated
> annotation tree.
> My feeling is that storing all annotation trees in a single graph will
> lead to annotation tree interference when annotation trees show partial
> overlap. For instance in OA terms when 2 trees have the same OA target.
> In our current solution every annotation tree is stored in a separate
> named graph. Another separate graph is used as an index for all
> annotation trees (comparable to what Stanbol RuleStore does with recipes).
> 2) Ontology agnostic: We try to design our solution to be ontology
> agnostic at the storage level. In our opinion the solution we currently
> work on can be seen as a framework which can be used in multiple
> situations. This is mostly in situations where the process around
> annotations is slightly different. Some parts definitely do need to know
> about the ontology being used, but if possible we like to keep the
> ontology away from lower level components.
> 3) Performance: I feel like retrieving a full named graph performs
> better than following relations. Of course performance optimizations
> should be based on benchmarks and performance tests. But by using named
> graphs I also rely on Clerezza and Jena to take care of performance
> optimizations instead of coding my own relations walker.
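
The interference concern from point 1 can be illustrated with a small sketch
in plain Java. It assumes each tree stores its own information about a shared
target, which is one reading of the description above and not a confirmed
detail of the actual data. Triples are plain string arrays, and TreeWalk is a
hypothetical stand-in for a "relations walker", not a Clerezza or Jena class.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical walker for illustration only.
class TreeWalk {
    // Collects every triple reachable from root by following subject -> object.
    static List<String[]> extract(List<String[]> graph, String root) {
        List<String[]> result = new ArrayList<>();
        Set<String> seen = new HashSet<>();
        Deque<String> todo = new ArrayDeque<>();
        todo.add(root);
        while (!todo.isEmpty()) {
            String node = todo.poll();
            if (!seen.add(node)) {
                continue; // already expanded
            }
            for (String[] t : graph) {
                if (t[0].equals(node)) {
                    result.add(t);
                    todo.add(t[2]);
                }
            }
        }
        return result;
    }
}

class TreeWalkDemo {
    public static void main(String[] args) {
        // Two annotation trees that share the same target (made-up data).
        List<String[]> tree1 = Arrays.asList(
            new String[] {"anno1", "hasTarget", "target"},
            new String[] {"target", "note", "from tree 1"});
        List<String[]> tree2 = Arrays.asList(
            new String[] {"anno2", "hasTarget", "target"},
            new String[] {"target", "note", "from tree 2"});

        // Single-graph storage: both trees merged into one triple list.
        List<String[]> merged = new ArrayList<>(tree1);
        merged.addAll(tree2);

        // Walking from anno1 also picks up tree 2's note on the shared target.
        System.out.println(TreeWalk.extract(merged, "anno1").size()); // prints 3
        // Reading tree 1 as its own named graph returns only its 2 triples.
        System.out.println(tree1.size()); // prints 2
    }
}
```

A walker that also tracked provenance per triple would avoid this, but at that
point it is reimplementing what named graphs already provide.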

Ok, I see the benefits, especially the global index and the
individual named graphs for faster retrieval. What you describe as
ontology agnosticism is, I think, about the same as my criterion of different
origins of the data.

> On the downside our queries probably are going to look more complex. The
> queries need to take the named graphs and named graph index into account.
> > In general I think that it's a design smell if multiple graphs are used for
> > semantic reasons. RDF is powerful enough to describe a universe in one graph.
> > In my opinion multiple graphs should be used if they either have different
> > origins (i.e. they represent the universe from different observers) or they
> > have different access control settings (e.g. public/confidential/private).
> > As there are many origins of graphs, having support for a large number of
> > graphs in Clerezza would still be nice.
> I would like to stick with my current design unless there is something
> fundamentally wrong with it. :-)
Good, then let's optimize the Clerezza performance. I think the SPARQL
fastlane is an important step.

