incubator-clerezza-dev mailing list archives

From Reto Bachmann-Gmür <>
Subject Re: Might I be using Clerezza in the wrong way.
Date Fri, 22 Feb 2013 18:32:05 GMT
Hi Minto,

I think making the support for a large number of named graphs more
efficient should be quite feasible, so if you want to stick to the current
design, let's fix it.

But I'm not really sure why you need one graph per tree. Why can't you just
follow the relations to extract a tree from a single graph?
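
To illustrate what I mean by following the relations, here is a minimal
sketch in plain Java. It is not Clerezza API: the Triple record, the
extractTree method and the "hasTarget" predicate name are all hypothetical,
and it assumes that each annotation on an annotation points to its parent
through one linking predicate. Starting at the root, it collects every
triple about the current annotation and then moves on to the annotations
that point at it.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class TreeExtraction {

    // Hypothetical minimal triple; a real store has its own triple type.
    record Triple(String subject, String predicate, String object) {}

    // Collects all triples whose subject is the root or any annotation that
    // reaches the root by following linkPredicate (assumed parent link).
    static Set<Triple> extractTree(List<Triple> graph, String root,
                                   String linkPredicate) {
        Set<Triple> result = new LinkedHashSet<>();
        Set<String> visited = new HashSet<>();
        Deque<String> queue = new ArrayDeque<>();
        queue.add(root);
        while (!queue.isEmpty()) {
            String node = queue.poll();
            if (!visited.add(node)) {
                continue; // already handled, also guards against cycles
            }
            // Linear scan for brevity; a real store would use an index.
            for (Triple t : graph) {
                if (t.subject().equals(node)) {
                    result.add(t);
                }
                if (t.predicate().equals(linkPredicate)
                        && t.object().equals(node)) {
                    queue.add(t.subject()); // child annotation of this node
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Triple> graph = List.of(
                new Triple("a1", "hasBody", "b1"),
                new Triple("a2", "hasTarget", "a1"),
                new Triple("a2", "hasBody", "b2"),
                new Triple("a3", "hasTarget", "a2"),
                new Triple("a9", "hasBody", "b9")); // belongs to another tree
        extractTree(graph, "a1", "hasTarget").forEach(System.out::println);
    }
}
```

With all trees in one graph, the cost of retrieving a tree then depends on
the size of that tree and on how fast the store can answer the two lookups,
not on the number of graphs.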

In general I think that it's a design smell if multiple graphs are used for
semantic reasons. RDF is powerful enough to describe a universe in one
graph. In my opinion multiple graphs should be used if they either have
different origins (i.e. they represent the universe from different
observers) or have different access control settings (e.g.
public/confidential/private). As there can be many origins of graphs,
having support for a large number of graphs in Clerezza would still be nice.


On Fri, Feb 22, 2013 at 12:52 PM, Minto van der Sluis <> wrote:

> Hi folks,
> I am starting to wonder if I use Clerezza and semantic technology in
> general in the wrong way. To make it more clear I will first describe my
> situation and then why I think I might be using it incorrectly.
> Context
> =======
> Basically we are gathering and distributing annotations. For this we
> make use of OpenAnnotation (OA, see [1]).  Since OA is based on RDF we
> were looking for products capable of storing this data. We decided to
> use Clerezza as an abstraction for the actual storage layer. Like this
> we are able to switch storage engines quite easily.
> Now it turns out that our annotations should support annotations on
> annotations. Amongst others this is to be able to tell if a root
> annotation has been properly processed or rejected (status change). This
> leads us to the notion of annotation trees. Every one of these trees
> starts with a single annotation as the root/trunk.
> The system we work on not only stores annotation but also has to return
> complete annotation trees. For this reason we decided to store every
> tree in its own named graph. Like this we can easily retrieve a full
> tree by returning the complete named graph. The downside will be
> that we end up with a massive number of (small) named graphs.
> For the storage we decided (for the time being) to use
> SingleTdbDatasetTcProvider. Here also lies the root cause why I started
> wondering if we are on the right track. Looking at the
> SingleTdbDatasetTcProvider implementation I have the following
> observations:
> Observations
> ===========
> 1) SingleTdbDatasetTcProvider keeps names of graphs in 2 separate sets.
> This does not seem to be very efficient for large numbers of graph names
> (100k+ or possibly 1M+).
>     private Set<UriRef> graphNames;
>     private Set<UriRef> mGraphNames;
> 2) All graph names are logged on startup (activation). This is feasible
> for a small number, but not for a rather large number of named graphs.
> 3) FileTcProvider also keeps names in memory.
>     private Map<UriRef, FileMGraph> uriRef2MGraphMap =
>             new HashMap<UriRef, FileMGraph>();
> 4) SesameNativeWeightedProvider keeps not only the names in memory,
> but the graph objects as well.
>     private HashMap<UriRef, SesameMGraph> mGraphs;
>     private HashMap<UriRef, SesameGraph> graphs;
> Are we approaching this incorrectly or are we running into limitations
> of the current implementation? In other words, is a large number of named
> graphs supported, or are Clerezza, and maybe even semantic technology in
> general, not designed for this?
> Any thoughts?
> Regards,
> Minto
> --
> ir. ing. Minto van der Sluis
> Software innovator / renovator
> Xup BV
> [1]
