incubator-clerezza-dev mailing list archives

From Daniel Spicar <dspi...@apache.org>
Subject Re: Weak Performance of "application/json+rdf" serializer on big TripleCollections (CLEREZZA-643)
Date Wed, 26 Oct 2011 13:37:47 GMT
The JIRA issue can be found here:
https://issues.apache.org/jira/browse/CLEREZZA-643

On Wed, Oct 26, 2011 at 3:36 PM, Daniel Spicar <dspicar@apache.org> wrote:

> Rupert provided a patch to improve serialization performance (thanks for
> the effort!). I reviewed his patch and wrote my comments on the JIRA
> page, but I think we need to discuss the issues I raise there. In summary:
>
> - neither the patch nor the current implementation works reliably with very
> large graphs (larger than memory)
> - the patch is significantly faster than the current implementation
> - the current implementation is easier to quick-fix for very large graphs
> (but also very slow)
>
> There is a sketch of a better solution that should allow us to be faster
> and not limited by memory size. It is based on sorted iterators. However,
> these iterators need to be supplied by the underlying TripleCollections, and
> that will require more changes to the core of Clerezza.
>
> Because neither the current implementation nor the patch really works
> on "big" TripleCollections (where big means really, really big), the question
> we should discuss is:
> a) keep everything as it is for now and solve the problem properly (possibly
> as described in the issue)
> b) quick-fix the current implementation (slow performance) + schedule a
> proper solution
> c) apply the patch (fast, but graphs limited to available memory size) +
> schedule a proper solution
>
> My favorite is c.
>
> What do you think?
>
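
For illustration, the sorted-iterator idea from the quoted message could be
sketched roughly as follows. This is a minimal Java sketch under assumptions,
not Rupert's patch and not the Clerezza API: SortedStreamSketch, SimpleTriple,
write and toJson are hypothetical names, the input iterator is assumed to
already deliver triples sorted by subject and then by predicate, and objects
are written as plain JSON strings instead of the full type/value objects of
the real format.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.util.Iterator;

/** Sketch only: streams subject/predicate-grouped JSON from a pre-sorted iterator. */
class SortedStreamSketch {

    /** Hypothetical stand-in for a triple; this is NOT the Clerezza Triple interface. */
    static final class SimpleTriple {
        final String subject;
        final String predicate;
        final String object;

        SimpleTriple(String subject, String predicate, String object) {
            this.subject = subject;
            this.predicate = predicate;
            this.object = object;
        }
    }

    /**
     * Writes the triples as {"subject":{"predicate":["object", ...]}}.
     * Assumption: the iterator is sorted by subject, then by predicate.
     * Only the previous subject and predicate are remembered, so the
     * graph never has to be held in memory as a whole.
     */
    static void write(Iterator<SimpleTriple> sorted, Writer out) throws IOException {
        String prevSubject = null;
        String prevPredicate = null;
        out.write("{");
        while (sorted.hasNext()) {
            SimpleTriple t = sorted.next();
            if (!t.subject.equals(prevSubject)) {
                // New subject: close the previous predicate array and subject object.
                if (prevSubject != null) {
                    out.write("]},");
                }
                out.write(quote(t.subject) + ":{" + quote(t.predicate) + ":[");
            } else if (!t.predicate.equals(prevPredicate)) {
                // Same subject, new predicate: close the array and open the next one.
                out.write("]," + quote(t.predicate) + ":[");
            } else {
                // Same subject and predicate: just another array element.
                out.write(",");
            }
            out.write(quote(t.object));
            prevSubject = t.subject;
            prevPredicate = t.predicate;
        }
        if (prevSubject != null) {
            out.write("]}");
        }
        out.write("}");
    }

    /** Convenience wrapper for small inputs; buffers the whole result in memory. */
    static String toJson(Iterator<SimpleTriple> sorted) {
        StringWriter buffer = new StringWriter();
        try {
            write(sorted, buffer);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toString();
    }

    /** Minimal quoting: escapes backslash and double quote only, not control characters. */
    private static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }
}
```

With the triples (a, p, 1), (a, p, 2), (a, q, 3), (b, p, 4) in that order,
toJson returns {"a":{"p":["1","2"],"q":["3"]},"b":{"p":["4"]}}. The point of
the design is that write itself needs constant memory; the cost moves to
whoever has to supply the sorted iterator, which is the change to the
TripleCollections mentioned in the quoted message.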
