incubator-clerezza-dev mailing list archives

From Daniel Spicar <dspi...@apache.org>
Subject RE: Weak Performance of "application/json+rdf" serializer on big TripleCollections (CLEREZZA-643)
Date Wed, 26 Oct 2011 13:36:33 GMT
Rupert provided a patch to improve serialization performance (thanks for the
effort!). I reviewed his patch and wrote my comments on the JIRA page.
However, I think we need to discuss the issues I raised there. In summary:

- neither the patch nor the current implementation works reliably with very
large graphs (larger than memory)
- the patch is significantly faster than the current implementation
- the current implementation is easier to quick-fix for very large graphs
(but it is also very slow)

There is a sketch of a better solution that should be both faster and not
limited by memory size. It is based on sorted iterators. However, these
iterators need to be supplied by the underlying TripleCollections, which
will require more changes to the core of Clerezza.
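To illustrate why sorted iterators remove the memory limit, here is a minimal
sketch, assuming the TripleCollection can hand out triples ordered by subject
and then by predicate. Because each subject and each predicate then forms a
contiguous run, the RDF/JSON output can be written as a stream, closing each
JSON object as soon as the key changes, with only the current subject and
predicate held in memory. `Triple` and `serialize_sorted` are hypothetical
names for illustration (not the Clerezza API), and the sketch is written in
Python rather than Java for brevity.

```python
import io
import json
from typing import Iterable, NamedTuple, TextIO


class Triple(NamedTuple):
    # Hypothetical triple type for illustration; not the Clerezza API.
    subject: str
    predicate: str
    obj_type: str   # "uri", "bnode" or "literal" (RDF/JSON object types)
    obj_value: str


def serialize_sorted(triples: Iterable[Triple], out: TextIO) -> None:
    """Stream RDF/JSON, assuming triples arrive sorted by (subject, predicate).

    Only the current subject and predicate are remembered, so memory use
    does not grow with the size of the graph.
    """
    prev_s = prev_p = None
    out.write("{")
    for t in triples:
        if t.subject != prev_s:
            if prev_s is not None:
                out.write("]}, ")  # close previous predicate list and subject object
            out.write(json.dumps(t.subject) + ": {")
            prev_s, prev_p = t.subject, None
        if t.predicate != prev_p:
            if prev_p is not None:
                out.write("], ")  # close previous predicate list
            out.write(json.dumps(t.predicate) + ": [")
            prev_p = t.predicate
        else:
            out.write(", ")  # another object for the same predicate
        out.write(json.dumps({"type": t.obj_type, "value": t.obj_value}))
    if prev_s is not None:
        out.write("]}")  # close the last predicate list and subject object
    out.write("}")
```

If the input is not sorted, the same subject or predicate can reappear later
and the output would contain duplicate JSON keys. This is why the ordering
has to be guaranteed by the underlying TripleCollection rather than by the
serializer.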

Because neither the current implementation nor the patch really works on
"big" TripleCollections (where big means really, really big), the question
we should discuss is whether to:
a) keep everything as it is and solve the problem properly (possibly as
described in the issue)
b) quick-fix the current implementation (slow performance) + schedule a
proper solution
c) apply the patch (fast, but graphs are limited to the available memory) +
schedule a proper solution

My favorite is c.

What do you think?
