incubator-clerezza-dev mailing list archives

From Hasan Hasan <ha...@trialox.org>
Subject Re: Weak Performance of "application/json+rdf" serializer on big TripleCollections (CLEREZZA-643)
Date Mon, 31 Oct 2011 08:14:13 GMT
Since the reported performance gain is quite big, and there seem to be no
other implications besides increased code complexity, I would also go for
option c.
I am not sure whether we need to call a vote here.

regards
hasan

On Wed, Oct 26, 2011 at 5:30 PM, Tsuyoshi Ito <tsuy.ito@trialox.org> wrote:

> I also prefer option C
>
> Cheers
> Tsuy
>
> On Oct 26, 2011, at 5:14 PM, Tommaso Teofili wrote:
>
> > same here; I'd go with option C :)
> > Tommaso
> >
> > 2011/10/26 Daniel Spicar <dspicar@apache.org>
> >
> >> the JIRA issue can be found here:
> >> https://issues.apache.org/jira/browse/CLEREZZA-643
> >>
> >> On Wed, Oct 26, 2011 at 3:36 PM, Daniel Spicar <dspicar@apache.org> wrote:
> >>
> >>> Rupert provided a patch to improve serialization performance (thanks for
> >>> the effort!). I reviewed his patch and have written my comments on the JIRA
> >>> page. But I think we need to discuss the issues I raise there. In summary:
> >>>
> >>> - neither the patch nor the current implementation works reliably with
> >>> very large graphs (larger than memory)
> >>> - the patch is significantly faster than the current implementation
> >>> - the current implementation is easier to quick-fix for very large graphs
> >>> (but also very slow)
> >>>
> >>> There is a sketch of a better solution that should allow us to be faster
> >>> and not limited by memory size. It is based on sorted iterators. However,
> >>> these iterators need to be supplied by the underlying TripleCollections,
> >>> and that will require more changes to the core of Clerezza.
> >>>
> >>> Because neither the current implementation nor the patch really works
> >>> on "big" TripleCollections (when big means really, really big), the question
> >>> we should discuss is:
> >>> a) keep everything as it is and solve the problem properly (possibly as
> >>> described in the issue)
> >>> b) quick-fix the current implementation (slow performance) + schedule a
> >>> proper solution
> >>> c) apply the patch (fast, but graphs limited to available memory size) +
> >>> schedule a proper solution
> >>>
> >>> My favorite is c.
> >>>
> >>> What do you think?
> >>>
> >>
>
> --trialox ag-------------------------------------
>   tsuyoshi ito
>  hardturmstrasse 101
>  8005 zuerich
>
>
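
For illustration, here is a rough sketch of why sorted iterators would lift the
memory limit. It is not the design from the JIRA issue and it does not use the
Clerezza API: SimpleTriple, serialize and toJson are hypothetical names, every
object is treated as a URI, and the iterator is assumed to arrive already sorted
by subject and then by predicate. Under those assumptions the serializer only
has to remember the previous subject and predicate, so it can write the nested
subject/predicate/values layout of RDF/JSON in a single pass.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.util.Iterator;

class SortedRdfJsonSketch {

    /** Hypothetical stand-in for a triple; NOT the Clerezza Triple interface. */
    static final class SimpleTriple {
        final String subject, predicate, object;

        SimpleTriple(String subject, String predicate, String object) {
            this.subject = subject;
            this.predicate = predicate;
            this.object = object;
        }
    }

    /** Minimal JSON string quoting (backslash and double quote only). */
    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    /**
     * Streams triples to the writer. Assumes the iterator is sorted by subject,
     * then predicate; only the previous subject and predicate are kept in memory.
     */
    static void serialize(Iterator<SimpleTriple> sorted, Writer out) throws IOException {
        String curSubject = null;
        String curPredicate = null;
        out.write("{");
        while (sorted.hasNext()) {
            SimpleTriple t = sorted.next();
            if (!t.subject.equals(curSubject)) {
                if (curSubject != null) {
                    out.write("]},"); // close previous value array and subject object
                }
                out.write(quote(t.subject) + ":{" + quote(t.predicate) + ":[");
                curSubject = t.subject;
                curPredicate = t.predicate;
            } else if (!t.predicate.equals(curPredicate)) {
                out.write("]," + quote(t.predicate) + ":["); // new predicate, same subject
                curPredicate = t.predicate;
            } else {
                out.write(","); // another value for the same subject and predicate
            }
            // Simplification: every object is written as a URI value.
            out.write("{\"value\":" + quote(t.object) + ",\"type\":\"uri\"}");
        }
        if (curSubject != null) {
            out.write("]}"); // close the last value array and subject object
        }
        out.write("}");
    }

    /** Convenience wrapper for small inputs; real use would pass a stream writer. */
    static String toJson(Iterator<SimpleTriple> sorted) {
        StringWriter w = new StringWriter();
        try {
            serialize(sorted, w);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return w.toString();
    }
}
```

Calling SortedRdfJsonSketch.toJson(triples.iterator()) on a small sorted list is
enough to try it out. The catch is the one Daniel mentions: the sort order has
to come from the underlying TripleCollection, otherwise the grouping breaks.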
