From: Daniel Spicar
To: clerezza-dev
Date: Wed, 26 Oct 2011 15:37:47 +0200
Subject: Re: Weak Performance of "application/json+rdf" serializer on big TripleCollections (CLEREZZA-643)

The JIRA issue can be found here: https://issues.apache.org/jira/browse/CLEREZZA-643

On Wed, Oct
26, 2011 at 3:36 PM, Daniel Spicar wrote:

> Rupert provided a patch to improve serialization performance (thanks for
> the effort!). I reviewed his patch and have written my comments on the JIRA
> page. But I think we need to discuss the issues I raise there. In summary:
>
> - neither the patch nor the current implementation works reliably with very
>   large graphs (larger than memory)
> - the patch is significantly faster than the current implementation
> - the current implementation is easier to quick-fix for very large graphs
>   (but also very slow)
>
> There is a sketch of a better solution that should allow us to be faster
> and not limited by memory size. It is based on sorted iterators. However,
> these iterators need to be supplied by the underlying TripleCollections, and
> that will require more changes to the core of Clerezza.
>
> Because neither the current implementation nor the patch really works
> on "big" TripleCollections (when big means really, really big), the question
> we should discuss is:
> a) keep everything as it is and solve the problem properly (possibly as
>    described in the issue)
> b) quick-fix the current implementation (slow performance) + schedule a
>    proper solution
> c) apply the patch (fast, but graphs limited to available memory size) +
>    schedule a proper solution
>
> My favorite is c.
>
> What do you think?
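
To illustrate the sorted-iterator idea mentioned in the quoted message, here is a minimal sketch. It is not the solution described in the JIRA issue and not Rupert's patch. The names SortedStreamingSketch, SimpleTriple and serialize are hypothetical and are not Clerezza APIs, and the output is a simplified RDF/JSON-like structure in which each object carries only a "value" key. The sketch assumes the iterator delivers triples sorted by subject and then by predicate; under that assumption the writer only needs to remember the current subject and predicate, so memory use does not grow with the size of the graph.

```java
import java.io.PrintWriter;
import java.util.Iterator;

// Hypothetical sketch: streams a subject/predicate-sorted triple iterator
// into a simplified RDF/JSON-like document without buffering the graph.
class SortedStreamingSketch {

    // Stand-in for a triple; this is NOT Clerezza's Triple interface.
    static final class SimpleTriple {
        final String subject;
        final String predicate;
        final String object;

        SimpleTriple(String subject, String predicate, String object) {
            this.subject = subject;
            this.predicate = predicate;
            this.object = object;
        }
    }

    // Assumption: 'sorted' yields triples ordered by subject, then predicate.
    // Only the current subject and predicate are kept in memory.
    static void serialize(Iterator<SimpleTriple> sorted, PrintWriter out) {
        String currentSubject = null;
        String currentPredicate = null;
        out.print("{");
        while (sorted.hasNext()) {
            SimpleTriple t = sorted.next();
            if (!t.subject.equals(currentSubject)) {
                // New subject: close the previous subject block, if any.
                if (currentSubject != null) {
                    out.print("]},");
                }
                out.print(quote(t.subject) + ":{" + quote(t.predicate) + ":[");
                currentSubject = t.subject;
                currentPredicate = t.predicate;
            } else if (!t.predicate.equals(currentPredicate)) {
                // Same subject, new predicate: close the previous object array.
                out.print("]," + quote(t.predicate) + ":[");
                currentPredicate = t.predicate;
            } else {
                // Same subject and predicate: another entry in the same array.
                out.print(",");
            }
            out.print("{\"value\":" + quote(t.object) + "}");
        }
        if (currentSubject != null) {
            out.print("]}");
        }
        out.print("}");
    }

    // Minimal JSON string quoting (backslash and double quote only).
    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }
}
```

If the input is not sorted, the same subject can be emitted more than once, which is why the quoted message notes that the underlying TripleCollections would have to supply such sorted iterators.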