Message-ID: <4E92D2C1.3080705@googlemail.com>
Date: Mon, 10 Oct 2011 12:10:57 +0100
From: Paolo Castagna
To: jena-dev@incubator.apache.org
Subject: Re: Promoting TxTDB to TDB/trunk
References: <4E92ADCA.6060308@epimorphics.com>
In-Reply-To: <4E92ADCA.6060308@epimorphics.com>

Hi Andy,
first of all, thanks for sending this email.
My comments are in-line.

Andy Seaborne wrote:
> Paolo, all,
>
> I think it's time to promote TxTDB to be the TDB trunk.

+1

> The criterion I have is whether TxTDB provides at least the same
> functionality as TDB.  That is, when running non-transactionally, is
> TxTDB good enough to be the next TDB?

As far as I know, there are currently no open bugs, therefore: good enough.

Performance when running it non-transactionally should not have changed
significantly; we have a good excuse to give JenaPerf a try to find out. :-)

Having (Tx)TDB released would help us (@ Talis) as well, since at the moment
we need to roll out our own internal releases of TxTDB (and that's a (small)
cost).

> There are some missing features for transactions:
>
>   Documentation
>   Dataset level API [*]
>
> and a 1001 other things that could be done.
>
> Triage the JIRA list:
>
> JENA-133 provide configurability of cache sizes
>   Not critical for a release - adds something that isn't there now.

Yep.

> JENA-131 TxTDB problem during concurrent execution
>   Insufficient evidence currently.
>   Not critical for the switchover - does not break TDB in non-txn mode.

Agree.

> JENA-117 A pure Java version of tdbloader2
>   Not critical for a release - adds something that isn't there now.

Yep.
Not a blocker for a new release of TDB or a TxTDB <-> TDB switch over.

> JENA-106 Merge joins in TDB
>   Not critical for a release - adds something that isn't there now.
>   Need performance framework to determine when/if it
>   makes a positive difference.

Agree.

> JENA-97 TDB 0.9.0 snapshot sometimes returns a SELECT binding twice
>   Awaiting confirmation.  Test case does not illustrate the problem.
>   Not new to TxTDB.
>   A possible alternative reading of the report has been fixed.

I am still unclear on this. Is it a bug?
If it is a bug, it is present in the latest stable TDB release, and therefore
both TDB and TxTDB are affected. A TxTDB <-> TDB switch over will not make
the situation worse in this respect.
It would be good to get to the bottom of this, though, before the next
(Tx)TDB release.

> so I propose doing a switch over by:
>
> svn mv \
>   https://svn.apache.org/repos/asf/incubator/jena/Jena2/TDB/trunk \
>   https://svn.apache.org/repos/asf/incubator/jena/Jena2/TDB/tags/TDB-0.8.X
>
> svn cp \
>   https://svn.apache.org/repos/asf/incubator/jena/Experimental/TxTDB/trunk \
>   https://svn.apache.org/repos/asf/incubator/jena/Jena2/TDB/trunk
>
> The only reason for the second being a "cp" (I strongly prefer not
> leaving visible orphan copies around) is to have a temporary version
> that marks the changeover.  By diff'ing TDB/trunk against
> Experimental/TxTDB/trunk, it would be possible to find items to backport
> to TDB-0.8.X should that be necessary.  I expect the copy to be around
> for a short period of time only.

+1

It's a good plan. Just ping me if you need any help on this or if there is
something I can do.

As I mentioned performance and JenaPerf above, I'd like to give it a go
and compare TDB vs. TxTDB (with or without transactions). But I am not 100%
sure I will be able to do this by the end of this week.

> Whether "svn up" can cope I don't know - it may mean a clear checkup is
> needed but that might be safer anyway.

I'll certainly go for a clean checkout.
Not a big deal.

> Then an important looking JIRA item for TDB is:
>
> JENA-102 tdbloader creates stats.opt file in existing DB
>   Not a blocker because it's a problem with the current release.
>   It is well worth addressing stats.opt maintenance properly,
>   not just solving the point problem.

+1 on "worth addressing stats.opt maintenance properly".

A first step on this would be to add a comment on JENA-102 to clarify what
"properly" would mean in practice: what needs to be done?
Or, open a new JIRA issue for it and link it with JENA-102.

If we end up with an in-memory and on-disk solution which could (eventually)
be used to answer specific SPARQL queries such as the ones I often see in the
office and Dave mentioned recently:

  SELECT DISTINCT ?p WHERE {?s ?p ?o.}
  SELECT DISTINCT ?cls WHERE {?i a ?cls.}

That would be awesome.

I am not proposing we do this in one shot; using stats to answer the above
SPARQL queries is a completely separate (and not necessary) step. However,
keeping this in mind and coming up with a solution which would make that
possible would be good.

Are you proposing we close JENA-102 and deal with stats.opt maintenance
properly before the TxTDB <-> TDB switch over?

Before or after does not make a big difference to me, so long as we fix it.

Paolo

>
> Andy
>
> [*] As in "finish" and "decide which of two ... or both options"
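
To make the stats idea from the JENA-102 discussion above a bit more concrete, here is a rough sketch of how incrementally maintained counts could answer the two DISTINCT queries without scanning any triples. It is written in Python only for brevity. The TripleStats class and its methods are hypothetical names invented for this illustration; this is not the stats.opt format and not any existing TDB or Jena API. It also assumes add/remove are called only for triples that are really inserted into or deleted from the store (set semantics), otherwise the counts drift.

```python
from collections import Counter

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


class TripleStats:
    """Hypothetical in-memory statistics kept up to date on every update.

    Assumption: add() is called once per triple actually inserted and
    remove() once per triple actually deleted.
    """

    def __init__(self):
        self.predicate_counts = Counter()  # predicate -> number of triples
        self.class_counts = Counter()      # class -> number of rdf:type triples

    def add(self, s, p, o):
        self.predicate_counts[p] += 1
        if p == RDF_TYPE:
            self.class_counts[o] += 1

    def remove(self, s, p, o):
        self._decrement(self.predicate_counts, p)
        if p == RDF_TYPE:
            self._decrement(self.class_counts, o)

    @staticmethod
    def _decrement(counter, key):
        # Drop the key once its count reaches zero so that it no longer
        # shows up in the DISTINCT answers.
        counter[key] -= 1
        if counter[key] <= 0:
            del counter[key]

    def distinct_predicates(self):
        # Corresponds to: SELECT DISTINCT ?p WHERE {?s ?p ?o.}
        return sorted(self.predicate_counts)

    def distinct_classes(self):
        # Corresponds to: SELECT DISTINCT ?cls WHERE {?i a ?cls.}
        return sorted(self.class_counts)
```

The point of keeping counts rather than plain sets is that deletions can be handled: a predicate or class disappears from the answer only when its last triple is removed. Persisting these counters on disk and keeping them consistent with transactions is the hard part and is not addressed by this sketch.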