incubator-jena-dev mailing list archives

From Paolo Castagna
Subject Re: Promoting TxTDB to TDB/trunk
Date Mon, 10 Oct 2011 11:10:57 GMT
Hi Andy,
first of all, thanks for sending this email.

My comments are in-line.

Andy Seaborne wrote:
> Paolo, all,
> I think it's time to promote TxTDB to be the TDB trunk.


> The criterion I have is whether TxTDB provides at least the same
> functionality as TDB.  That is, when running non-transactionally, is
> TxTDB good enough to be the next TDB?

As far as I know, there are currently no open bugs, therefore: good enough.

Performance when running it non-transactionally should not have changed
significantly; we have a good excuse to give JenaPerf a try to find out. :-)

Having (Tx)TDB released would help us (@ Talis) as well, since at the moment
we need to roll out our internal releases of TxTDB (and it's a (small) cost).

> There are some missing features for transactions:
>   Documentation
>   Dataset level API [*]
> and a 1001 other things that could be done.
> Triage the JIRA list:
> JENA-133     provide configurability of cache sizes
>   Not critical for a release - adds something that isn't there now.


> JENA-131     TxTDB problem during concurrent execution
>   Insufficient evidence currently.
>   Not critical for the switchover - does not break TDB in non-txn mode.


> JENA-117     A pure Java version of tdbloader2
>   Not critical for a release - adds something that isn't there now.

Yep. Not a blocker for a new release of TDB or a TxTDB <-> TDB switch over.

> JENA-106     Merge joins in TDB
>   Not critical for a release - adds something that isn't there now.
>   Need performance framework to determine when/if it
>   makes a positive difference.


> JENA-97     TDB 0.9.0 snapshot sometimes returns a SELECT binding twice
>   Awaiting confirmation.  Test case does not illustrate the problem.
>   Not new to TxTDB.
>   A possible alternative reading of the report has been fixed.

I am still unclear on this. Is it a bug?

If it is a bug, it is present in the latest stable TDB release and therefore
both TDB and TxTDB are affected. A TxTDB <-> TDB switch over will not make
the situation worse in this respect.

It would be good to get to the bottom of this though before the next (Tx)TDB

> so I propose doing a switch over by:
> svn mv \
> svn cp \
> The only reason for the second being a "cp" (I strongly prefer not
> leaving visible orphan copies around) is to have a temporary version
> that marks the changeover.  By diff'ing TDB/trunk against
> Experimental/TxTDB/trunk, it would be possible to find items to backport
> to TDB-0.8.X should that be necessary.  I expect the copy to be around
> for a short period of time only.


It's a good plan.

Just ping me if you need any help on this or if there is something I can do.

As I mentioned performance and JenaPerf above, I'd like to give it a go and
compare TDB vs. TxTDB (with or without transactions). But I am not 100% sure
I will be able to do this by the end of this week.

> Whether "svn up" can cope I don't know - it may mean a clean checkout is
> needed but that might be safer anyway.

I'll certainly go for a clean checkout. Not a big deal.

> Then an important looking JIRA item for TDB is:
> JENA-102     tdbloader creates stats.opt file in existing DB
>   Not a blocker because it's a problem with the current release.
>   It is well worth addressing stats.opt maintenance properly,
>   not just solving the point problem.

+1 on "worth addressing stats.opt maintenance properly".

A first step on this would be to add a comment on JENA-102 to clarify what
"properly" would mean in practice: what needs to be done? Or, open a new
JIRA issue for it and link it with JENA-102.

If we end up with an in-memory and on-disk solution which could (eventually)
be used to answer specific SPARQL queries such as the ones I often see in the
office and Dave mentioned recently:

  SELECT DISTINCT ?p WHERE {?s ?p ?o.}
  SELECT DISTINCT ?cls WHERE {?i a ?cls.}

That would be awesome.
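
To make the idea a bit more concrete, here is a minimal sketch in Java of how
incrementally maintained counts could answer those two queries without
scanning the indexes. It is an illustration only: the class TripleStats and
its methods are hypothetical (they are not part of TDB or of the stats.opt
format), terms are plain strings rather than Jena nodes, and the sketch
assumes the caller invokes add only for triples that are not already stored
and remove only for triples that are.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical in-memory statistics; an assumption for illustration, not TDB code. */
final class TripleStats {
    static final String RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type";

    // Number of stored triples per predicate.
    private final Map<String, Long> predicateCounts = new HashMap<>();
    // Number of stored rdf:type triples per class (the object of rdf:type).
    private final Map<String, Long> classCounts = new HashMap<>();

    /** Call once for each triple that is newly added to the store. */
    void add(String s, String p, String o) {
        predicateCounts.merge(p, 1L, Long::sum);
        if (RDF_TYPE.equals(p)) {
            classCounts.merge(o, 1L, Long::sum);
        }
    }

    /** Call once for each triple that is actually deleted from the store. */
    void remove(String s, String p, String o) {
        decrement(predicateCounts, p);
        if (RDF_TYPE.equals(p)) {
            decrement(classCounts, o);
        }
    }

    // Drops the entry when its count reaches zero, so the key sets stay exact.
    private static void decrement(Map<String, Long> counts, String key) {
        counts.computeIfPresent(key, (k, n) -> n > 1 ? n - 1 : null);
    }

    /** Answers: SELECT DISTINCT ?p WHERE {?s ?p ?o.} */
    Set<String> distinctPredicates() {
        return new TreeSet<>(predicateCounts.keySet());
    }

    /** Answers: SELECT DISTINCT ?cls WHERE {?i a ?cls.} */
    Set<String> distinctClasses() {
        return new TreeSet<>(classCounts.keySet());
    }
}
```

With counts like these, the first query reduces to reading the keys of the
predicate map and the second to reading the keys of the class map. Keeping
the maps on disk and in step with the data is the hard part, and it is left
out of the sketch.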

I am not proposing we do this in one shot; using stats to answer the above
SPARQL queries is a completely separate (and not necessary) step. However,
keeping this in mind and coming up with a solution which would make that
possible would be

Are you proposing we close JENA-102 and deal with stats.opt maintenance properly
before the TxTDB <-> TDB switch over?

Before or after does not make a big difference to me, so long as we fix it.


>     Andy
> [*] As in "finish" and "decide which of two ... or both options"
