incubator-jena-dev mailing list archives

From "Stephen Allen (JIRA)" <>
Subject [jira] [Commented] (JENA-41) Different policy for concurrency access in TDB supporting a single writer and multiple readers
Date Mon, 21 Mar 2011 22:09:05 GMT


Stephen Allen commented on JENA-41:

I think your idea about the DatasetGraph being the interface for transactions makes sense.
Transactional DatasetGraphs could also provide fallback behavior for legacy code by running
autocommit transactions whenever the user calls methods on a dataset outside of a
transactionBegin() call.
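
A minimal sketch of the autocommit fallback, in plain Java with no Jena dependency. `TxnStore`, `ToyTxnStore`, `AutoCommitStore`, `begin()`, `commit()`, `abort()` and `inTransaction()` are hypothetical names invented for illustration (`begin()` merely stands in for the proposed `transactionBegin()`); none of them are TDB or Jena API. The wrapper checks whether the caller already has an open transaction and, if not, wraps the single call in its own begin/commit pair.

```java
import java.util.ArrayList;
import java.util.List;

class AutoCommitSketch {
    // Hypothetical transactional store interface (NOT Jena/TDB API).
    interface TxnStore {
        void begin();              // stands in for the proposed transactionBegin()
        void commit();
        void abort();
        boolean inTransaction();
        void add(String triple);
        int size();
    }

    // Toy in-memory, single-threaded store: writes are buffered while a
    // transaction is open, moved to the committed list on commit, dropped on abort.
    static class ToyTxnStore implements TxnStore {
        private final List<String> committed = new ArrayList<>();
        private List<String> pending = null;   // null means no open transaction

        public void begin() {
            if (pending != null) throw new IllegalStateException("transaction already open");
            pending = new ArrayList<>();
        }
        public void commit() {
            if (pending == null) throw new IllegalStateException("no open transaction");
            committed.addAll(pending);
            pending = null;
        }
        public void abort() { pending = null; }  // discard buffered writes
        public boolean inTransaction() { return pending != null; }
        public void add(String triple) {
            if (pending == null) throw new IllegalStateException("add() needs a transaction");
            pending.add(triple);
        }
        // Committed size plus the open transaction's own pending writes.
        public int size() { return committed.size() + (pending == null ? 0 : pending.size()); }
    }

    // Legacy-facing wrapper: if the caller has not opened a transaction,
    // each call runs in its own implicit begin/commit (autocommit).
    static class AutoCommitStore {
        private final TxnStore store;
        AutoCommitStore(TxnStore store) { this.store = store; }

        void add(String triple) {
            if (store.inTransaction()) {   // caller manages the transaction
                store.add(triple);
                return;
            }
            store.begin();
            try {
                store.add(triple);
                store.commit();
            } catch (RuntimeException e) {
                store.abort();             // leave no half-open transaction behind
                throw e;
            }
        }

        int size() { return store.size(); }
    }

    public static void main(String[] args) {
        ToyTxnStore raw = new ToyTxnStore();
        AutoCommitStore legacy = new AutoCommitStore(raw);
        legacy.add("<s> <p> <o1>");        // no begin() call: autocommitted
        System.out.println(raw.inTransaction() + " " + legacy.size());
    }
}
```

Run as a program, this prints `false 1`: the legacy call was committed and no transaction is left open.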

With regard to isolation levels, I believe some of the lower levels can make sense for
particular applications or queries. For example, say you want to know the sizes of a few of
your named graphs:

SELECT (COUNT(*) AS ?count) WHERE { GRAPH <http://example/g1> { ?s ?p ?o . } }
SELECT (COUNT(*) AS ?count) WHERE { GRAPH <http://example/g2> { ?s ?p ?o . } }

Assuming a traditional pessimistic locking scheme, running both queries in one SERIALIZABLE
transaction could cause the read locks taken by the first query to be held through the second
query as well, reducing concurrency. (Using two separate transactions instead might not be a
good idea either, as there is usually some overhead associated with creating and committing
transactions.)

If you were OK with the possibility that the two query results are not truly serializable
with respect to each other, you could improve concurrency by using the READ_COMMITTED isolation
level instead, which would give serializable results for each query (but not for the whole
transaction). And if you really just needed a rough estimate of size, READ_UNCOMMITTED may be
able to avoid locking altogether.
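
To make the lock-duration argument concrete, here is a toy sketch using one `ReentrantReadWriteLock` per named graph. It assumes graph-level read locks (a hypothetical granularity, coarse enough that it also rules out phantoms within a graph) and models three policies: SERIALIZABLE keeps read locks until commit (strict two-phase locking), READ_COMMITTED releases them at the end of each query, and READ_UNCOMMITTED takes none. `IsolationLockSketch`, `Txn` and `lockFor` are illustrative names only.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantReadWriteLock;

class IsolationLockSketch {
    enum Level { READ_UNCOMMITTED, READ_COMMITTED, SERIALIZABLE }

    // One readers-writer lock per named graph (hypothetical granularity).
    static final Map<String, ReentrantReadWriteLock> LOCKS = new ConcurrentHashMap<>();

    static ReentrantReadWriteLock lockFor(String graph) {
        return LOCKS.computeIfAbsent(graph, g -> new ReentrantReadWriteLock());
    }

    static class Txn {
        final Level level;
        final Deque<ReentrantReadWriteLock> held = new ArrayDeque<>();
        Txn(Level level) { this.level = level; }

        // Runs one read-only query against a graph; 'body' stands in for the count.
        void query(String graph, Runnable body) {
            if (level == Level.READ_UNCOMMITTED) { body.run(); return; }  // no read locks
            ReentrantReadWriteLock lock = lockFor(graph);
            lock.readLock().lock();
            try {
                body.run();
            } finally {
                if (level == Level.SERIALIZABLE) {
                    held.push(lock);            // strict 2PL: keep until commit
                } else {
                    lock.readLock().unlock();   // READ_COMMITTED: release after each query
                }
            }
        }

        // Releases every read lock retained under SERIALIZABLE.
        void commit() {
            while (!held.isEmpty()) held.pop().readLock().unlock();
        }
    }

    public static void main(String[] args) {
        for (Level level : Level.values()) {
            Txn txn = new Txn(level);
            txn.query("http://example/g1", () -> {});
            int stillHeld = lockFor("http://example/g1").getReadLockCount();
            txn.query("http://example/g2", () -> {});
            txn.commit();
            System.out.println(level + ": read locks on g1 after first query = " + stillHeld);
        }
    }
}
```

Only the SERIALIZABLE run reports a read lock on g1 still held while the second query runs; a writer to g1 would have to wait for the whole transaction rather than just the first query.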

An additional motivating factor is that MVCC implementations may be implementing snapshot
isolation, which probably maps better to READ_COMMITTED than to SERIALIZABLE (especially if an
implementation could do predicate locking for true serializable behavior but offer cheaper
snapshot isolation when that is all that is needed). The Postgres documentation does a good
job of describing this [1].
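
A rough sketch of how an MVCC store can hand each reader a snapshot: every commit publishes a new immutable `Version`, and a reader keeps whichever version was current when it started, so a concurrent commit neither blocks nor disturbs it. This is an assumption-laden toy (it needs Java 16+ for `record`), not how TDB or Postgres implement it; `SnapshotSketch`, `Version`, `beginRead` and `commitWrite` are hypothetical. In particular, the whole-store `compareAndSet` treats any concurrent commit as a conflict, which is stricter than real snapshot isolation's per-item write-write check and effectively amounts to a single writer.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.atomic.AtomicReference;

class SnapshotSketch {
    // An immutable committed state; each commit produces a new Version.
    record Version(long number, Set<String> triples) {}

    static final AtomicReference<Version> CURRENT =
        new AtomicReference<>(new Version(0, Set.of()));

    // A reader pins whatever version is current when it starts and keeps using it.
    static Version beginRead() { return CURRENT.get(); }

    // A writer copies its start version, applies its changes, and tries to publish.
    // compareAndSet (reference comparison) fails if any other commit happened since
    // 'start' was read, so a stale writer cannot overwrite a newer version.
    static boolean commitWrite(Version start, Set<String> additions) {
        Set<String> next = new HashSet<>(start.triples());
        next.addAll(additions);
        Version v = new Version(start.number() + 1, Collections.unmodifiableSet(next));
        return CURRENT.compareAndSet(start, v);
    }

    public static void main(String[] args) {
        Version reader = beginRead();                  // long-running reader starts
        Version writerStart = CURRENT.get();
        boolean committed = commitWrite(writerStart, Set.of("<s> <p> <o>"));
        // The reader's snapshot is unchanged even though a newer version exists.
        System.out.println(committed + " " + reader.triples().size() + " "
            + CURRENT.get().triples().size());
    }
}
```

Run as a program, this prints `true 0 1`: the write committed, yet the reader that started earlier still sees the empty version.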

I would find it useful to have multiple isolation levels available (even if internally I'm
mapping them all to SERIALIZABLE at first). The four ANSI isolation levels seem appropriate;
remember that implementations are allowed to map unavailable lower levels to higher ones as
desired.
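
The mapping of unavailable levels can be sketched as a small lookup over the four ANSI levels ordered from weakest to strongest: a request is upgraded to the weakest supported level that is at least as strong, and never weakened. `IsolationMappingSketch`, `Isolation` and `effectiveLevel` are hypothetical names. With only SERIALIZABLE supported, every request maps to SERIALIZABLE, which matches the "map them all to SERIALIZABLE at first" approach.

```java
import java.util.EnumSet;
import java.util.Set;

class IsolationMappingSketch {
    // The four ANSI SQL isolation levels, declared weakest first so that
    // enum ordinal order matches isolation strength.
    enum Isolation { READ_UNCOMMITTED, READ_COMMITTED, REPEATABLE_READ, SERIALIZABLE }

    // Returns the weakest supported level at least as strong as the requested one.
    static Isolation effectiveLevel(Isolation requested, Set<Isolation> supported) {
        for (Isolation level : Isolation.values()) {
            if (level.compareTo(requested) >= 0 && supported.contains(level)) return level;
        }
        throw new IllegalArgumentException("no supported level at least as strong as " + requested);
    }

    public static void main(String[] args) {
        Set<Isolation> onlySerializable = EnumSet.of(Isolation.SERIALIZABLE);
        for (Isolation requested : Isolation.values()) {
            System.out.println(requested + " -> " + effectiveLevel(requested, onlySerializable));
        }
    }
}
```

Adding, say, READ_COMMITTED to the supported set later would let READ_UNCOMMITTED and READ_COMMITTED requests run at the cheaper level without any change for callers.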


> Different policy for concurrency access in TDB supporting a single writer and multiple readers
> ----------------------------------------------------------------------------------------------
>                 Key: JENA-41
>                 URL:
>             Project: Jena
>          Issue Type: New Feature
>          Components: Fuseki, TDB
>            Reporter: Paolo Castagna
>         Attachments:,,,,,
> As a follow-up to a discussion about "Concurrent updates in TDB" [1] on the jena-users
> mailing list, I am creating this as a new feature request.
> Currently TDB requires developers to use a Multiple Reader or Single Writer (MRSW) locking
> policy for concurrent access [2]. Not doing this could cause data corruption.
> The MRSW policy is really MR xor SW (i.e. while a writer holds the lock, no readers are
> allowed, and similarly, while a reader holds the lock, no writes are possible).
> This works fine in most situations, but there might be problems in the presence of
> long writes or long reads.
> It has been suggested that "journaled file access" could be used to solve the issue
> of a long write blocking reads.
>  [1]
>  [2]

This message is automatically generated by JIRA.