incubator-jena-dev mailing list archives

From "Sam Tunnicliffe (Created) (JIRA)" <>
Subject [jira] [Created] (JENA-225) TDB datasets can be corrupted by performing certain operations within a transaction
Date Wed, 21 Mar 2012 17:59:40 GMT
TDB datasets can be corrupted by performing certain operations within a transaction 

                 Key: JENA-225
             Project: Apache Jena
          Issue Type: Bug
    Affects Versions: TDB 0.9.0
         Environment: jena-tdb-0.9.0-incubating
            Reporter: Sam Tunnicliffe

In a web application, we read some triples from an HTTP POST, using a LangTurtle instance and
a tokenizer obtained from TokenizerFactory.makeTokenizerUTF8.
We then write the parsed Triples back out (to temporary storage) using OutputLangUtils.write.
At some later time, these Triples are re-read, again using a Tokenizer from TokenizerFactory.makeTokenizerUTF8,
before being inserted into a TDB dataset.
We have found that the input data can contain character strings which pass through
the various parsers/serializers but which cause TDB's transaction layer to fail in such a
way that recovery from the journals is ineffective.

Eliminating transactions from the code path enables the database to be updated successfully.

The stacktrace from TDB looks like this: 
org.openjena.riot.RiotParseException: [line: 1, col: 2 ] Broken token: Hello 
	at org.openjena.riot.tokens.TokenizerText.exception(
	at org.openjena.riot.tokens.TokenizerText.readString(
	at org.openjena.riot.tokens.TokenizerText.parseToken(
	at org.openjena.riot.tokens.TokenizerText.hasNext(
	at com.hp.hpl.jena.tdb.nodetable.NodecSSE.decode(
	at com.hp.hpl.jena.tdb.lib.NodeLib.decode(
	at com.hp.hpl.jena.tdb.nodetable.NodeTableNative$2.convert(
	at com.hp.hpl.jena.tdb.nodetable.NodeTableNative$2.convert(
	at org.openjena.atlas.iterator.Iter$
	at com.hp.hpl.jena.tdb.transaction.NodeTableTrans.append(
	at com.hp.hpl.jena.tdb.transaction.NodeTableTrans.writeNodeJournal(
	at com.hp.hpl.jena.tdb.transaction.NodeTableTrans.commitPrepare(
	at com.hp.hpl.jena.tdb.transaction.Transaction.prepare(
	at com.hp.hpl.jena.tdb.transaction.Transaction.commit(
	at com.hp.hpl.jena.tdb.transaction.DatasetGraphTxn.commit(
	at com.hp.hpl.jena.tdb.transaction.DatasetGraphTransaction._commit(
	at com.hp.hpl.jena.tdb.migrate.DatasetGraphTrackActive.commit(
	at com.hp.hpl.jena.sparql.core.DatasetImpl.commit(

At least part of the issue seems to stem from NodecSSE (I know this isn't actual Unicode
escaping, but it's derived from the user input we've received).

String s = "Hello \uDAE0 World";
Node literal = Node.createLiteral(s);
ByteBuffer bb = NodeLib.encode(literal);
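
As a hedged illustration that does not touch TDB: if the string above is taken to contain the unpaired high surrogate U+DAE0 (an assumption about how the user input ends up in memory), plain JDK UTF-8 encoding already fails to round-trip it. The class name SurrogateCheck and its two helper methods are hypothetical, and nothing here claims to be what NodecSSE does internally. It is only a sketch of how such input could be detected or rejected before it reaches a commit.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Jena or TDB.
public class SurrogateCheck {

    /** Returns true if s contains a surrogate code unit that is not part of a valid pair. */
    public static boolean hasUnpairedSurrogate(String s) {
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (Character.isHighSurrogate(c)) {
                if (i + 1 < s.length() && Character.isLowSurrogate(s.charAt(i + 1))) {
                    i++; // valid pair: skip the low half
                } else {
                    return true; // high surrogate with no low surrogate after it
                }
            } else if (Character.isLowSurrogate(c)) {
                return true; // low surrogate with no high surrogate before it
            }
        }
        return false;
    }

    /** Strict UTF-8 encoding: throws on malformed input instead of substituting '?'. */
    public static ByteBuffer encodeStrict(String s) throws CharacterCodingException {
        CharsetEncoder encoder = StandardCharsets.UTF_8.newEncoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        return encoder.encode(CharBuffer.wrap(s));
    }

    public static void main(String[] args) {
        String s = "Hello \uDAE0 World";

        System.out.println("unpaired surrogate: " + hasUnpairedSurrogate(s));

        // String.getBytes silently replaces the lone surrogate, so the
        // decoded string no longer equals the original.
        String back = new String(s.getBytes(StandardCharsets.UTF_8), StandardCharsets.UTF_8);
        System.out.println("lenient round trip equal: " + s.equals(back));

        try {
            encodeStrict(s);
            System.out.println("strict encode succeeded");
        } catch (CharacterCodingException e) {
            System.out.println("strict encode rejected input: " + e);
        }
    }
}
```

A check like hasUnpairedSurrogate could be applied to literals at the point where the POSTed data is first parsed, so that bad input is refused before any transaction is started. Whether that is the right place for the check is a design choice for the application rather than something this report establishes.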
