incubator-jena-dev mailing list archives

From "Hudson (Commented) (JIRA)" <>
Subject [jira] [Commented] (JENA-225) TDB datasets can be corrupted by performing certain operations within a transaction
Date Thu, 22 Mar 2012 18:10:22 GMT


Hudson commented on JENA-225:

Integrated in Jena_ARQ #510 (See [])
    Partial fix for JENA-225.
This does not fix the problem completely for TDB because strings are (still) not round-trip-safe.
(Revision 1303934)

     Result = SUCCESS
andy : 
Files : 
* /incubator/jena/Jena2/ARQ/trunk/src/main/java/org/openjena/atlas/lib/

> TDB datasets can be corrupted by performing certain operations within a transaction 
> ------------------------------------------------------------------------------------
>                 Key: JENA-225
>                 URL:
>             Project: Apache Jena
>          Issue Type: Bug
>    Affects Versions: TDB 0.9.0
>         Environment: jena-tdb-0.9.0-incubating
>            Reporter: Sam Tunnicliffe
>         Attachments: JENA-225-v1.patch,
> In a web application, we read some triples in an HTTP POST, using a LangTurtle instance
and a tokenizer obtained from TokenizerFactory.makeTokenizerUTF8. 
> We then write the parsed Triples back out (to temporary storage) using OutputLangUtils.write.
At some later time, these Triples are then re-read, again using a Tokenizer from TokenizerFactory.makeTokenizerUTF8,
before being inserted into a TDB dataset. 
> We have found it possible for the input data to contain character strings which pass
through the various parsers/serializers but which cause TDB's transaction layer to error in
such a way as to make recovery from journals ineffective. 
> Eliminating transactions from the code path enables the database to be updated successfully.
> The stacktrace from TDB looks like this: 
> org.openjena.riot.RiotParseException: [line: 1, col: 2 ] Broken token: Hello 
> 	at org.openjena.riot.tokens.TokenizerText.exception(
> 	at org.openjena.riot.tokens.TokenizerText.readString(
> 	at org.openjena.riot.tokens.TokenizerText.parseToken(
> 	at org.openjena.riot.tokens.TokenizerText.hasNext(
> 	at com.hp.hpl.jena.tdb.nodetable.NodecSSE.decode(
> 	at com.hp.hpl.jena.tdb.lib.NodeLib.decode(
> 	at com.hp.hpl.jena.tdb.nodetable.NodeTableNative$2.convert(
> 	at com.hp.hpl.jena.tdb.nodetable.NodeTableNative$2.convert(
> 	at org.openjena.atlas.iterator.Iter$
> 	at com.hp.hpl.jena.tdb.transaction.NodeTableTrans.append(
> 	at com.hp.hpl.jena.tdb.transaction.NodeTableTrans.writeNodeJournal(
> 	at com.hp.hpl.jena.tdb.transaction.NodeTableTrans.commitPrepare(
> 	at com.hp.hpl.jena.tdb.transaction.Transaction.prepare(
> 	at com.hp.hpl.jena.tdb.transaction.Transaction.commit(
> 	at com.hp.hpl.jena.tdb.transaction.DatasetGraphTxn.commit(
> 	at com.hp.hpl.jena.tdb.transaction.DatasetGraphTransaction._commit(
> 	at com.hp.hpl.jena.tdb.migrate.DatasetGraphTrackActive.commit(
> 	at com.hp.hpl.jena.sparql.core.DatasetImpl.commit(
> At least part of the issue seems to stem from NodecSSE (I know this isn't actual Unicode
escaping, but it's derived from the user input we've received). 
> String s = "Hello \uDAE0 World";
> Node literal = Node.createLiteral(s);
> ByteBuffer bb = NodeLib.encode(literal);
> NodeLib.decode(bb);
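The core of the round-trip failure in the repro above can be shown without Jena at all: an unpaired surrogate such as \uDAE0 has no valid UTF-8 encoding, so it cannot survive an encode/decode cycle. A minimal JDK-only sketch (the class name `SurrogateRoundTrip` is illustrative, not from the patch):

```java
import java.nio.charset.StandardCharsets;

public class SurrogateRoundTrip {
    public static void main(String[] args) {
        // A lone (unpaired) high surrogate, as in the repro above.
        String s = "Hello \uDAE0 World";

        // UTF-8 cannot represent an unpaired surrogate; Java's encoder
        // substitutes a replacement character rather than failing.
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        String back = new String(bytes, StandardCharsets.UTF_8);

        System.out.println(s.equals(back));          // false
        System.out.println(back.indexOf('\uDAE0'));  // -1: the surrogate is gone
    }
}
```

Because the string silently changes on the way through the byte layer, later decoding can see bytes that no longer tokenize as the string that was written, which is consistent with the "Broken token" error in the stack trace.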

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see:

