incubator-jena-dev mailing list archives

From "Rob Vesse (Commented) (JIRA)" <>
Subject [jira] [Commented] (JENA-178) SPARQL Results serialization and parsing is slow with large result sets
Date Tue, 13 Dec 2011 23:39:29 GMT


Rob Vesse commented on JENA-178:

So a couple more notes: for these figures the dataset was LUBM 0 with materialized forward
chained inferences, giving approx. 138,000 triples loaded into a TDB database.  The performance
problem is not dataset specific; I've seen similarly bad performance on a variety
of datasets when queries yield large results.  The Fuseki instance is running on the local
machine, so the HTTP overhead should be minimal.

I looked into the code and I can't see anything untoward, for SPARQL XML at least: it looks
like results are serialized and parsed in a fully streaming fashion.  Certainly serialization
all appears to be done with string manipulation, which should in theory at least be fast.
I will try to create a minimal test case that just exercises the parser and the serializer
on local in-memory data, which would allow easier profiling to see what may be going wrong.
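
For illustration only, the shape of such a test case might look roughly like the sketch below.  It does not use ARQ at all: it is a stdlib-only Python stand-in, and every name in it (make_rows, to_json, from_tsv, average_seconds, the example.org IRIs) is hypothetical.  It builds 50,000 synthetic bindings in memory, round-trips them through a SPARQL JSON results document and a SPARQL TSV document, and averages the time over 5 runs, so no HTTP or TDB access is involved.  Its timings say nothing about Jena's own serializers and parsers; only the structure of the harness is the point.

```python
import json
import time


def make_rows(n):
    # Synthetic (s, p, o) IRIs standing in for a real dataset (assumption).
    return [
        (f"http://example.org/s{i}", "http://example.org/p", f"http://example.org/o{i}")
        for i in range(n)
    ]


def to_json(rows):
    # Serialize rows as a SPARQL JSON results document (all terms are IRIs here).
    bindings = [
        {var: {"type": "uri", "value": term} for var, term in zip("spo", row)}
        for row in rows
    ]
    return json.dumps({"head": {"vars": list("spo")}, "results": {"bindings": bindings}})


def from_json(text):
    # Parse the document back into (s, p, o) tuples.
    doc = json.loads(text)
    names = doc["head"]["vars"]
    return [tuple(b[v]["value"] for v in names) for b in doc["results"]["bindings"]]


def to_tsv(rows):
    # Serialize rows as SPARQL TSV: a header of variables, then one line per solution.
    lines = ["?s\t?p\t?o"]
    lines.extend("\t".join(f"<{term}>" for term in row) for row in rows)
    return "\n".join(lines) + "\n"


def from_tsv(text):
    # Skip the header line and strip the angle brackets around each IRI.
    return [
        tuple(cell[1:-1] for cell in line.split("\t"))
        for line in text.splitlines()[1:]
    ]


def average_seconds(task, runs=5):
    # Mean wall-clock time of task() over the given number of runs.
    total = 0.0
    for _ in range(runs):
        start = time.perf_counter()
        task()
        total += time.perf_counter() - start
    return total / runs


rows = make_rows(50_000)
for name, write, read in (("JSON", to_json, from_json), ("TSV", to_tsv, from_tsv)):
    # Check the round trip is lossless before bothering to time it.
    assert read(write(rows)) == rows
    secs = average_seconds(lambda: read(write(rows)))
    print(f"{name}: {secs:.3f} s averaged over 5 runs")
```

Timing the write and read halves separately would be the obvious next step, since that would show whether the serializer or the parser dominates.
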
> SPARQL Results serialization and parsing is slow with large result sets
> -----------------------------------------------------------------------
>                 Key: JENA-178
>                 URL:
>             Project: Jena
>          Issue Type: Bug
>          Components: ARQ
>    Affects Versions: ARQ 2.8.9
>         Environment: Windows 7 Enterprise 64 bit
>            Reporter: Rob Vesse
> The SPARQL XML and JSON Result formats are very slow when the result set is large.  This
> is surprising to me since both formats are relatively simple and should lend themselves to
> fairly fast streaming serialization and parsing.
> The following are observed performance figures comparing SPARQL XML, SPARQL JSON and
> SPARQL TSV results format.  This is the averaged time over 5 runs to retrieve the first 50,000
> triples from the dataset with a simple SELECT * WHERE { ?s ?p ?o } LIMIT 50000 via a HTTP
> request to Fuseki and iterate over the results on the client.
> SPARQL XML = 15.25 seconds
> SPARQL JSON = 10.9 seconds
> SPARQL TSV = 0.54 seconds
> Now obviously TSV is way simpler to serialize and parse than XML/JSON but these serializers
> and parsers should not be 20-30 times slower IMO
> Also for comparison note that doing an equivalent CONSTRUCT { ?s ?p ?o } WHERE { ?s ?p
> ?o } LIMIT 50000 takes only about 2s, and that is using RDF/XML serialization, which I would
> have expected to be slower because RDF/XML is more complex to generate than either the SPARQL
> XML or JSON results format.  I haven't dived into the code in detail to investigate why this
> is slow yet, but do the Jena team have any thoughts on this?
