Date: Mon, 26 Dec 2011 22:14:30 +0000 (UTC)
From: "Stephen Allen (Commented) (JIRA)"
To: jena-dev@incubator.apache.org
Subject: [jira] [Commented] (JENA-178) SPARQL Results serialization is slow for some formats with large result sets

[ https://issues.apache.org/jira/browse/JENA-178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13176034#comment-13176034 ]

Stephen Allen commented on JENA-178:
------------------------------------

I shaved ~100ms off of the output by eliminating string concatenation in XMLOutputResultSet. Still lagging behind SAX and StAX though. Committed in revision 1224827.

Timings (ms):
XML Before: 6829
XML After:  6733
TSV:         801
JSON:       6581
XML SAX:    2057
XML StAX:   1371

> SPARQL Results serialization is slow for some formats with large result sets
> ----------------------------------------------------------------------------
>
>                 Key: JENA-178
>                 URL: https://issues.apache.org/jira/browse/JENA-178
>             Project: Jena
>          Issue Type: Bug
>          Components: ARQ
>    Affects Versions: ARQ 2.9.0
>         Environment: Windows 7 Enterprise 64 bit
>            Reporter: Rob Vesse
>            Assignee: Damian Steer
>             Fix For: ARQ 2.9.1
>
>         Attachments: Jena178.java, Jena178.patch, TestArqSerializerPerformance.java, XMLOutputSAX.java, XMLOutputStAX.java
>
>
> The SPARQL XML and JSON Result formats are very slow when the result set is large. This is surprising to me since both formats are relatively simple and should lend themselves to fairly fast streaming serialization and parsing.
> The following are observed performance figures comparing the SPARQL XML, SPARQL JSON and SPARQL TSV results formats. This is the averaged time over 5 runs to retrieve the first 50,000 triples from the dataset with a simple SELECT * WHERE { ?s ?p ?o } LIMIT 50000 via an HTTP request to Fuseki and iterate over the results on the client.
> SPARQL XML = 15.25 seconds
> SPARQL JSON = 10.9 seconds
> SPARQL TSV = 0.54 seconds
> Now obviously TSV is way simpler to serialize and parse than XML/JSON, but these serializers and parsers should not be 20-30 times slower IMO.
> Also for comparison note that doing an equivalent CONSTRUCT { ?s ?p ?o } WHERE { ?s ?p ?o } LIMIT 50000 takes only about 2s, and that is using RDF/XML serialization, which I would have expected to be slower because RDF/XML is more complex to generate than either SPARQL XML or JSON results. I haven't dived into the code in detail yet to investigate why this is slow, but do the Jena team have any thoughts on this?
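The issue argues that both result formats should lend themselves to streaming serialization, and in the comment's timings the SAX and StAX writers were the fastest XML variants. As a rough illustration only (this is not the attached XMLOutputStAX.java, and the class name StaxResultsSketch is hypothetical), here is a minimal sketch that streams SPARQL Query Results XML through the JDK's javax.xml.stream API. It assumes rows arrive as maps from variable name to a string, and it writes every value as a plain <literal>, leaving out <uri>, <bnode>, datatypes and language tags.

```java
import java.io.StringWriter;
import java.io.Writer;
import java.util.List;
import java.util.Map;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

// Hypothetical sketch: streams a simplified SPARQL XML results document with StAX.
public class StaxResultsSketch {
    static final String NS = "http://www.w3.org/2005/sparql-results#";

    // Writes one <result> per row as it is consumed; no per-row strings are built.
    public static void write(Writer out, List<String> vars,
                             Iterable<Map<String, String>> rows) throws XMLStreamException {
        XMLStreamWriter xml = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
        xml.writeStartDocument();
        xml.writeStartElement("sparql");
        xml.writeDefaultNamespace(NS);

        xml.writeStartElement("head");
        for (String v : vars) {
            xml.writeEmptyElement("variable");
            xml.writeAttribute("name", v);
        }
        xml.writeEndElement(); // </head>

        xml.writeStartElement("results");
        for (Map<String, String> row : rows) {
            xml.writeStartElement("result");
            for (String v : vars) {
                String value = row.get(v);
                if (value == null) continue; // unbound variable: emit no <binding>
                xml.writeStartElement("binding");
                xml.writeAttribute("name", v);
                xml.writeStartElement("literal"); // simplification: every value is a plain literal
                xml.writeCharacters(value);       // StAX escapes &, < and > in character data
                xml.writeEndElement();            // </literal>
                xml.writeEndElement();            // </binding>
            }
            xml.writeEndElement(); // </result>
        }
        xml.writeEndElement(); // </results>
        xml.writeEndElement(); // </sparql>
        xml.writeEndDocument();
        xml.flush();
        xml.close(); // releases the StAX writer; does not close the underlying Writer
    }

    public static void main(String[] args) throws XMLStreamException {
        StringWriter sw = new StringWriter();
        write(sw, List.of("s", "o"),
              List.of(Map.of("s", "a", "o", "x<y"), Map.of("s", "b")));
        System.out.println(sw);
    }
}
```

The writer emits no indentation, so the output is one long line; for real HTTP responses the Writer would normally be buffered.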
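The comment credits the XMLOutputResultSet speed-up to eliminating string concatenation but does not show the change. A minimal sketch of that general idea, assuming output goes to a java.io.Writer: the first method builds a temporary String for every binding before writing it, while the second writes the same characters piecewise and escapes text straight into the Writer. The class and method names are hypothetical and are not Jena's code; variable names are written unescaped on the assumption that SPARQL variable names contain no XML-special characters.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;

// Hypothetical sketch contrasting concatenate-then-write with direct writes.
public class DirectWriteSketch {

    // Concatenating version: allocates an escaped copy plus a combined String per binding.
    public static void bindingConcat(Writer out, String name, String lex) throws IOException {
        out.write("<binding name=\"" + name + "\"><literal>" + escape(lex) + "</literal></binding>");
    }

    // Direct version: identical output, written in pieces with no intermediate String.
    public static void bindingDirect(Writer out, String name, String lex) throws IOException {
        out.write("<binding name=\"");
        out.write(name);
        out.write("\"><literal>");
        writeEscaped(out, lex);
        out.write("</literal></binding>");
    }

    // Escapes XML character data directly into the Writer.
    public static void writeEscaped(Writer out, String s) throws IOException {
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&': out.write("&amp;"); break;
                case '<': out.write("&lt;"); break;
                case '>': out.write("&gt;"); break;
                default:  out.write(c);
            }
        }
    }

    // Helper used only by the concatenating version: returns an escaped copy.
    public static String escape(String s) throws IOException {
        StringWriter sw = new StringWriter();
        writeEscaped(sw, s);
        return sw.toString();
    }
}
```

Wrapping the target in a BufferedWriter keeps the many small write calls cheap, since they then land in an in-memory buffer rather than going straight to the socket or file.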