Date: Tue, 27 Dec 2011 11:50:30 +0000 (UTC)
From: "Andy Seaborne (Commented) (JIRA)"
To: jena-dev@incubator.apache.org
Message-ID: <388960034.46303.1324986630753.JavaMail.tomcat@hel.zones.apache.org>
In-Reply-To: <1884146696.8055.1323818010349.JavaMail.tomcat@hel.zones.apache.org>
Subject: [jira] [Commented] (JENA-178) SPARQL Results serialization is slow for some formats with large result sets

    [ https://issues.apache.org/jira/browse/JENA-178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13176151#comment-13176151 ]

Andy Seaborne commented on JENA-178:
------------------------------------

I changed every Stack<> in ARQ to Deque<>. This was an overdue Java 6 change that I'd not got round to doing. It changed XML from about 7000 to 5000 (not a completely sound test though: the machine was running long TDB tests at the time, so the absolute number isn't meaningful).

The test program Jena178 does not allow for JITting. Attached is Jena178_ResultSetSpeed, which does. Allowing for JITting makes more difference to the tests that run first, presumably because low-level code is still being compiled at that point and the later tests benefit from it.

The JSON looks slow, and IndentedWriter is a common component. It does do per-character poking. Making it final did not help. Checking for newlines in printed strings makes some difference.

Even after the encoder improvements, BufferingWriter still looks suspicious when examined with VisualVM. It should buffer in string-space, not in byte-space.

We could swap to StAX. Any reason why not?

Current figures:

https://svn.apache.org/repos/asf/incubator/jena/Scratch/AFS/Jena-Dev/trunk/src/dev/Jena178_ResultSetSpeed.java

-- checking for newlines inside strings
XMLSAX out took: 273
XMLStax out took: 272
XML out took: 1940
XML in took: 397
TSV out took: 128
TSV in took: 330
JSON out took: 1782
JSON in took: 746

-- not checking for newlines inside strings
XMLSAX out took: 326
XMLStax out took: 274
XML out took: 840
XML in took: 452
TSV out took: 130
TSV in took: 306
JSON out took: 908
JSON in took: 761

JVM:
OpenJDK Runtime Environment (IcedTea6 1.11pre) (6b23~pre11-0ubuntu1.11.10)
OpenJDK 64-Bit Server VM (build 20.0-b11, mixed mode)

> SPARQL Results serialization is slow for some formats with large result sets
> ----------------------------------------------------------------------------
>
>                 Key: JENA-178
>                 URL: https://issues.apache.org/jira/browse/JENA-178
>             Project: Jena
>          Issue Type: Bug
>          Components: ARQ
>    Affects Versions: ARQ 2.9.0
>         Environment: Windows 7 Enterprise 64 bit
>            Reporter: Rob Vesse
>            Assignee: Damian Steer
>             Fix For: ARQ 2.9.1
>
>         Attachments: Jena178.java, Jena178.patch, TestArqSerializerPerformance.java, XMLOutputSAX.java, XMLOutputStAX.java
>
>
> The SPARQL XML and JSON Result formats are very slow when the result set is large. This is surprising to me since both formats are relatively simple and should lend themselves to fairly fast streaming serialization and parsing.
> The following are observed performance figures comparing the SPARQL XML, SPARQL JSON and SPARQL TSV results formats. Each is the time, averaged over 5 runs, to retrieve the first 50,000 triples from the dataset with a simple SELECT * WHERE { ?s ?p ?o } LIMIT 50000 via an HTTP request to Fuseki and iterate over the results on the client.
> SPARQL XML = 15.25 seconds
> SPARQL JSON = 10.9 seconds
> SPARQL TSV = 0.54 seconds
> Now obviously TSV is way simpler to serialize and parse than XML/JSON, but these serializers and parsers should not be 20-30 times slower IMO.
> Also for comparison, note that doing an equivalent CONSTRUCT { ?s ?p ?o } WHERE { ?s ?p ?o } LIMIT 50000 takes only about 2s, and that is using RDF/XML serialization, which I would have expected to be slower because RDF/XML is more complex to generate than either of the SPARQL XML/JSON results formats. I haven't yet dived into the code in detail to investigate why this is slow, but do the Jena team have any thoughts on this?
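
The trade-off mentioned in the comment, between checking printed strings for newlines and not checking them, can be sketched roughly as below. This is only a minimal illustration under stated assumptions, not ARQ's IndentedWriter: the class name IndentSketch and the methods printChecked, printUnchecked and render are hypothetical, and the sketch assumes the only purpose of the check is to keep indentation correct after an embedded newline.

```java
import java.io.PrintWriter;
import java.io.StringWriter;

/** Hypothetical sketch of an indenting writer; not the ARQ implementation. */
public class IndentSketch {
    private final PrintWriter out;
    private final int indent;
    private boolean atLineStart = true;

    public IndentSketch(PrintWriter out, int indent) {
        this.out = out;
        this.indent = indent;
    }

    /** Checked path: looks at every character so text after an embedded newline is indented. */
    public void printChecked(String s) {
        for (int i = 0; i < s.length(); i++) {
            char ch = s.charAt(i);
            if (ch == '\n') {
                out.write('\n');
                atLineStart = true;
            } else {
                padIfNeeded();
                out.write(ch);
            }
        }
    }

    /** Unchecked path: one bulk write; assumes the caller knows s contains no newline. */
    public void printUnchecked(String s) {
        if (s.isEmpty()) {
            return;
        }
        padIfNeeded();
        out.write(s);
    }

    /** Emits the indent once at the start of a line. */
    private void padIfNeeded() {
        if (atLineStart) {
            for (int i = 0; i < indent; i++) {
                out.write(' ');
            }
            atLineStart = false;
        }
    }

    /** Convenience helper for the sketch: renders text through one of the two paths. */
    public static String render(String text, int indent, boolean checkNewlines) {
        StringWriter sink = new StringWriter();
        PrintWriter pw = new PrintWriter(sink);
        IndentSketch w = new IndentSketch(pw, indent);
        if (checkNewlines) {
            w.printChecked(text);
        } else {
            w.printUnchecked(text);
        }
        pw.flush();
        return sink.toString();
    }

    public static void main(String[] args) {
        System.out.println(render("a\nb", 2, true));  // second line is indented
        System.out.println(render("a\nb", 2, false)); // second line is not indented
    }
}
```

The unchecked path avoids the per-character loop but loses indentation if a string does contain a newline, so it would only be safe for values the serializer already knows to be single-line.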