incubator-jena-dev mailing list archives

From Robert Vesse
Subject Re: GZip support in Fuseki
Date Tue, 07 Feb 2012 01:05:28 GMT
Comments inline:

On Feb 6, 2012, at 2:06 PM, Andy Seaborne wrote:

> On 06/02/12 17:21, Robert Vesse wrote:
>> On a local network it does not seem to be beneficial, it adds a few
>> seconds of overhead.
> I was trying gzip (in Java) today for Fuseki backups - I found similarly that writing
> to disk was faster without gzip.
> The Java gzip was about the same as "gzip -6" in time and space reduction achieved, given
> a large enough gzip buffer - I used 8K where the default is 0.5K.

So in my testing I ran a query that dumped approx 660k triples out of the endpoint
and compared gzip'd vs non-gzip'd times averaged over several runs.  For reference, the Jetty
GzipFilter uses an 8K buffer size.
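
As a rough illustration of the buffer size point, here is a minimal sketch using only the JDK's java.util.zip classes. It is not the Fuseki backup code or the Jetty filter; the class name GzipBufferSketch and its helper methods are hypothetical, and the only detail taken from this thread is the 8K buffer passed to the GZIPOutputStream constructor.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper class, for illustration only.
final class GzipBufferSketch {
    // 8K buffer as discussed above; the JDK default is 512 bytes.
    static final int BUFFER_SIZE = 8 * 1024;

    // Compress a byte array with an explicitly sized gzip buffer.
    static byte[] compress(byte[] data) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(sink, BUFFER_SIZE)) {
            gz.write(data);
        } // closing the stream writes the gzip trailer
        return sink.toByteArray();
    }

    // Decompress a gzip byte array back to the original bytes.
    static byte[] decompress(byte[] compressed) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPInputStream gz =
                 new GZIPInputStream(new ByteArrayInputStream(compressed), BUFFER_SIZE)) {
            byte[] chunk = new byte[BUFFER_SIZE];
            int n;
            while ((n = gz.read(chunk)) != -1) {
                out.write(chunk, 0, n);
            }
        }
        return out.toByteArray();
    }
}
```

Timing compress() against a plain copy of the same bytes, with BUFFER_SIZE varied, would be one way to reproduce the kind of comparison described here.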

With the server and client both on the local machine, gzip added around 5s of overhead for
TSV and 7s of overhead for XML.

But when running the server on a remote machine (outside of the LAN), gzip gave a 2x speedup
for TSV and a 5x speedup for XML.

Note I tried to get figures for JSON as well but hit OOM exceptions; the SPARQL JSON parser
appears to be non-streaming, so I'll look into that and maybe file a separate JIRA.

>> Due to other more pressing work I haven't had
>> chance to test for the overheads when used over a decidedly non-local
>> network though I should get chance to do that later today at which
>> point I'll submit the patch.
>> The Fuseki patch I have at the moment enables the filter by default
>> but due to the way Jetty works the filter only gets applied if the
>> client explicitly states that they accept GZip encoded content with
>> the Accept-Encoding parameter.
> I think that's correct generally, not a Jetty-ism.  If the client does not ask for it,
> it should not happen (the client may not have the ability or desire to uncompress, e.g. scripting
> languages, curl etc).

Well yes, I just meant in terms of how their filter functions.  As a general rule any HTTP
server should not give back something the client hasn't asked for.
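
To make the "only if the client asks" rule concrete, here is a simplified sketch of checking an Accept-Encoding header value. This is not how the Jetty filter is implemented; the class and method names are hypothetical, and the parsing deliberately ignores the "*" wildcard and any preference ordering between codings.

```java
import java.util.Locale;

// Hypothetical helper class, for illustration only.
final class AcceptEncodingSketch {
    // Returns true only if the header explicitly lists gzip with a non-zero q-value.
    static boolean clientAcceptsGzip(String acceptEncoding) {
        if (acceptEncoding == null) {
            return false; // no header: do not compress
        }
        for (String part : acceptEncoding.split(",")) {
            String[] pieces = part.trim().toLowerCase(Locale.ROOT).split(";");
            if (!pieces[0].trim().equals("gzip")) {
                continue;
            }
            // Look for an optional quality value such as "gzip;q=0.5".
            for (int i = 1; i < pieces.length; i++) {
                String param = pieces[i].trim();
                if (param.startsWith("q=")) {
                    try {
                        return Double.parseDouble(param.substring(2).trim()) > 0.0;
                    } catch (NumberFormatException e) {
                        return false; // malformed q-value: play safe, no gzip
                    }
                }
            }
            return true; // gzip listed without a q-value
        }
        return false; // gzip not mentioned at all
    }
}
```

With this check, a browser sending "gzip, deflate" would get compressed output, while a client that sends no Accept-Encoding header would get the plain response.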

>> Browsers typically send this so it may be that it would be best to
>> have this feature off by default because if people do prototyping and
>> testing on their local machine with their browser they may see slower
>> performance because of this.
> Agreed. Off by default.

Ok, I will make it off by default. Do you want it enabled by command line parameter or only
programmatically?  My preference would be to add a config symbol which the code uses to determine
if the feature should be enabled and have the command line parameter enable that symbol if

>> However most SPARQL clients in various
>> APIs probably won't include this header by default - whether this
>> wants to be enabled by default will probably depend on the
>> performance figures. I'll aim to have the submitted patch make the
>> behavior configurable and leave it up to you whether you want to have
>> it enabled/disabled by default once you've seen the figures.
>> Rob
>> ps. I also have a related patch for ARQ which allows it to ask for
>> GZip and Deflate encoded content though I'll likely package that as
>> part of a more extensive patch for QueryEngineHttp I've been working
>> on which also adds in support for configuring requested content
>> type.
> ARQ isn't being released - just Fuseki so this is less time critical.
> 	Andy
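
For the client side mentioned in the ps above, a minimal sketch of handling GZip and Deflate encoded responses could look like the following. This is not the ARQ or QueryEngineHttp patch; the class name, the constant, and the decode method are hypothetical, and it uses only JDK classes. A client would send the ACCEPT_ENCODING value in its Accept-Encoding request header and then wrap the response body according to the Content-Encoding response header.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Locale;
import java.util.zip.GZIPInputStream;
import java.util.zip.InflaterInputStream;

// Hypothetical helper class, for illustration only.
final class ContentEncodingSketch {
    // Value a client could send in its Accept-Encoding request header.
    static final String ACCEPT_ENCODING = "gzip, deflate";

    // Wrap the response body according to the Content-Encoding response header.
    static InputStream decode(String contentEncoding, InputStream body) throws IOException {
        if (contentEncoding == null) {
            return body; // server sent the content uncompressed
        }
        switch (contentEncoding.trim().toLowerCase(Locale.ROOT)) {
            case "gzip":
                return new GZIPInputStream(body, 8 * 1024);
            case "deflate":
                // HTTP "deflate" is zlib-wrapped, which InflaterInputStream reads by default.
                return new InflaterInputStream(body);
            default:
                return body; // unknown or identity coding: pass through unchanged
        }
    }
}
```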
