incubator-jena-dev mailing list archives

From William Waites <>
Subject Re: Error handling and HTTP 1.1 support in SERVICE client
Date Mon, 30 Jan 2012 21:37:50 GMT
Hello Andy,

So, very good news on two fronts: firstly SERVICE SILENT (I
mentioned to Paolo on IRC that something like SERVICE
RETRYWITHCAPPEDEXPONENTIALBACKOFF might be even better), and secondly I
can confirm with tcpdump(1) that Paolo is correct: for persistent
connections it is doing the Right Thing.
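
A retry policy like that could live on the client side. Here is a minimal Python sketch, assuming the remote call is wrapped in a zero-argument function that raises `OSError` on network failure. `retry_with_capped_backoff` and its parameters are hypothetical and not part of ARQ.

```python
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


def backoff_delays(base: float, cap: float, retries: int) -> List[float]:
    """Delay before each retry: base * 2**n, but never more than cap."""
    return [min(cap, base * (2 ** n)) for n in range(retries)]


def retry_with_capped_backoff(
    call: Callable[[], T],
    base: float = 0.5,
    cap: float = 30.0,
    retries: int = 5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Invoke call(); on a network error, wait and try again, up to `retries` times."""
    for delay in backoff_delays(base, cap, retries):
        try:
            return call()
        except OSError:
            # Remote endpoint failed: back off before the next attempt.
            sleep(delay)
    # Last attempt: if this one fails too, the exception reaches the caller.
    return call()
```

With the defaults, the waits are 0.5, 1, 2, 4 and 8 seconds. The cap only comes into play for longer retry sequences, where it stops the delay from growing without bound.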

For streaming the output, I suppose doing a SELECT would be possible,
but that would mean parsing the SPARQL XML results and rendering them as
RDF... which quickly becomes unwieldy as the head of the CONSTRUCT becomes
more complicated. The example I gave was, as I mentioned, abbreviated;
in reality there are a few statements in the head and a more
complicated WHERE clause with a bunch of OPTIONALs.
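
For a single fixed template, the SELECT-then-render route might look roughly like the following Python sketch. It streams the SPARQL Query Results XML Format with `xml.etree.ElementTree.iterparse` and emits N-Triples for each row. The sketch rests on several assumptions. The template is a hypothetical list of `(s, p, o)` patterns whose constants are already written as N-Triples terms. Template blank nodes are not handled, and literal escaping covers only the common characters. Even at this size, it shows why the approach gets unwieldy as the CONSTRUCT head grows.

```python
import io
import xml.etree.ElementTree as ET
from typing import Dict, Iterable, Iterator, List, Tuple

SR = "{http://www.w3.org/2005/sparql-results#}"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def to_ntriples_term(binding: ET.Element) -> str:
    """Render the single child of a <binding> element as an N-Triples term."""
    node = binding[0]
    if node.tag == SR + "uri":
        return f"<{node.text}>"
    if node.tag == SR + "bnode":
        return f"_:{node.text}"
    # Otherwise a <literal>: escape backslash, quote, CR and LF.
    text = (node.text or "").replace("\\", "\\\\").replace('"', '\\"')
    text = text.replace("\n", "\\n").replace("\r", "\\r")
    if XML_LANG in node.attrib:
        return f'"{text}"@{node.attrib[XML_LANG]}'
    if "datatype" in node.attrib:
        return f'"{text}"^^<{node.attrib["datatype"]}>'
    return f'"{text}"'


def stream_rows(source) -> Iterator[Dict[str, str]]:
    """Yield one {variable: term} dict per <result> as soon as it is parsed."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == SR + "result":
            yield {b.get("name"): to_ntriples_term(b) for b in elem.findall(SR + "binding")}
            elem.clear()  # discard the parsed row's children to keep memory low


def render(rows: Iterable[Dict[str, str]], template: List[Tuple[str, str, str]]) -> Iterator[str]:
    """Instantiate the template per row; like CONSTRUCT, skip triples with unbound variables."""
    for row in rows:
        for pattern in template:
            terms = [row.get(t[1:]) if t.startswith("?") else t for t in pattern]
            if None not in terms:
                yield " ".join(terms) + " ."


if __name__ == "__main__":
    sample = b"""<sparql xmlns="http://www.w3.org/2005/sparql-results#">
  <head><variable name="s"/><variable name="name"/></head>
  <results>
    <result>
      <binding name="s"><uri>http://example.org/a</uri></binding>
      <binding name="name"><literal xml:lang="en">Alice</literal></binding>
    </result>
  </results>
</sparql>"""
    template = [("?s", "<http://xmlns.com/foaf/0.1/name>", "?name")]
    for line in render(stream_rows(io.BytesIO(sample)), template):
        print(line)
```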

I wonder also if there mightn't be a workable way of (1) imposing some
order on the results of a SERVICE clause and (2) walking the results
with LIMIT and OFFSET so that we don't need to read one of the result
sets entirely into RAM before iterating over the other.  This might be
a bit tricky to get right, and might not work in all cases.
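
Here is a client-side sketch of the ORDER BY plus LIMIT/OFFSET walk in Python. It is not something ARQ is claimed to do. `iterate_pages` is hypothetical, and so is its `run` callback, which is assumed to execute a query against the endpoint and return that page's rows as a list. The base query is assumed to be a SELECT with no solution modifiers of its own.

```python
from typing import Callable, Iterator, List, TypeVar

Row = TypeVar("Row")


def paged_query(select_query: str, order_by: str, limit: int, offset: int) -> str:
    """Add a stable ORDER BY plus a LIMIT/OFFSET window to a SELECT with no modifiers."""
    return f"{select_query}\nORDER BY {order_by}\nLIMIT {limit}\nOFFSET {offset}"


def iterate_pages(
    run: Callable[[str], List[Row]],
    select_query: str,
    order_by: str,
    page_size: int = 1000,
) -> Iterator[Row]:
    """Lazily walk a remote result set one page at a time.

    `run` is a hypothetical callback that executes a query against the
    endpoint and returns that page's rows as a list.
    """
    offset = 0
    while True:
        page = run(paged_query(select_query, order_by, page_size, offset))
        yield from page
        if len(page) < page_size:
            return  # a short (or empty) page means the end has been reached
        offset += page_size
```

Only one page is held in memory at a time. The ORDER BY is what makes this sound: without it, SPARQL leaves solution order undefined, so successive OFFSET windows could overlap or skip rows. Even with it, rows can still be skipped or repeated if the remote data changes between requests. Large OFFSETs may also be expensive for some endpoints.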

This is related to your point about BINDINGS, which, if we're not
careful, could make for enormous queries *and* enormous result sets
unless we have some mechanism for paging through them under the hood.
Even if it isn't a completely ungrounded query, it can still be
unreasonably

One last, minor nit:

    Dataset not specified in query nor provided on command line

So in the case where *all* of the data sources are remote, I can work
around this with an empty RDF/XML file, but really, shouldn't this just
be the default? Don't give it any data, and it has no data?

If the streaming output can be worked out, and ideally the paging,
this becomes a pretty useful tool for combining data from SPARQL
endpoints: one with a relatively small memory footprint (~50MB
perhaps?) that produces output which can be fed incrementally into a
store, so users can see something happening. Looking quite promising
to me at this stage!


P.S. I am unlikely to be contributing any patches in the near future;
I'm more likely to treat it as a black box and trust Paolo to know how the
innards work. If I get some spare cycles I might hack on it through
Clojure, though... I have a vague feeling that lazy evaluation is
useful here somewhere...
