incubator-jena-dev mailing list archives

From Andy Seaborne <>
Subject Re: Error handling and HTTP 1.1 support in SERVICE client
Date Mon, 30 Jan 2012 22:26:12 GMT
On 30/01/12 21:37, William Waites wrote:
> Hello Andy,
> So very good news on two fronts, firstly the SERVICE SILENT (I
> mentioned in IRC to Paolo that something like SERVICE

Probably overflows the tokenizer buffer :-)

> and secondly I
> can confirm with tcpdump(1) that for persistent connections, Paolo is
> correct that it is doing the Right Thing.
> For streaming the output, I suppose doing a SELECT would be possible
> but that would mean parsing the SPARQL XML and rendering it as
> RDF... Quickly becoming unwieldy as the head of the CONSTRUCT becomes
> more complicated - the example I gave was, as I mentioned,
> abbreviated, in reality there are a few statements in the head, and a
> more complicated WHERE clause with a bunch of OPTIONALs.

Not really - the code that does this currently is 13 lines - some of 
them blank.  See QueryExecutionBase.execConstruct(Model), although 
nearly the same code, used by UpdateEngineWorker via 
TemplateLib.calcQuads, is better.
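The idea is small: stream the SELECT-style bindings and substitute each row into the CONSTRUCT template, dropping any triple left with an unbound variable. A minimal sketch (Python for brevity; `instantiate`, the `?var` convention, and the tuple representation are illustrative stand-ins, not Jena's actual API):

```python
# Sketch of CONSTRUCT-by-template over streamed bindings.  The real
# code lives in QueryExecutionBase.execConstruct / TemplateLib.calcQuads;
# the names and data shapes below are hypothetical.

def instantiate(template, bindings):
    """Yield concrete triples by substituting each result row into the
    CONSTRUCT template, skipping triples that keep an unbound variable."""
    def is_var(t):
        return isinstance(t, str) and t.startswith("?")

    for row in bindings:                      # one row per SELECT result
        for s, p, o in template:              # template triples may hold vars
            triple = tuple(row.get(t, t) if is_var(t) else t
                           for t in (s, p, o))
            if any(is_var(t) for t in triple):
                continue                      # unbound variable: drop triple
            yield triple                      # emit incrementally, no Model

# Example: template { ?x foaf:name ?n }, two result rows.
template = [("?x", "foaf:name", "?n")]
rows = [{"?x": "<http://example/alice>", "?n": '"Alice"'},
        {"?x": "<http://example/bob>"}]       # ?n unbound: row skipped
print(list(instantiate(template, rows)))
```

Because it is a generator, triples can be written out as they are produced rather than accumulated in memory first.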

> I wonder also if there mightn't be a workable way of (1) imposing some
> order on the results of a SERVICE clause and (2) walking the results
> with LIMIT and OFFSET so that we don't need to read one of the result
> sets entirely into RAM before iterating over the other.  This might be
> a bit tricky to get right, and might not work in all cases.
> This is related to your point about BINDINGS, which could, if we're not
> careful, make for enormous queries *and* enormous result sets if we
> don't have some mechanism for paging through them under the hood. Even
> if it isn't a completely ungrounded query, it can still be unreasonably
> big.
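The paging loop described above might look something like this sketch (Python for brevity; `fetch_page` is a hypothetical stand-in for executing the remote query with the given LIMIT and OFFSET - it is not a Jena API):

```python
# Hypothetical sketch of walking a remote SERVICE result set with
# LIMIT/OFFSET so neither result set is held entirely in RAM.

def paged_results(fetch_page, page_size=1000):
    """Yield rows page by page; stop at the first short page."""
    offset = 0
    while True:
        page = fetch_page(limit=page_size, offset=offset)
        for row in page:
            yield row
        if len(page) < page_size:      # short page means no more results
            return
        offset += page_size

# Stub "endpoint" with 7 rows, paged 3 at a time (3 requests).
data = list(range(7))
def fetch_page(limit, offset):
    return data[offset:offset + limit]

print(list(paged_results(fetch_page, page_size=3)))
```

Note this only behaves deterministically if point (1) is solved first: without an imposed ORDER BY, SPARQL gives no guarantee that OFFSET slices of repeated executions are consistent.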
> One last, minor nit:
>      Dataset not specified in query nor provided on command line
> So in the case where *all* of the data source is remote, I can work
> around this with a null RDF/XML file, but really shouldn't this just
> be the default? Don't give it any data, and it has no data?

Yes and no.  If it's the default, we get user emails saying "my query 
has no output".  -data D.nt

(What's RDF/XML by the way? Is that some old format for RDF? :-)

> If the streaming output can be worked out, and, ideally, the paging,
> this becomes a pretty useful tool for combining data from SPARQL
> endpoints, that has a relatively small-ish memory footprint (~50Mb
> perhaps?), and can produce output that can then be incrementally fed
> into a store where users can observe something happening. Looking
> quite promising to me at this stage!

Transactions and TDB will help you there - add batches from an iterator. 
  (Don't add them all in one transaction.)
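The batching can be sketched like this (Python for brevity; `begin`, `add`, and `commit` are stand-ins for a store's transaction API, here wired to toy counters):

```python
# Sketch of loading triples from an iterator in bounded transactions:
# commit every batch_size triples rather than one huge transaction.

from itertools import islice

def load_in_batches(triples, add, begin, commit, batch_size=10_000):
    """Consume the iterator in batches; one transaction per batch."""
    it = iter(triples)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        begin()                 # open a write transaction
        for t in batch:
            add(t)
        commit()                # bound memory: commit and start fresh

# Toy store: 25 items in batches of 10 -> 25 adds, 3 commits.
log = {"adds": 0, "commits": 0}
load_in_batches(range(25),
                add=lambda t: log.__setitem__("adds", log["adds"] + 1),
                begin=lambda: None,
                commit=lambda: log.__setitem__("commits", log["commits"] + 1),
                batch_size=10)
print(log)   # {'adds': 25, 'commits': 3}
```

Each committed batch is also immediately visible to readers, which gives the "users can observe something happening" effect mentioned above.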

> Thanks,
> -w
> P.S. I am unlikely to be contributing any patches in the near future,
> more likely to treat it as a black box and trust Paolo to know how the
> innards work. If I get some spare cycles I might hack on it through
> clojure though... I have a vague feeling that lazy evaluation is
> useful here somewhere...

As Paolo knows, feedback emails are good; contributions of code, 
including test cases (and any documentation), are the lifeblood of an 
open source project.  It's easy to get requests faster than they will 
ever get attended to.

This also goes for email on jena-users@ - please do contribute answers 
there.  Every little helps.

