incubator-jena-dev mailing list archives

From Andy Seaborne <>
Subject Re: Error handling and HTTP 1.1 support in SERVICE client
Date Mon, 30 Jan 2012 15:08:46 GMT
On 30/01/12 12:52, William Waites wrote:
> Hello all,

Hi William,

> My colleague Paolo has been suggesting that I join this list for a
> while, and since I have a couple of questions stemming from my use of
> the SPARQL-FED stuff in ARQ, I thought that now might be a good time.
> What I'm doing is as follows. I have some information about
> airports. It's accurate and complete, but pretty skeletal. dbpedia on
> the other hand is less complete but richer in terms of text
> descriptions and additional information. There also happens to be a
> text field (ICAO code) that can be used to join the two.
> Though I know there are ways to do this more efficiently, I think a
> single CONSTRUCT query with some SERVICE blocks in the WHERE clause is
> a very clean way to do it, and will only become more efficient as the
> implementation gets better.
> So an abbreviated version of the query might be something like,
>      CONSTRUCT {
>          ?my_uri dct:description ?description
>      } WHERE {
>          ?my_uri transit:icaoCode ?icao.
>          SERVICE<>  {
>              ?dbp_uri dbpprop:icao ?icao;
>                       rdfs:comment ?description
>          }
>      }
> This mail is about two ways the implementation might get
> better.
> Firstly it is brittle. It expands into doing one remote query for each
> ?icao, which is what one would expect. If any sub-query fails due to
> transient network events or server flakiness (almost inevitable with
> more than a trivially small set of things to be queried) the whole
> query fails. I would rather like the process to continue, and perhaps
> log a warning. The web is unreliable and the semantic web contains a
> funny open-world assumption of incomplete results being acceptable,
> it's just the nature of the beast. Incomplete results are better than
> no results in this case, but that they are known to be possibly
> incomplete should be flagged in some way in case the user cares.

SERVICE SILENT may be what you are looking for.  Strictly, it means the
SERVICE clause as a whole continues (with no results) if any part fails,
but in ARQ, in normal usage, it is applied to each service request, so
only the requests that fail are dropped.

See QueryIterService.
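
To make the "continue, but flag it" behaviour concrete, here is a minimal
sketch in Python. It is not ARQ code and does not show how QueryIterService
works internally; join_with_remote and fetch_descriptions are hypothetical
names, with fetch_descriptions standing in for one remote SERVICE request per
?icao value. In the query itself, the only change needed is to write
SERVICE SILENT where the query currently says SERVICE.

```python
import logging

log = logging.getLogger("service_sketch")


def join_with_remote(icao_codes, fetch_descriptions):
    """Call fetch_descriptions once per code; skip codes whose call fails.

    Returns (results, failed): results maps each code to the list of
    descriptions returned for it, failed lists the codes whose remote
    call raised an exception.
    """
    results, failed = {}, []
    for code in icao_codes:
        try:
            results[code] = fetch_descriptions(code)
        except Exception as exc:  # one failed call must not abort the rest
            log.warning("remote call for %s failed: %s", code, exc)
            failed.append(code)
    return results, failed
```

A non-empty failed list is the signal that the overall result may be
incomplete. As far as I know, SERVICE SILENT alone gives no such signal to
the caller, so the list is an addition in this sketch.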

> Secondly, I understand from Paolo that the client in ARQ does not use
> persistent HTTP connections. For iterations like this, the HTTP
> set-up/tear-down is quite costly and it would be much better if
> persistent connections were supported here. Possibly even better
> (potentially the server could take advantage of this, executing
> queries in parallel for example) if the queries were pipelined to some
> extent.

The real problem is that the correct query to send to the far end is

  {
       ?dbp_uri dbpprop:icao ?icao;
                rdfs:comment ?description
  } BINDINGS ... from the first part ...

Then it is a single request, and it still avoids asking the ungrounded
pattern

    ?dbp_uri dbpprop:icao ?icao;
             rdfs:comment ?description

but DBpedia does not support all of SPARQL 1.1 and, in particular, it does
not support BINDINGS (yet?).
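
As an illustration of the single-request idea, the sketch below builds one
query string carrying all the ?icao values from the first part. It is only a
sketch under assumptions: build_batched_query and sparql_string are
hypothetical helpers, the codes are assumed to match as plain string literals
on the remote side, and the inline data uses the VALUES keyword, which is the
name the final SPARQL 1.1 specification gave to what the drafts called
BINDINGS.

```python
def sparql_string(value):
    """Quote a Python string as a SPARQL plain literal."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"'


def build_batched_query(icao_codes):
    """Build one SELECT query that carries every ICAO code as inline data."""
    rows = " ".join("(" + sparql_string(code) + ")" for code in icao_codes)
    return (
        "PREFIX dbpprop: <http://dbpedia.org/property/>\n"
        "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
        "SELECT ?icao ?dbp_uri ?description WHERE {\n"
        "    ?dbp_uri dbpprop:icao ?icao;\n"
        "             rdfs:comment ?description\n"
        "}\n"
        # Inline data restricts ?icao to the values already found locally.
        "VALUES (?icao) { " + rows + " }\n"
    )
```

For example, build_batched_query(["EGLL", "EGPH"]) yields a single query
whose last line restricts ?icao to those two codes, so the remote endpoint
is asked once instead of once per code.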

The implementation of service requests is in 
com.hp.hpl.jena.sparql.engine.http.HttpQuery.  It might be better to use 
the Apache HTTP client.  Currently it use

Patches welcome.
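
For anyone looking at this, the effect of a persistent connection can be
sketched outside Jena with Python's standard http.client, which speaks
HTTP/1.1 and keeps the connection open between requests unless the server
closes it. This is not a patch for HttpQuery; run_queries is a hypothetical
helper, and the "query" parameter and Accept header follow the usual SPARQL
protocol conventions.

```python
import http.client
import urllib.parse


def run_queries(conn, path, queries):
    """Send each query as a GET on the same connection; return the bodies."""
    bodies = []
    for query in queries:
        target = path + "?" + urllib.parse.urlencode({"query": query})
        conn.request(
            "GET", target,
            headers={"Accept": "application/sparql-results+json"},
        )
        response = conn.getresponse()
        # The body must be read in full before the connection is reused.
        bodies.append(response.read())
    return bodies
```

Usage would be along the lines of
run_queries(http.client.HTTPConnection("sparql.example.org"), "/sparql", queries),
where the host name is a placeholder. All requests then share one TCP
connection instead of paying the set-up cost per ?icao value.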

> doesn't cause the whole thing to fail and lose the work already done.
> "doesn't consume a lot of RAM"

ARQ streams the results out, unless you ask for something that can't be
streamed, such as the text output form.  In that case, write the results
to a file in a streamable format and read the file back in.

CONSTRUCT isn't streamable - can you use a SELECT and generate the 
triples for the CONSTRUCT as it streams?
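
A rough sketch of that idea, written in Python rather than against the ARQ
API: each SELECT row is turned into one N-Triples line as it arrives, so
nothing has to be held in memory. The row keys mirror the ?my_uri and
?description variables of the query; rows_to_ntriples and ntriples_literal
are hypothetical helpers, and language tags and datatypes on the
descriptions are ignored here for brevity.

```python
DCT_DESCRIPTION = "http://purl.org/dc/terms/description"


def ntriples_literal(text):
    """Quote a string as an N-Triples plain literal."""
    escaped = (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return '"' + escaped + '"'


def rows_to_ntriples(rows):
    """Yield one N-Triples line per SELECT row, without buffering the rows."""
    for row in rows:
        yield "<%s> <%s> %s ." % (
            row["my_uri"],
            DCT_DESCRIPTION,
            ntriples_literal(row["description"]),
        )
```

Because rows_to_ntriples is a generator, the lines can be written to a file
or socket one at a time while the SELECT results are still coming in.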


> Cheers,
> -w
