incubator-jena-dev mailing list archives

From Paolo Castagna <>
Subject Re: Error handling and HTTP 1.1 support in SERVICE client
Date Mon, 30 Jan 2012 15:22:20 GMT
William Waites wrote:
> On Mon, 30 Jan 2012 14:26:00 +0000, Paolo Castagna <> wrote:
>     paolo> Welcome William.
> Thank you.
>     paolo> When possible, I do this sort of thing locally. I get a
>     paolo> copy of the data I need or small slices of it, I load
>     paolo> everything in TDB and run my SPARQL queries locally.
> Right. However for my applications (!!!) I don't want to do this
> because:
>   1. I cannot count on the remote data being available in bulk since
>      some publishers habitually only make SPARQL endpoints available
>      and not dumps.


Good point, it's not always possible to get a dump of a remote dataset.

>   2. I don't know beforehand which slices of the data I will need,
>      if I knew this I wouldn't need to run the query.
>   3. I cannot count on having my own temporary local store to put
>      intermediate results into.
>     paolo> Looking at [1] that seems to me to be the
>     paolo> case (and it is probably ok for the majority of use cases).
> Perhaps. Although fixing this would improve performance by a
> significant amount and should not break anything existing. And it
> ought to be simple.

Actually, I think I was wrong. The JDK does support persistent connections:

 "The JDK supports both HTTP/1.1 and HTTP/1.0 persistent connections.
  When the application finishes reading the response body or when the
  application calls close() on the InputStream returned by
  URLConnection.getInputStream(), the JDK's HTTP protocol handler will
  try to clean up the connection and if successful, put the connection
  into a connection cache for reuse by future HTTP requests. The support
  for HTTP keep-Alive is done transparently."
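
For what it's worth, a minimal sketch of the pattern that description
implies (the URL below is hypothetical): drain the response body and
close the stream, and the protocol handler is then free to put the
connection back into its cache for the next request:

import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;

public class KeepAliveDemo {
    public static void main(String[] args) throws Exception {
        // Hypothetical endpoint; any HTTP/1.1 server will do.
        URL url = new URL("http://example.org/");
        for (int i = 0; i < 3; i++) {
            URLConnection conn = url.openConnection();
            conn.setRequestProperty("Accept", "application/sparql-results+xml");
            InputStream in = conn.getInputStream();
            byte[] buf = new byte[8192];
            while (in.read(buf) != -1) {
                // drain the response body completely ...
            }
            // ... then close: the protocol handler can now put the
            // connection back into its cache, and the next iteration
            // should reuse the same socket.
            in.close();
        }
    }
}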

I also verified that the correct "Connection: keep-alive" header is
present when a request is sent, for example:

Accept: application/sparql-results+xml
User-Agent: Java/1.6.0_16
Connection: keep-alive

So, it is fine.

ARQ's current SERVICE implementation already supports keep-alive (I was
wrong!). (It's good to be wrong when there is nothing to implement ;-)).

> I am perfectly happy for the query to take a long time to run as a
> batch job as long as it doesn't consume a lot of RAM and that a
> recoverable failure (e.g. of the HTTP response code 5XX kind not 4XX)
> doesn't cause the whole thing to fail and lose the work already done.
> "doesn't consume a lot of RAM" probably means "write results to
> persistent storage or a file descriptor incrementally". That would
> make Jena/ARQ usable in my application.

A query engine needs to deliver correct results.
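
On the "write results ... incrementally" point: ARQ's ResultSet is an
iterator, so the bindings of a SELECT can be written out as they are
consumed, provided the results parser streams (I believe the SAX-based
XML results parser does). A rough sketch, with a made-up endpoint and
output file:

import java.io.FileOutputStream;

import com.hp.hpl.jena.query.QueryExecution;
import com.hp.hpl.jena.query.QueryExecutionFactory;
import com.hp.hpl.jena.query.ResultSet;
import com.hp.hpl.jena.query.ResultSetFormatter;

public class StreamResults {
    public static void main(String[] args) throws Exception {
        String endpoint = "http://example.org/sparql"; // hypothetical
        QueryExecution qe = QueryExecutionFactory.sparqlService(
                endpoint, "SELECT * WHERE { ?s ?p ?o }");
        try {
            ResultSet rs = qe.execSelect(); // rows are pulled as you iterate
            FileOutputStream out = new FileOutputStream("results.srx");
            // Each row is serialized as it is consumed from the ResultSet,
            // so (assuming a streaming parser) rows need not all be held
            // in memory at once.
            ResultSetFormatter.outputAsXML(out, rs);
            out.close();
        } finally {
            qe.close();
        }
    }
}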

Perhaps you can separate the processing into two parts:

 1. Retrieve the data you need (incrementally, tolerating failures).
 2. Run all further processing locally.

Step 1 is the part which can recover from failures; step 2 is far less
likely to fail, since it operates only on local data.
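
To make step 1 concrete, here is a rough sketch of what I mean,
assuming the endpoint honours ORDER BY / LIMIT / OFFSET paging (the
endpoint URL, page size and retry policy below are all made up):

import com.hp.hpl.jena.query.QueryExecution;
import com.hp.hpl.jena.query.QueryExecutionFactory;
import com.hp.hpl.jena.rdf.model.Model;
import com.hp.hpl.jena.rdf.model.ModelFactory;
import com.hp.hpl.jena.sparql.engine.http.QueryExceptionHTTP;

public class PagedFetch {
    public static void main(String[] args) {
        String endpoint = "http://example.org/sparql";   // hypothetical
        Model local = ModelFactory.createDefaultModel(); // or a TDB-backed model
        int pageSize = 1000;
        for (int offset = 0; ; offset += pageSize) {
            // ORDER BY makes the paging deterministic across requests.
            String q = "CONSTRUCT { ?s ?p ?o } WHERE { ?s ?p ?o } "
                     + "ORDER BY ?s ?p ?o LIMIT " + pageSize + " OFFSET " + offset;
            Model page = fetchWithRetry(endpoint, q, 3);
            if (page.isEmpty()) break; // no more data
            local.add(page);           // work already done is kept locally
        }
        local.write(System.out, "N-TRIPLES");
    }

    // Retry on 5xx (recoverable); give up at once on 4xx (client error).
    static Model fetchWithRetry(String endpoint, String query, int attempts) {
        for (int i = 1; ; i++) {
            QueryExecution qe = QueryExecutionFactory.sparqlService(endpoint, query);
            try {
                return qe.execConstruct();
            } catch (QueryExceptionHTTP e) {
                if (e.getResponseCode() < 500 || i == attempts) throw e;
            } finally {
                qe.close();
            }
        }
    }
}

Each page is added to the local model before the next request goes out,
so a failure midway loses at most one page of work.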

My 2 cents,
