incubator-jena-dev mailing list archives

From Andy Seaborne <>
Subject Re: Output checking in ARQ
Date Tue, 31 Jan 2012 17:05:04 GMT
Hi William,

 > Following through the same usage with the federated queries. Sometimes
 > we get rubbish back. Things like <http://54233.1*B> come out of
 > dbpedia. ARQ faithfully takes these, binds them to results and outputs
 > them.

Whether checking is the right thing to do depends on the application 
usage.  Some might want to see the bad data (e.g. to fix it, or to 
complain); in your case, you want it suppressed.
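If suppression is what a given application wants, one option is to do it on the client side, after the results arrive and before they are stored. The sketch below is only an illustration of such an application-level policy. It uses `java.net.URI` (RFC 2396) as a stand-in syntax check, which is not the same as Jena's own IRI checking. The class and method names (`IriCheck`, `isAcceptableIri`, `filter`) are hypothetical, as is the rule that http/https IRIs must have a server-based host.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

class IriCheck {
    // Hypothetical policy: accept a string only if java.net.URI can parse it,
    // it is absolute, and, for http/https, it has a server-based host.
    // java.net.URI follows RFC 2396; this is NOT Jena's IRI checker.
    static boolean isAcceptableIri(String s) {
        try {
            URI u = new URI(s);
            if (!u.isAbsolute()) {
                return false; // relative references are rejected
            }
            String scheme = u.getScheme().toLowerCase(Locale.ROOT);
            if (scheme.equals("http") || scheme.equals("https")) {
                // getHost() is null when the authority is not a valid hostname/address
                return u.getHost() != null;
            }
            return true;
        } catch (URISyntaxException e) {
            return false; // not parseable at all (e.g. contains a space)
        }
    }

    // Drop the strings the application has decided it does not want.
    static List<String> filter(List<String> iris) {
        return iris.stream()
                   .filter(IriCheck::isAcceptableIri)
                   .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> in = List.of(
            "http://dbpedia.org/resource/Berlin",
            "http://exa mple.org/",
            "relative/path");
        System.out.println(filter(in)); // prints [http://dbpedia.org/resource/Berlin]
    }
}
```

An application that wants to see the bad data (to fix it or report it) would log the rejected strings instead of silently discarding them. That choice belongs to the application, not to the query engine.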

And one person's error is another person's useful data.  Encoding errors
or ill-formed literals are common.
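As an illustration of what "ill-formed literal" means: a typed literal is ill-formed when its lexical form is not in the lexical space of its datatype, for example `"4 2"^^xsd:integer`. The sketch below checks only two XSD datatypes and is an assumption-laden illustration, not a complete validator. The class `LiteralCheck` and its method are hypothetical. Detecting the problem is mechanical; deciding whether to keep, report or drop such a literal is the application's call.

```java
import java.util.Map;
import java.util.regex.Pattern;

class LiteralCheck {
    static final String XSD = "http://www.w3.org/2001/XMLSchema#";

    // Lexical-space patterns for a small subset of XSD datatypes (illustration only).
    // xsd:integer: optional sign followed by one or more decimal digits.
    // xsd:boolean: exactly "true", "false", "1" or "0".
    static final Map<String, Pattern> LEXICAL = Map.of(
        XSD + "integer", Pattern.compile("[+-]?[0-9]+"),
        XSD + "boolean", Pattern.compile("true|false|1|0"));

    // True if the lexical form is legal for the datatype,
    // or if the datatype is not one this sketch knows about.
    static boolean isWellFormed(String lexical, String datatypeIri) {
        Pattern p = LEXICAL.get(datatypeIri);
        return p == null || p.matcher(lexical).matches();
    }

    public static void main(String[] args) {
        System.out.println(isWellFormed("42", XSD + "integer"));  // true
        System.out.println(isWellFormed("4 2", XSD + "integer")); // false
        System.out.println(isWellFormed("yes", XSD + "boolean")); // false
    }
}
```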

DBpedia has a lot of junk in it, and it's a somewhat difficult service to
use as part of an automated process.  (It is also an appreciable support
cost to Jena.)  I don't want to put in DBpedia-specific workarounds when
the fix belongs in DBpedia's data.  From your point of view it would be
nice if the client code fixed the problems of the remote end, but ARQ is
a general library.

I have worked with those guys to fix problems at source, and last time I
checked, the data was at least legal, including legal URIs.  If you are
accessing a recent version, reporting it may get it fixed.

> Of course the problem then comes when you try to take the results and
> feed them into the very strict Jena parsers, and end up, in our setup,
> with entire batches of statements rejected when we try to put it into
> stable storage.

Which parser?  Some are configurable.

> Suggest making the output routines of arq.query check to make sure the
> terms are valid, and in the case of CONSTRUCT and DESCRIBE, additional
> checks that make sure we don't have things like literals in the
> predicate position and suchlike, with the aim of guaranteeing that you
> can always insert the results of a CONSTRUCT into a Jena/TDB store.

Graphs should never end up with illegal triples - this is a spec matter.
Could you provide a complete, minimal example please, so it can be
fixed?  The code should simply drop ill-formed literals - it is in the spec.
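For reference, the SPARQL 1.1 Query specification says that when instantiating a CONSTRUCT template, any triple containing an unbound variable or an illegal RDF construct (such as a literal in subject or predicate position) is not included in the output graph. The self-contained sketch below models that rule with hypothetical types (`Term`, `Triple`, `ConstructFilter`); it is not ARQ's implementation and does not use the Jena API. It uses Java records, so it assumes Java 16 or later.

```java
import java.util.List;
import java.util.stream.Collectors;

class ConstructFilter {
    enum Kind { IRI, BLANK, LITERAL }

    record Term(Kind kind, String value) {}

    record Triple(Term s, Term p, Term o) {}

    // Legality rule for an instantiated template triple:
    // subject must be an IRI or blank node, predicate must be an IRI,
    // object may be any term. An unbound variable (modelled here as null)
    // also causes the triple to be dropped.
    static boolean isLegal(Triple t) {
        if (t.s() == null || t.p() == null || t.o() == null) {
            return false;
        }
        boolean subjectOk = t.s().kind() == Kind.IRI || t.s().kind() == Kind.BLANK;
        boolean predicateOk = t.p().kind() == Kind.IRI;
        return subjectOk && predicateOk;
    }

    // Keep only the triples that may legally appear in the output graph.
    static List<Triple> dropIllegal(List<Triple> instantiated) {
        return instantiated.stream()
                           .filter(ConstructFilter::isLegal)
                           .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Term s = new Term(Kind.IRI, "http://example.org/s");
        Term p = new Term(Kind.IRI, "http://example.org/p");
        Term lit = new Term(Kind.LITERAL, "hello");
        List<Triple> out = dropIllegal(List.of(
            new Triple(s, p, lit),    // legal
            new Triple(s, lit, lit),  // literal in predicate position
            new Triple(lit, p, s),    // literal in subject position
            new Triple(s, p, null))); // unbound variable
        System.out.println(out.size()); // prints 1
    }
}
```

If output from CONSTRUCT still contains such triples, that would be a bug, which is why a minimal reproducing example is the useful thing to send.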


> Cheers,
> -w
