incubator-jena-dev mailing list archives

From Andy Seaborne
Subject Re: Carrying raw query strings (public API change).
Date Fri, 13 Apr 2012 17:25:05 GMT
On 13/04/12 17:00, Robert Vesse wrote:
> We work at the level of QueryEngine, but we have multiple
> implementations: depending on the query, either the entire thing can
> be handled by the backend or only parts of it can, so we override
> eval() in our query engine implementations.
> Then we either use an OpExecutor or just parcel the query off
> wholesale to our backend.  So I was slightly inaccurate in that we no
> longer use StageGenerator (though we did at one point).
> Regardless, we are still at a level of the API where we don't see the
> QueryExecution, so we couldn't utilize the context even if we wanted
> to.

The context gets everywhere.  QueryEngineBase has it, as does 

The context is the merge of the global context and the dataset-specific
context, with user settings then added in.  It will be available from
ExecutionContext when it gets to the eval code.


Or I hope so: it's how the dataset and active graph get passed around
to actually deliver data!
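A minimal sketch of the layering described above, using plain Java maps as a hypothetical stand-in for ARQ's own Context class (whose keys are Symbol objects rather than strings). `LayeredContext.merge` is an invented name; the only point illustrated is that later layers override earlier ones, so a dataset-specific or per-query setting shadows the global default.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for ARQ's Context: a plain key/value map.
final class LayeredContext {
    private LayeredContext() {}

    // Merge global, then dataset-specific, then per-execution user settings.
    // putAll overwrites existing keys, so later layers win.
    static Map<String, Object> merge(Map<String, Object> global,
                                     Map<String, Object> dataset,
                                     Map<String, Object> user) {
        Map<String, Object> merged = new LinkedHashMap<>(global);
        merged.putAll(dataset);
        merged.putAll(user);
        return merged;
    }

    public static void main(String[] args) {
        Map<String, Object> global = Map.of("timeout", 60_000L, "optimizer", "on");
        Map<String, Object> dataset = Map.of("timeout", 10_000L);
        Map<String, Object> user = Map.of("queryLabel", "ASK {}");
        Map<String, Object> ctx = merge(global, dataset, user);
        System.out.println(ctx.get("timeout"));    // 10000 (dataset overrides global)
        System.out.println(ctx.get("optimizer"));  // on (inherited from global)
        System.out.println(ctx.get("queryLabel")); // ASK {} (user setting)
    }
}
```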

It's even available in custom functions: FunctionEnv is the interface
it exposes, but it is really the ExecutionContext object.

And it does the iterator tracking (have you met tracking yet? :-)
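The "narrow interface, full object underneath" pattern described above can be sketched as follows. `SimpleFunctionEnv`, `SimpleExecutionContext` and `labelFunction` are hypothetical, simplified names, not ARQ's real FunctionEnv or ExecutionContext; the sketch only shows a custom function reading a context entry through the narrow view while the object passed in is the complete execution state.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified stand-in; not ARQ's real FunctionEnv.
interface SimpleFunctionEnv {
    Map<String, Object> getContext();
}

// The full execution state; custom functions only see the narrower interface.
final class SimpleExecutionContext implements SimpleFunctionEnv {
    private final Map<String, Object> context;
    private final String activeGraph; // hypothetical extra state functions don't need

    SimpleExecutionContext(Map<String, Object> context, String activeGraph) {
        this.context = context;
        this.activeGraph = activeGraph;
    }

    @Override
    public Map<String, Object> getContext() { return context; }

    String getActiveGraph() { return activeGraph; }
}

final class CustomFunctions {
    private CustomFunctions() {}

    // Hypothetical custom function: reads a per-query label from the context.
    static String labelFunction(SimpleFunctionEnv env) {
        Object label = env.getContext().get("queryLabel");
        return label == null ? "(no label)" : label.toString();
    }

    public static void main(String[] args) {
        Map<String, Object> ctx = new HashMap<>();
        ctx.put("queryLabel", "ASK {}");
        SimpleFunctionEnv env = new SimpleExecutionContext(ctx, "urn:example:graph");
        System.out.println(labelFunction(env)); // ASK {}
        // The object behind the interface is really the full execution context.
        System.out.println(((SimpleExecutionContext) env).getActiveGraph()); // urn:example:graph
    }
}
```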

> The more I think about this, the less I think it actually solves our
> problem (that is, carrying raw query strings), because we are still
> left with the issue that a query may turn into multiple queries
> internally, and our developers wanted the query string to be
> associated with each of those internal queries, but that isn't a 1:1
> relationship.
> Maybe it is for the best if I just go ahead and revert those
> changes?

OK, no rush.  Maybe also open a JIRA if you think there is an
architectural point here.  Given your last comment (splitting queries),
maybe there isn't one, or not at the moment.

Stephen's API experiment may have something to say here.


> Rob
> On Apr 12, 2012, at 12:02 PM, Andy Seaborne wrote:
>> Rob -
>> On 12/04/12 17:43, Robert Vesse wrote:
>>> The notion of jobs makes sense to me, but it implies some
>>> refactoring of our APIs that is simply not feasible in our current
>>> setup. Where we use Fuseki this is not doable, because we are
>>> extending Fuseki indirectly by hooking into ARQ's
>>> QueryExecutionFactory mechanism, and so we don't have any means to
>>> create this Job thing before we see the actual query
>>> in our ARQ integration layer.
>>> Even in a hypothetical situation where we did have such a
>>> capability, we would still run into the issue that at some point the
>>> query has to drop into the ARQ machinery to be processed, at which
>>> point it has to be a Query and we'd lose any visibility back to
>>> our Job notion anyway. This is especially true since the point at
>>> which we actually send work off to our backend for processing is
>>> potentially very low level in the ARQ API (as far down as the
>>> StageGenerator layer).
>> This makes me a bit nervous: the needs of Cray to tunnel info from
>> one place to another because of the current code structure, balanced
>> against a long-term change to the public API.
>> The good news is there is a better way in Fuseki.
>> The QueryExecution object is a one-time-use object, and it has
>> somewhere to put such additional information: getContext().  This
>> is where the current time for the query goes, for example.  It even
>> gets to the StageGenerator.  It already has the query as an
>> object.
>> The Fuseki-specific HttpActionQuery doesn't get into ARQ; it's the
>> nearest thing I can see to the "Job" from the point of view of the
>> web request.
>> So we can have the QueryExecution carry a per-operation label.
>> Change:
>> 1/ Add a new symbol: ARQ.queryLabel
>> 2/ SPARQL_Query.executeQuery creates the QueryExecution and can set
>> a key/value in its context, with ARQ.queryLabel as the key and the
>> query string as the value.  It knows the queryStringLog; it can take
>> the original query string as well, or we can put it in the
>> HttpActionQuery and from there into the execution context.
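A stdlib-only sketch of steps 1 and 2 above. `QUERY_LABEL` plays the role of the proposed `ARQ.queryLabel` symbol (a proposal in this thread, not an existing constant), and `OneShotExecution`, `QueryServlet.executeQuery` and `lowLevelStage` are hypothetical stand-ins for QueryExecution, SPARQL_Query.executeQuery and a StageGenerator/OpExecutor-level hook. In real ARQ code the equivalent step would be something along the lines of setting a Symbol-keyed entry on `qExec.getContext()`.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: a one-time-use execution with its own context.
final class OneShotExecution {
    static final String QUERY_LABEL = "queryLabel"; // stands in for the proposed ARQ.queryLabel

    private final String parsedQuery;
    private final Map<String, Object> context = new HashMap<>();
    private boolean used = false;

    OneShotExecution(String parsedQuery) { this.parsedQuery = parsedQuery; }

    Map<String, Object> getContext() { return context; }

    // One-time use: a second call is rejected.
    String exec() {
        if (used) throw new IllegalStateException("execution already used");
        used = true;
        return lowLevelStage(parsedQuery, context);
    }

    // Stands in for low-level evaluation (e.g. a StageGenerator or OpExecutor hook):
    // it sees the context, so it can report the original text alongside its own work.
    static String lowLevelStage(String query, Map<String, Object> ctx) {
        return "[" + ctx.getOrDefault(QUERY_LABEL, "?") + "] eval " + query;
    }
}

final class QueryServlet {
    private QueryServlet() {}

    // Stands in for SPARQL_Query.executeQuery: it knows the raw request text.
    static String executeQuery(String rawQueryString, String parsedQuery) {
        OneShotExecution qExec = new OneShotExecution(parsedQuery);
        qExec.getContext().put(OneShotExecution.QUERY_LABEL, rawQueryString);
        return qExec.exec();
    }

    public static void main(String[] args) {
        // Prints: [ASK{} # Andy's query ] eval ASK { }
        System.out.println(executeQuery("ASK{} # Andy's query ", "ASK { }"));
    }
}
```

The label lives on the per-operation execution object rather than on the query, so the query's `equals`/`hashCode` are untouched.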
>> (Aside: I thought you'd be using OpExecutor so as to access the
>> filters and LeftJoins. That's a different discussion, though I'd
>> like to remove StageGenerator because there are too many ways to do
>> very similar things, making it messier to add new storage layers.
>> So, compatibility issue noted!)
>>> I don't think having the raw query string breaks the Java
>>> equals/hashCode contract, since the Query class is a
>>> structural representation of a query. Preserving the original
>>> query string is just a convenience to users and doesn't change
>>> the fact that the class is a structural representation of a query;
>>> by definition, different query strings can resolve to the same
>>> structure (white space, comments, prefix ordering, etc.).
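The "many strings, one structure" argument can be illustrated with a toy. `ToyQuery` and its `normalize` method are hypothetical and far cruder than ARQ's parser: they only drop `#` comments and collapse runs of whitespace (no prefix handling), and they would mangle a `#` inside an IRI or string literal. `equals`/`hashCode` use only the normalized form, while the raw text is kept on the side.

```java
// Toy query: equality is structural (normalized text); raw text is ancillary.
final class ToyQuery {
    private final String raw;
    private final String normalized;

    ToyQuery(String raw) {
        this.raw = raw;
        this.normalized = normalize(raw);
    }

    // Crude normalizer: strip '#' comments to end of line, collapse whitespace.
    // Not SPARQL-correct: '#' inside IRIs or string literals would be cut off.
    static String normalize(String s) {
        StringBuilder out = new StringBuilder();
        for (String line : s.split("\n")) {
            int hash = line.indexOf('#');
            out.append(hash >= 0 ? line.substring(0, hash) : line).append(' ');
        }
        return out.toString().trim().replaceAll("\\s+", " ");
    }

    String getRaw() { return raw; }

    @Override
    public boolean equals(Object o) {
        return o instanceof ToyQuery && ((ToyQuery) o).normalized.equals(normalized);
    }

    @Override
    public int hashCode() { return normalized.hashCode(); }

    public static void main(String[] args) {
        ToyQuery q1 = new ToyQuery("ASK{}");
        ToyQuery q2 = new ToyQuery("ASK{} # Andy's query ");
        System.out.println(q1.equals(q2));                   // true
        System.out.println(q1.getRaw().equals(q2.getRaw())); // false
    }
}
```

Because `getRaw()` is not part of `equals`, two equal `ToyQuery` objects are not fully interchangeable, which is the substitutability concern raised in the reply.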
>> In your use case, sure, the string is not particularly
>> significant.
>> The contract for .equals in Java is that for two objects to be
>> equal, they must be substitutable for one another.  Jena is a
>> general library: some app may rely on the query label for display
>> purposes, or as a key into another data structure.  That's the
>> long-term promise being made, and it's hard to predict what some
>> app may do, hence my desire for strict adherence to the .equals
>> contract.
>> Also, by preserving the query string and comments, there is a
>> slippery slope to putting stuff in comments and relying on it.
>> Preserving the query string is a convenience in your use case, but
>> if some other use case relies on the label for something, it is no
>> longer ancillary.
>> This shows a difference.  It's a bit artificial, but it's also
>> supposed to be a small example; imagine the two "put" operations
>> being in very different parts of the code.
>> Try changing the order of the two .put calls: I expected the
>> different output when it was put(q1,...), put(q2,...), but it turns
>> out to be the other way round, as below.
>> We live and learn about the runtime library implementation:
>> HashMap.put sets the entry to the last accessed object.  Other JREs
>> may differ.
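>> import java.util.HashMap;
>> import java.util.Map;
>> import com.hp.hpl.jena.query.Query;
>> import com.hp.hpl.jena.query.QueryFactory;
>>
>> public class QueryLabels {
>>     public static void main(String... argv) {
>>         Map<Query, String> x = new HashMap<>();
>>         // Structurally equal queries: only the trailing comment differs.
>>         Query q1 = QueryFactory.create("ASK{}");
>>         Query q2 = QueryFactory.create("ASK{} # Andy's query ");
>>         // getRawQuery() is the accessor added by the change under discussion.
>>         x.put(q2, q2.getRawQuery());
>>         x.put(q1, q1.getRawQuery()); // q1.equals(q2): replaces the value in q2's entry
>>         if (x.containsKey(q2)) {
>>             System.out.println(x.get(q2));        // q1's raw string
>>             System.out.println("---");
>>             System.out.println(q2.getRawQuery()); // q2's raw string
>>         } else {
>>             System.out.println("Not found");
>>         }
>>     }
>> }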
