incubator-jena-dev mailing list archives

From Robert Vesse <>
Subject Re: Carrying raw query strings (public API change).
Date Fri, 13 Apr 2012 16:00:28 GMT
We work at the level of QueryEngine, but we have multiple implementations: depending on the
query, either the entire thing can be handled by the backend or only parts of it can, so we
override eval() in our query engine implementations.

Then we either use an OpExecutor or just parcel the query off wholesale to our backend.  So
I was slightly inaccurate, in that we no longer use StageGenerator (though we did at one point).

Regardless, we are still at a level of the API where we don't see the QueryExecution, so we
couldn't utilize the context even if we wanted to.

The more I think about this, the less I think it actually solves our problem (carrying raw
query strings). We are still left with the issue that a query may turn into multiple queries
internally, and our developers wanted the query string associated with each of those internal
queries, but that isn't a 1:1 relationship.

Maybe it is for the best if I just go ahead and revert those changes?


On Apr 12, 2012, at 12:02 PM, Andy Seaborne wrote:

> Rob -
> On 12/04/12 17:43, Robert Vesse wrote:
>> The notion of jobs makes sense to me but it implies some refactoring
>> of our APIs that is simply not feasible. In our current setup, where we
>> use Fuseki, this is not doable because we are extending Fuseki
>> indirectly by hooking into ARQ's QueryExecutionFactory mechanism, and
>> so don't have any means to create this Job thing prior to starting to
>> see the actual query in our ARQ integration layer.
>>
>> Even in a hypothetical situation where we did have such capability we
>> still run into the issue that at some point the query has to drop
>> into the ARQ machinery to be processed at which point it has to be a
>> query and we'd lose any visibility back to our Job notion anyway.
>> This is especially true since the point at which we actually send
>> work off to our backend for processing is potentially very low level
>> in the ARQ API (as far down as the Stage Generator layer)
> This makes me a bit nervous; the needs of Cray to tunnel info from one place to another
> because of current code structure balanced against a long-term change to the public API.
> The good news is there is a better way in Fuseki.
> The QueryExecution object is a one-time-use object and it has somewhere to put such additional
> information - getContext().  This is where the current time for the query goes, for example.
> It even gets to the StageGenerator.  It's already got the query as an object.
> The Fuseki-specific HttpActionQuery doesn't get into ARQ - it's the nearest I can see
> to the "Job" from the point of view of the web request.
> So we can have the QueryExecution carry a per-operation label.
> Change:
> 1/ Add a new symbol: ARQ.queryLabel
> 2/ SPARQL_Query.executeQuery creates the QueryExecution and can set the context with
> a key/value that is the query string as ARQ.queryLabel.  It knows the queryStringLog -- it
> can take the original query string as well, or we can put it in the HttpActionQuery and put
> it in the execution context.
> (Aside: I thought you'd be using OpExecutor so as to access the filters and LeftJoins
> as -- different discussion though ... though I'd like to remove StageGenerator because there
> are too many ways to do very similar things, making it messier to add new storage layers ...
> so compatibility issue noted!)
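
The per-operation label idea above can be sketched without Jena on the classpath. What follows is a minimal, stdlib-only model and not ARQ's actual API: ExecContext, Execution, QUERY_LABEL and describeStage are hypothetical stand-ins for ARQ's Context, the QueryExecution object, the proposed ARQ.queryLabel symbol, and low-level code that only ever sees the context.

```java
import java.util.HashMap;
import java.util.Map;

public class QueryLabelSketch {
    // Hypothetical stand-in for a per-execution context: a plain key/value store.
    static final class ExecContext {
        private final Map<String, Object> values = new HashMap<>();
        void set(String key, Object value) { values.put(key, value); }
        Object get(String key) { return values.get(key); }
    }

    // Hypothetical key, playing the role of the proposed ARQ.queryLabel symbol.
    static final String QUERY_LABEL = "queryLabel";

    // Stand-in for a one-time-use execution object that owns its context.
    static final class Execution {
        private final ExecContext context = new ExecContext();
        ExecContext getContext() { return context; }
    }

    // Stand-in for low-level code that receives only the context,
    // never the execution object or the web request.
    static String describeStage(ExecContext cxt) {
        Object label = cxt.get(QUERY_LABEL);
        return "stage for: " + (label == null ? "<unlabelled>" : label);
    }

    public static void main(String[] args) {
        String rawQuery = "ASK{} # Andy's query";
        Execution exec = new Execution();
        // The layer that creates the execution attaches the label once.
        exec.getContext().set(QUERY_LABEL, rawQuery);
        // Anything further down that is handed the context can read it back.
        System.out.println(describeStage(exec.getContext()));
    }
}
```

The label lives beside the query rather than inside it, so it plays no part in how query objects compare with one another.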
>> I don't think having the raw query string breaks the Java
>> equality/hash code contract since the Query class is a structural
>> representation of a query, preserving the original query string is
>> just a convenience to users and doesn't change the fact that the
>> class is a structural representation of a query and by definition
>> different query strings can resolve to the same definition (white
>> space, comments, prefix ordering etc.)
> In your use case, sure, the string is not particularly significant.
> The contract for .equals in Java is that for two objects to be equal they must be substitutable
> for one another.  Jena is a general library - some app may rely on the query label for display
> purposes, or as a key into another data structure.  That's the long-term promise being made
> and it's hard to predict what some app may do - hence my desire for a strict adherence to
> the .equals contract.
> Also, by preserving the query string and comments, there is a slippery slope to putting
> stuff in comments and relying on it.
> Preserving the query string is a convenience in your use case but if some other use case
> is relying on the label for something, it is no longer ancillary.
> This shows a difference - a bit artificial, but it's also supposed to be a small example
> -- imagine the two "put" operations being in very different parts of the code.
> Try changing the order of the two .put calls -- I expected the different output when it was
> put(q1,..), put(q2,...) but it's the other way round, as below.  We live and learn about the
> runtime library implementation -- HashMap.put sets the entry to the last accessed object.
> Other JREs may
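> import java.util.HashMap ;
> import java.util.Map ;
> import com.hp.hpl.jena.query.Query ;
> import com.hp.hpl.jena.query.QueryFactory ;
>
> public class QueryLabels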
> {
>    public static void main(String ... argv)
>    {
>        Map<Query, String> x = new HashMap<>() ;
>        Query q1 = QueryFactory.create("ASK{}") ;
>        Query q2 = QueryFactory.create("ASK{} # Andy's query ") ;
>        x.put(q2, q2.getRawQuery()) ;
>        x.put(q1, q1.getRawQuery()) ;
>        if ( x.containsKey(q2) )
>        {
>            System.out.println(x.get(q2)) ;
>            System.out.println("---") ;
>            System.out.println(q2.getRawQuery()) ;
>        }
>        else
>        {
>            System.out.println("Not found") ;
>        }
>    }
> }
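
The same effect can be reproduced with only the JDK. The sketch below is an illustration under stated assumptions, not Jena code: ParsedQuery is a hypothetical stand-in for a query class whose equals/hashCode are structural and ignore the raw text, and its comment stripping is deliberately crude. It relies on the documented behaviour of java.util.HashMap.put, which replaces the value when an equal key is already present.

```java
import java.util.HashMap;
import java.util.Map;

public class LabelledKeyDemo {
    // Hypothetical stand-in for a parsed query: equality is structural,
    // while the raw text is carried along but ignored by equals/hashCode.
    static final class ParsedQuery {
        final String structure; // normalised form used for comparison
        final String raw;       // original text as written

        ParsedQuery(String raw) {
            this.raw = raw;
            // Crude normalisation for illustration only: drop a '#' comment.
            int commentStart = raw.indexOf('#');
            this.structure = (commentStart < 0 ? raw : raw.substring(0, commentStart)).trim();
        }

        @Override public boolean equals(Object o) {
            return o instanceof ParsedQuery
                && structure.equals(((ParsedQuery) o).structure);
        }

        @Override public int hashCode() { return structure.hashCode(); }
    }

    // Inserts q2 then q1, in the same order as the quoted example,
    // and returns the value found when looking up q2.
    static String lookupAfterPuts(ParsedQuery q1, ParsedQuery q2) {
        Map<ParsedQuery, String> x = new HashMap<>();
        x.put(q2, q2.raw);
        x.put(q1, q1.raw); // q1 equals q2, so this replaces the stored value
        return x.get(q2);
    }

    public static void main(String[] args) {
        ParsedQuery q1 = new ParsedQuery("ASK{}");
        ParsedQuery q2 = new ParsedQuery("ASK{} # Andy's query ");
        System.out.println(lookupAfterPuts(q1, q2)); // prints "ASK{}"
        System.out.println("---");
        System.out.println(q2.raw);                   // prints "ASK{} # Andy's query "
    }
}
```

Looking up q2 returns q1's raw text, so two objects that compare equal are not interchangeable once code starts relying on the carried string.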
