incubator-jena-dev mailing list archives

From "Stephen Allen (JIRA)" <>
Subject [jira] [Commented] (JENA-90) Use OpReduce instead of OpDistinct for DISTINCT + ORDER BY queries
Date Fri, 26 Aug 2011 22:05:34 GMT


Stephen Allen commented on JENA-90:

Hi Paolo,

I think the approach you want is to use QueryIterReduced instead of the new QueryIterDistinctSort
class you propose (see also the important note at [1]).  QueryIterReduced could perhaps
be optimized a little by eliminating the general-purpose window array and using a single
variable in this particular case of a sorted input.
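
As a rough sketch of the single-variable idea (this is not ARQ's QueryIterReduced; the class
AdjacentDedupIterator and the helper dedupSorted are hypothetical names made up for illustration),
an iterator over sorted input only needs to remember the last element it emitted:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.Objects;

/** Hypothetical illustration only; not the ARQ implementation. */
class AdjacentDedupIterator<T> implements Iterator<T> {
    private final Iterator<T> input;
    private T last;           // the single "window" slot: last element emitted
    private boolean hasLast;  // false until the first element has been emitted
    private T next;           // next element to hand out, if any
    private boolean hasNext;

    AdjacentDedupIterator(Iterator<T> input) {
        this.input = input;
        advance();
    }

    /** Skip input elements equal to the last emitted one. */
    private void advance() {
        hasNext = false;
        while (input.hasNext()) {
            T candidate = input.next();
            if (!hasLast || !Objects.equals(candidate, last)) {
                next = candidate;
                hasNext = true;
                return;
            }
        }
    }

    @Override
    public boolean hasNext() {
        return hasNext;
    }

    @Override
    public T next() {
        if (!hasNext) throw new NoSuchElementException();
        T result = next;
        last = result;
        hasLast = true;
        advance();
        return result;
    }

    /** Convenience helper: assumes the list is already sorted. */
    static <T> List<T> dedupSorted(List<T> sorted) {
        List<T> out = new ArrayList<>();
        new AdjacentDedupIterator<>(sorted.iterator()).forEachRemaining(out::add);
        return out;
    }

    public static void main(String[] args) {
        // prints [a, b, c]
        System.out.println(dedupSorted(Arrays.asList("a", "a", "b", "c", "c", "c")));
    }
}
```

This only yields fully distinct output when equal elements are adjacent, which is what a sorted
input provides; on unsorted input, non-adjacent duplicates would pass through.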

In my mind, though, a better approach would be to modify the algebra as part of a query
optimization step (replace the OpDistinct with an OpReduced) when it is known that the QueryIterator
to which it is applied is sorted (either because of an underlying OpOrder or a sorted triple/quad
index).  This makes it easier to determine what is going on during query execution by examining
the transformed algebra, instead of having branches in the physical operators themselves.
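
To make the rewrite concrete, here is a minimal sketch on a toy algebra.  The classes below
(Bgp, Order, Distinct, Reduced) are hypothetical stand-ins, not ARQ's Op classes, and the sketch
assumes the distinct sits directly on top of the order and that the sort makes equal rows adjacent;
a real query would typically have a projection in between, which is ignored here.

```java
/** Toy algebra for illustration; names echo the discussion but are NOT ARQ's classes. */
class DistinctToReducedSketch {
    interface Op {}

    static final class Bgp implements Op {
        final String pattern;
        Bgp(String pattern) { this.pattern = pattern; }
        @Override public String toString() { return "(bgp " + pattern + ")"; }
    }

    static final class Order implements Op {
        final Op sub;
        Order(Op sub) { this.sub = sub; }
        @Override public String toString() { return "(order " + sub + ")"; }
    }

    static final class Distinct implements Op {
        final Op sub;
        Distinct(Op sub) { this.sub = sub; }
        @Override public String toString() { return "(distinct " + sub + ")"; }
    }

    static final class Reduced implements Op {
        final Op sub;
        Reduced(Op sub) { this.sub = sub; }
        @Override public String toString() { return "(reduced " + sub + ")"; }
    }

    /** Bottom-up rewrite: a Distinct whose input is an Order becomes a Reduced. */
    static Op rewrite(Op op) {
        if (op instanceof Distinct) {
            Op sub = rewrite(((Distinct) op).sub);
            return (sub instanceof Order) ? new Reduced(sub) : new Distinct(sub);
        }
        if (op instanceof Reduced) return new Reduced(rewrite(((Reduced) op).sub));
        if (op instanceof Order) return new Order(rewrite(((Order) op).sub));
        return op; // leaf: nothing to rewrite
    }

    public static void main(String[] args) {
        Op before = new Distinct(new Order(new Bgp("?s ?p ?o")));
        System.out.println(before);          // (distinct (order (bgp ?s ?p ?o)))
        System.out.println(rewrite(before)); // (reduced (order (bgp ?s ?p ?o)))
    }
}
```

Printing the algebra before and after shows the decision in the plan itself, which is the
debugging benefit described above.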

[1]  DistinctDataBag is not guaranteed to be sorted.  The in-memory bindings are stored in
a HashSet, thus if the bag does not spill to disk then no attempt is made to sort the bindings
in the iterator (so as not to perform extra effort).  It would not be hard to create a DistinctSortedDataBag,
but I'm not sure that it is necessary (and IMO limiting the number of primitive operations
helps simplify the system).
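
The ordering point in [1] can be seen with plain java.util collections.  This sketch does not use
DistinctDataBag at all; the class DistinctOrderSketch and its methods are hypothetical, and only
show that hash-based de-duplication gives no ordering guarantee while a sorted set does:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical illustration; unrelated to the actual DistinctDataBag code. */
class DistinctOrderSketch {
    /** De-duplicate with a HashSet: iteration order is unspecified. */
    static <T> Set<T> distinctUnordered(Collection<T> rows) {
        return new HashSet<>(rows);
    }

    /** De-duplicate with a TreeSet: iteration follows natural ordering. */
    static <T extends Comparable<T>> List<T> distinctSorted(Collection<T> rows) {
        return new ArrayList<>(new TreeSet<>(rows));
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList("c", "a", "b", "a");
        System.out.println(distinctUnordered(rows)); // some order; do not rely on it
        System.out.println(distinctSorted(rows));    // [a, b, c]
    }
}
```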

> Use OpReduce instead of OpDistinct for DISTINCT + ORDER BY queries
> ------------------------------------------------------------------
>                 Key: JENA-90
>                 URL:
>             Project: Jena
>          Issue Type: Improvement
>          Components: ARQ
>            Reporter: Paolo Castagna
>            Assignee: Paolo Castagna
>            Priority: Trivial
>              Labels: arq, optimizer, sparql
>         Attachments: ARQ_JENA-90_r1159636.patch
> ARQ's optimizer could use an OpReduce instead of OpDistinct if a query is DISTINCT +
> OpReduce removes adjacent duplicates and it does not require a set of already seen bindings
> as the current OpDistinct implementation does.
