incubator-jena-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Stephen Allen" <>
Subject RE: SPARQL queries with ORDER BY + LIMIT
Date Fri, 05 Aug 2011 18:14:18 GMT

Seems like a good optimization.  Although you would want to use a bounded
priority queue so you only use O(N) memory.  Some info about implementing it
using an implicit heap is at [1].

Google's Guava library also provides it (among many other nice classes!)

   "A min-max priority queue can be configured with a maximum size.
    If so, each time the size of the queue exceeds that value, the
    queue automatically removes its greatest element according to
    its comparator (which might be the element that was just added).
    This is different from conventional bounded queues, which either
    block or reject new elements when full.

    This implementation is based on the min-max heap developed by
    Atkinson, et al. Unlike many other double-ended priority queues,
    it stores elements in a single array, as compact as the
    traditional heap data structure used in PriorityQueue."



-----Original Message-----
From: Paolo Castagna [] 
Sent: Friday, August 05, 2011 10:33 AM
Subject: SPARQL queries with ORDER BY + LIMIT

when someone submits a SPARQL query which contains an ORDER BY + LIMIT,
ARQ sorts the entire and then takes the first N, according to the specified

If the ORDER BY operates over a large number of bindings, these sort of
queries can become a problem. We see ORDER BY queries with small LIMITs
and often enough no OFFSET specified.

Fortunately, in the ARQ algebra package there is already a OpTopN [1]
operator, which currently get executed as a sort followed by a slice,
see OpExecutor [2]:

  protected QueryIterator execute(OpTopN opTop, QueryIterator input)
      // XXX Do better; execute - OpTopN
      QueryIterator qIter = executeOp(opTop.getSubOp(), input) ;
      qIter = new QueryIterSort(qIter, opTop.getConditions(), execCxt) ;
      qIter = new QueryIterSlice(qIter, 0, opTop.getLimit(), execCxt) ;
      return qIter ;

The comment suggests something better is possible. :-)

We could avoid the sort, perhaps using a heap [3] and, in Java, a
PriorityQueue [4,5]. We would need to add a QueryIterTopN and use it
instead of the QueryIterSort + QueryIterSlice.

 "It is a common misconception that a priority queue is a heap.
  A priority queue is an abstract concept like "a list" or "a map";
  just as a list can be implemented with a linked list or an array,
  a priority queue can be implemented with a heap or a variety of
  other methods." [4]


  A Java PriorityQueue is "an unbounded priority queue based on a
  priority heap." [5]

The other thing to do is to spot an ORDER BY + LIMIT (with a sufficiently
small LIMIT?) and use the OpTopN instead of the OpOrder and OpSlice.
This would need to be done in a new TransformOrderByLimit or TransformTopN
transformation to be used by (in which position?).

Does this seem a reasonable thing to do and a reasonable way to do it?



View raw message