incubator-jena-dev mailing list archives

From "Stephen Allen (Commented) (JIRA)" <>
Subject [jira] [Commented] (JENA-119) Eliminate memory bounds during query execution
Date Thu, 24 Nov 2011 00:16:40 GMT


Stephen Allen commented on JENA-119:

Completed QueryIterDistinct.  It streams until it passes the threshold for the first time,
at which point it consumes the entire input iterator before returning any further bindings.

Committed in revision 1205673.
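
For illustration only, the behaviour described above can be sketched roughly as follows. This is
not the committed code: the class name SpillDistinctSketch, the DistinctIterator type and the
distinct() helper are hypothetical, and an in-memory LinkedHashSet stands in for a spill-to-disk
bag such as DistinctDataBag. The sketch streams new items until the threshold of distinct items
is reached, then drains the rest of the input before returning anything further.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Iterator;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.Set;

public class SpillDistinctSketch {

    /** Hypothetical distinct iterator: streams until `threshold` distinct items were returned. */
    public static final class DistinctIterator<T> implements Iterator<T> {
        private final Iterator<T> input;
        private final int threshold;
        private final Set<T> seen = new HashSet<>(); // items already handed to the caller
        private Iterator<T> rest = null;             // set once the input has been drained
        private T slot;                              // next item to return, if slotFilled
        private boolean slotFilled = false;

        public DistinctIterator(Iterator<T> input, int threshold) {
            this.input = input;
            this.threshold = threshold;
        }

        @Override
        public boolean hasNext() {
            if (slotFilled) return true;
            if (rest == null) {
                // Streaming phase: pass new items straight through while under the threshold.
                while (seen.size() < threshold && input.hasNext()) {
                    T item = input.next();
                    if (seen.add(item)) return fill(item);
                }
                // Threshold reached (or input exhausted): consume the remaining input in full.
                // Assumption: a LinkedHashSet stands in for a bag that would spill to disk.
                Set<T> remaining = new LinkedHashSet<>();
                while (input.hasNext()) {
                    T item = input.next();
                    if (!seen.contains(item)) remaining.add(item);
                }
                rest = remaining.iterator();
            }
            return rest.hasNext() && fill(rest.next());
        }

        private boolean fill(T item) {
            slot = item;
            slotFilled = true;
            return true;
        }

        @Override
        public T next() {
            if (!hasNext()) throw new NoSuchElementException();
            slotFilled = false;
            return slot;
        }
    }

    /** Convenience helper for the demo: collect the distinct items of `items` into a list. */
    public static <T> List<T> distinct(List<T> items, int threshold) {
        List<T> out = new ArrayList<>();
        new DistinctIterator<>(items.iterator(), threshold).forEachRemaining(out::add);
        return out;
    }

    public static void main(String[] args) {
        // With a threshold of 3, the items 1, 2, 3 are streamed; 4 and 5 come after the drain.
        System.out.println(distinct(Arrays.asList(1, 2, 2, 3, 1, 4, 3, 5), 3));
    }
}
```

In this sketch the trade-off is that the first `threshold` distinct results reach the caller
immediately, while everything after that waits for the input to be fully consumed.
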
> Eliminate memory bounds during query execution
> ----------------------------------------------
>                 Key: JENA-119
>                 URL:
>             Project: Jena
>          Issue Type: New Feature
>          Components: ARQ
>            Reporter: Stephen Allen
>            Assignee: Stephen Allen
>         Attachments: JENA-119-r1177090-Fuseki-Construct.patch, JENA-119-r1177452-ARQ-Construct.patch
> It would be nice to eliminate all memory bounds on queries.  Similar to JENA-44, it would
> involve modifying all of the QueryIterator objects that maintain unbounded collections of
> The ones I've identified (let me know if I've missed any):
> + QueryIterSort
>       Complete!
> + QueryIterGroup
>       Probably one of the more complicated implementations.  I think it can be done with
>       a DistinctDataBag.
> + QueryIterDistinct
>       Can be implemented trivially using DistinctDataBag, but would lose streaming
>       capability.  We could do streaming just until the first spill, which would be a
>       little more difficult but not bad.  If we wanted streaming even after spilling, then
>       we would need an on-disk hashtable or b-tree (which could get expensive for maybe
>       limited benefit, do you really need streaming after 10,000 results?).
> + QueryIteratorCopy
>       Only appears to be used by QueryIterService.  Simple implementation using DefaultDataBag.
> + QueryIteratorCaching
>       Does not match DataBag's assumption of completing all writes before iterating.
>       But it isn't used anywhere, so maybe we remove it?
> + QueryIterDiff
> + QueryIterMinus
>       Both of these materialize the RHS into a collection.  Can be implemented with
>       DefaultDataBag.  As an aside, is this necessary to do for all queries?  What if the
>       RHS is cheap (i.e. a single
> + QueryIterJoin
> + QueryIterLeftJoin
>       Both materialize RHS.  Are they used anywhere?  I was under the impression that
>       ARQ only considered left-deep plans with indexed joins on the RHS TriplePatterns.
> + SubQueries
>      I'm not sure how this is handled.  Are these materialized somewhere?
