incubator-jena-dev mailing list archives

From "Stephen Allen (JIRA)" <>
Subject [jira] [Commented] (JENA-44) Support external sorting of bindings in ARQ
Date Fri, 26 Aug 2011 22:43:29 GMT


Stephen Allen commented on JENA-44:

I did not include a cancellation mechanism in the DataBags themselves because it was not clear
to me that it would be necessary.

The only point at which a significant amount of time can be spent in the DataBag code is in
the add() method, right as a spill is occurring.  Program execution may be in Arrays.sort()
(SortedDataBag and DistinctDataBag) or in the process of serializing tuples to disk.
Given the anticipated spill thresholds (1,000-100,000 tuples, or memory in the 10-100 MB range)
and the fact that the disk I/O is sequential (and thus fast), supporting cancellation there seemed
like an unnecessary complication, since those operations would complete within tens of seconds.
Any physical query operator using the DataBag would then be able to cancel immediately after
the spill finished (QueryIterSort passes the cancel request to its embedded iterator, which
will then throw the QueryCancellationException on the next iteration).
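
To illustrate the behaviour described above, here is a minimal sketch, not the actual ARQ code.
SpillingBagSketch and SortOperatorSketch are hypothetical names, plain ints stand in for tuples,
and java.util.concurrent.CancellationException stands in for the query cancellation exception.
The point it tries to show is that add() never checks for cancellation, so a cancel request that
arrives during a spill is only honoured by the calling operator once that add() has returned.

```java
import java.io.BufferedOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.CancellationException;

/** Hypothetical stand-in for a sorted, spilling bag; NOT the real DataBag API. */
final class SpillingBagSketch implements AutoCloseable {
    private final int threshold;
    private final List<Integer> memory = new ArrayList<>();
    private final List<Path> spillFiles = new ArrayList<>();

    SpillingBagSketch(int threshold) { this.threshold = threshold; }

    /** Adds one item; on reaching the threshold it sorts and spills, with no cancellation check. */
    void add(int item) {
        memory.add(item);
        if (memory.size() >= threshold) spill();
    }

    private void spill() {
        Collections.sort(memory);
        try {
            Path file = Files.createTempFile("bag-sketch", ".spill");
            spillFiles.add(file); // registered before writing so close() can always delete it
            try (DataOutputStream out = new DataOutputStream(
                    new BufferedOutputStream(Files.newOutputStream(file)))) {
                for (int v : memory) out.writeInt(v); // sequential write of the sorted run
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        memory.clear();
    }

    int spillCount() { return spillFiles.size(); }

    List<Path> spillFiles() { return new ArrayList<>(spillFiles); }

    /** Deletes every spill file created so far. */
    @Override public void close() {
        try {
            for (Path p : spillFiles) Files.deleteIfExists(p);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        spillFiles.clear();
        memory.clear();
    }
}

/** Hypothetical operator: the cancel flag is checked between add() calls only. */
final class SortOperatorSketch {
    private volatile boolean cancelRequested = false;

    void cancel() { cancelRequested = true; }

    /** Drains the input into the bag; returns the number of items added. */
    int fill(Iterator<Integer> input, SpillingBagSketch bag) {
        int added = 0;
        while (input.hasNext()) {
            if (cancelRequested) throw new CancellationException("query cancelled");
            bag.add(input.next()); // may spill; runs to completion even if cancel() arrives meanwhile
            added++;
        }
        return added;
    }
}
```

In this sketch the worst-case delay between cancel() and the exception is the duration of one
spill, which is the trade-off argued for above.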

After the add phase is complete and the QueryIterSort starts returning results, cancellation
will be handled by the superclass (QueryIteratorBase).

Porting the tests meant that they would test the QueryIterSort with the embedded DataBag, to
be sure that the temporary files were cleaned up when the iterator was cancelled.  So it's
not really testing cancellation on the DataBag per se, but rather the new QueryIterSort.
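
As a rough illustration of the result phase and of the kind of cleanup check described above, the
following sketch uses a base iterator class that owns the cancel flag and a subclass that only
supplies items and releases its temporary file.  All names here (CancellableIteratorSketch,
SortedResultsSketch) are hypothetical, the single temp file merely stands in for spill files, and
none of this is the real QueryIteratorBase or QueryIterSort code.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.concurrent.CancellationException;

/** Hypothetical base class: cancellation and closing are handled here, once, for all subclasses. */
abstract class CancellableIteratorSketch<T> implements Iterator<T>, AutoCloseable {
    private volatile boolean cancelRequested = false;
    private boolean closed = false;

    protected abstract boolean hasNextItem();
    protected abstract T nextItem();
    protected abstract void releaseResources();

    public final void cancel() { cancelRequested = true; }

    @Override public final boolean hasNext() {
        if (closed) return false;
        if (cancelRequested) {
            close(); // release resources first, then report the cancellation
            throw new CancellationException("query cancelled");
        }
        boolean more = hasNextItem();
        if (!more) close(); // normal exhaustion also releases resources
        return more;
    }

    @Override public final T next() {
        if (!hasNext()) throw new NoSuchElementException();
        return nextItem();
    }

    @Override public final void close() {
        if (!closed) {
            closed = true;
            releaseResources();
        }
    }
}

/** Hypothetical sorted-results iterator holding one temp file that must be deleted on close. */
final class SortedResultsSketch extends CancellableIteratorSketch<Integer> {
    private final Iterator<Integer> sorted;
    private final Path tempFile = createTemp();

    SortedResultsSketch(List<Integer> items) {
        List<Integer> copy = new ArrayList<>(items);
        Collections.sort(copy);
        this.sorted = copy.iterator();
    }

    private static Path createTemp() {
        try {
            return Files.createTempFile("sort-sketch", ".tmp");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    Path tempFile() { return tempFile; }

    @Override protected boolean hasNextItem() { return sorted.hasNext(); }
    @Override protected Integer nextItem() { return sorted.next(); }

    @Override protected void releaseResources() {
        try {
            Files.deleteIfExists(tempFile);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A test in the spirit of the ported ones would read one result, call cancel(), expect the exception
on the next hasNext(), and then assert that tempFile() no longer exists.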

> Support external sorting of bindings in ARQ
> -------------------------------------------
>                 Key: JENA-44
>                 URL:
>             Project: Jena
>          Issue Type: New Feature
>          Components: ARQ
>            Reporter: Sam Tunnicliffe
>            Assignee: Paolo Castagna
>            Priority: Minor
>         Attachments: JENA-44-0.patch, JENA-44-Depends-on-JENA-99-r1157891.patch, JENA-44_ARQ_r1156212.patch,
> JENA-44_ARQ_r8531.patch, JENA-44_ARQ_r8724.patch
>
> In QueryIterSort, the sorting of the contents of an Iterator<Binding> is done in
> memory, using Arrays.sort. This can be problematic where the set to be sorted is large. A
> possible solution could be to use an external, disk-backed algorithm. A hybrid approach may
> be better, whereby we attempt the in-memory sort, but when the number of bindings encountered
> goes over a certain number, resort to the disk-backed variant.
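
The hybrid approach in the issue description could look roughly like the sketch below.  It is an
illustration under stated assumptions, not the patch attached to this issue: HybridSortSketch is a
hypothetical name, ints stand in for bindings, the threshold is a plain item count, and results
are pushed to a caller-supplied IntConsumer so that the merged output is never held in memory.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;
import java.util.function.IntConsumer;

/** Hypothetical hybrid sort: in memory while the input is small, sorted runs on disk otherwise. */
final class HybridSortSketch {

    /** Emits the input in ascending order to sink; returns the number of runs written to disk. */
    static int sort(Iterator<Integer> input, int threshold, IntConsumer sink) {
        List<Integer> buffer = new ArrayList<>();
        List<Path> runs = new ArrayList<>();
        try {
            while (input.hasNext()) {
                buffer.add(input.next());
                if (buffer.size() >= threshold) { // buffer full: spill it as one sorted run
                    runs.add(writeRun(buffer));
                    buffer.clear();
                }
            }
            if (runs.isEmpty()) {                 // never spilled: plain in-memory sort
                Collections.sort(buffer);
                for (int v : buffer) sink.accept(v);
                return 0;
            }
            if (!buffer.isEmpty()) runs.add(writeRun(buffer)); // leftover items form the last run
            merge(runs, sink);
            return runs.size();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            for (Path p : runs) {
                try { Files.deleteIfExists(p); } catch (IOException ignored) { /* best effort */ }
            }
        }
    }

    /** Sorts the buffer and writes it as a length-prefixed run of ints. */
    private static Path writeRun(List<Integer> buffer) throws IOException {
        Collections.sort(buffer);
        Path run = Files.createTempFile("hybrid-sort", ".run");
        try (DataOutputStream out = new DataOutputStream(
                new BufferedOutputStream(Files.newOutputStream(run)))) {
            out.writeInt(buffer.size());
            for (int v : buffer) out.writeInt(v);
        }
        return run;
    }

    /** Read position within one run. */
    private static final class Cursor {
        final DataInputStream in;
        int remaining;
        int head;
        Cursor(DataInputStream in) throws IOException { this.in = in; this.remaining = in.readInt(); }
        boolean advance() throws IOException {
            if (remaining == 0) return false;
            head = in.readInt();
            remaining--;
            return true;
        }
    }

    /** K-way merge of the sorted runs using a min-heap keyed on each run's current head. */
    private static void merge(List<Path> runs, IntConsumer sink) throws IOException {
        List<DataInputStream> streams = new ArrayList<>();
        try {
            PriorityQueue<Cursor> heap =
                    new PriorityQueue<>(Comparator.comparingInt((Cursor c) -> c.head));
            for (Path run : runs) {
                DataInputStream in = new DataInputStream(
                        new BufferedInputStream(Files.newInputStream(run)));
                streams.add(in);
                Cursor c = new Cursor(in);
                if (c.advance()) heap.add(c);
            }
            while (!heap.isEmpty()) {
                Cursor c = heap.poll();
                sink.accept(c.head);
                if (c.advance()) heap.add(c); // re-insert with its next value as the new key
            }
        } finally {
            for (DataInputStream in : streams) in.close();
        }
    }
}
```

With a threshold of 3 and seven input items, this sketch writes three runs and merges them; with a
threshold larger than the input it never touches the disk.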

This message is automatically generated by JIRA.