accumulo-notifications mailing list archives

From "Christopher Tubbs (JIRA)" <>
Subject [jira] [Commented] (ACCUMULO-759) remove priority setting for scan-time iterators
Date Tue, 11 Sep 2012 07:40:07 GMT


Christopher Tubbs commented on ACCUMULO-759:

I do like the ability to chain calls that comes from returning a Scanner, though I still prefer
"append" over "add", because "add" tends to get overloaded and confusing. Also, if "append"
is the behavior for scan-time iterators, without the priority, then the term "scan" can be
dropped from the method name. So, "appendScanIterator(ScanIteratorSetting)" becomes
"scanner.appendIterator(IteratorSetting)".
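
To illustrate the chaining, here is a minimal sketch. SketchScanner is a hypothetical stand-in
written for this example, not the real Accumulo Scanner, and iterators are represented by plain
names rather than IteratorSetting objects. It only shows how returning the scanner from
appendIterator lets calls be chained while preserving call order.

```java
import java.util.ArrayList;
import java.util.List;

public class AppendChainingSketch {

    // Hypothetical stand-in for the proposed API; not the real Accumulo Scanner.
    static final class SketchScanner {
        private final List<String> scanIterators = new ArrayList<>();

        // Appends an iterator after all previously appended ones and
        // returns this scanner so that further calls can be chained.
        SketchScanner appendIterator(String iteratorName) {
            scanIterators.add(iteratorName);
            return this;
        }

        // Returns a copy of the iterator names in the order they were appended.
        List<String> iterators() {
            return new ArrayList<>(scanIterators);
        }
    }

    public static void main(String[] args) {
        SketchScanner scanner = new SketchScanner()
                .appendIterator("columnFilter")
                .appendIterator("jsonFormatter");
        System.out.println(scanner.iterators()); // prints [columnFilter, jsonFormatter]
    }
}
```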

Also, the boolean seems to achieve the same thing as a convention of <1024 vs. >=1024 (scan
iterators would just start at 1024, with +1 for each successive iterator appended). However,
the boolean is more restrictive than this, because it prevents insertion of an iterator at
other points in the scan. So, I guess it comes down to whether or not the current behavior
should be modified in this restrictive way. Personally, I think it shouldn't be. Consider
two use cases:

TableA is configured with a per-table iterator that groups and displays rows as JSON upon
query. A query framework is built on this table that allows users to filter out particular
columns from each row at scan time (relational algebra projection). However, the view will
always be JSON. It seems reasonable to set a per-table iterator that converts rows to JSON
at priority 500, and at scan-time, inject the filtering iterator at priority 400.
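
The ordering in this use case, together with the >=1024 convention mentioned above, can be
sketched as follows. This is a simplified model, not Accumulo code: the class, its method names,
and the iterator names are hypothetical, and it assumes that iterators with lower priority
numbers are applied first (closer to the data).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

public class IteratorPrioritySketch {

    // Assumed convention from the discussion: appended scan iterators start here.
    static final int FIRST_APPEND_PRIORITY = 1024;

    // Sorted by priority; lower numbers are applied first in this model.
    private final TreeMap<Integer, String> stack = new TreeMap<>();
    private int nextAppendPriority = FIRST_APPEND_PRIORITY;

    // Places an iterator at an explicit priority (table config or scan-time injection).
    void addAt(int priority, String name) {
        if (stack.putIfAbsent(priority, name) != null) {
            throw new IllegalArgumentException("priority already in use: " + priority);
        }
    }

    // Appends an iterator at the next free priority at or above 1024.
    void append(String name) {
        while (stack.containsKey(nextAppendPriority)) {
            nextAppendPriority++;
        }
        addAt(nextAppendPriority++, name);
    }

    // Iterator names in the order they would be applied to the data.
    List<String> applicationOrder() {
        return new ArrayList<>(stack.values());
    }

    public static void main(String[] args) {
        IteratorPrioritySketch scan = new IteratorPrioritySketch();
        scan.addAt(500, "jsonView");       // per-table iterator
        scan.addAt(400, "columnFilter");   // scan-time iterator injected beneath it
        scan.append("resultLimiter");      // scan-time iterator appended at 1024
        System.out.println(scan.applicationOrder());
        // prints [columnFilter, jsonView, resultLimiter]
    }
}
```

With an append-only API, only the last call would be possible at scan time; the injection at
priority 400 is the case the explicit priority keeps available.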

Now, this is a trivial example, where users are constrained to a particular view that could
just as easily be added at scan time. However, consider the use case where an iterator is
applied to a table to enforce a view policy that is intended to protect patient privacy or
enforce a DRM scheme on multimedia content. Such an iterator may allow lower-priority filters
to run beneath it, but show only counts of the matching results. Alternatively, if such an
iterator is given the proper payment method, it could encode the data with a DRM scheme to
lease the queried content to a subscriber for some requested period of time.

These are just a few examples of why I think it would be too constraining to only allow appending
scan-time iterators and not allow injecting them at a lower priority.
> remove priority setting for scan-time iterators
> -----------------------------------------------
>                 Key: ACCUMULO-759
>                 URL:
>             Project: Accumulo
>          Issue Type: Improvement
>            Reporter: Adam Fuchs
>              Labels: newbie
> Iterators have a priority setting that allows a user to order iterators arbitrarily.
> However that priority is an integer that doesn't directly convey the iterator's relationship
> to other iterators. I would postulate that nobody has ever needed to sneak in a scan-time
> iterator underneath a configured table iterator (please let me know if I'm wrong about this),
> and the effect of doing so is not easy to calculate. Many people have chosen a bad iterator
> priority and seen commutativity problems with previously configured iterators.
> I propose that we use more of an agglomerative approach to configuring scan-time iterators,
> in which the order of the iterator tree is the same order in which the addScanIterator method
> is called, and all scan-time iterators apply after the configured iterators apply. The change
> to the API should just be to remove the priority number, and the existing IteratorSetting
> constructor and accessors should be deprecated.
> With this change, we can think of an iterator as more of a functional modification to
> a data set, as in T' = f(T) or T'' = g(f(T)). This should make it easier for developers to
> use iterators correctly.
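
The functional view in the quoted description, T'' = g(f(T)), can be sketched with plain
function composition. This is only an illustration: the data set is modeled as a list of
strings, and F and G are hypothetical stand-ins for two iterators (a filter and a transform),
not Accumulo classes.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

public class IteratorCompositionSketch {

    // f: a filtering step that keeps only entries starting with "a".
    static final Function<List<String>, List<String>> F =
            t -> t.stream().filter(e -> e.startsWith("a")).collect(Collectors.toList());

    // g: a transforming step that upper-cases every remaining entry.
    static final Function<List<String>, List<String>> G =
            t -> t.stream().map(String::toUpperCase).collect(Collectors.toList());

    // T'' = g(f(T)): apply f first, then g, in the order they were added.
    static List<String> scan(List<String> table) {
        return F.andThen(G).apply(table);
    }

    public static void main(String[] args) {
        List<String> table = Arrays.asList("apple", "banana", "avocado");
        System.out.println(scan(table)); // prints [APPLE, AVOCADO]
    }
}
```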
