accumulo-notifications mailing list archives

From "Christopher Tubbs (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (ACCUMULO-759) remove priority setting for scan-time iterators
Date Wed, 12 Sep 2012 00:02:07 GMT

    [ https://issues.apache.org/jira/browse/ACCUMULO-759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13453566#comment-13453566 ]

Christopher Tubbs commented on ACCUMULO-759:
--------------------------------------------

I see the value in treating the Scanner as an immutable view of a dataset within client code,
without interference from per-table config. However, I think it would be a simple matter to
subclass Scanner for this purpose. A Scanner is a scanner over a data source; it is not strictly
a dataset. I believe I spoke to Adam previously about creating such an API: one where you
would manipulate a Query object representing a data source, and then execute it. Perhaps
that's still a reasonable option?

It would still be reasonable for Scanner to have built-in support for things like "after
all per-table iterators". Perhaps priority isn't the best way to represent it, though? Keith
and I talked about possibly creating an API where iterators are arranged more like this:
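{code:java}
// Proposed API sketch: IteratorChain and LAST are hypothetical.
IteratorSetting a, b, c; // assume these are already configured
IteratorChain chain = new IteratorChain();
chain.insertAfter(LAST, a);         // a goes last
chain.insertBefore(a.getName(), b); // b goes immediately before a
chain.insertAfter(b.getName(), c);  // c goes between b and a
{code}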
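
To make the intended ordering concrete, here is a minimal, self-contained sketch of how such a chain could behave. It is only an illustration: IteratorChain, LAST, and the method names come from the proposal above and are not part of the Accumulo API, and the sketch tracks plain iterator names (String) rather than IteratorSetting objects so that it compiles on its own.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the proposed API; not an existing Accumulo class.
final class IteratorChain {
    /** Assumed sentinel anchor meaning "after everything currently in the chain". */
    static final String LAST = "__LAST__";

    // Iterator names in the order they would be applied.
    private final List<String> names = new ArrayList<>();

    /** Inserts name immediately after anchor, or at the end when anchor is LAST. */
    IteratorChain insertAfter(String anchor, String name) {
        requireNew(name);
        int pos = LAST.equals(anchor) ? names.size() : indexOf(anchor) + 1;
        names.add(pos, name);
        return this;
    }

    /** Inserts name immediately before anchor. */
    IteratorChain insertBefore(String anchor, String name) {
        requireNew(name);
        names.add(indexOf(anchor), name);
        return this;
    }

    /** Returns an immutable snapshot of the current order. */
    List<String> names() {
        return List.copyOf(names);
    }

    // Looks up an anchor and fails loudly if it is not in the chain.
    private int indexOf(String anchor) {
        int i = names.indexOf(anchor);
        if (i < 0) {
            throw new IllegalArgumentException("No iterator named " + anchor);
        }
        return i;
    }

    // Names act as anchors, so each one may appear only once.
    private void requireNew(String name) {
        if (names.contains(name)) {
            throw new IllegalArgumentException("Duplicate iterator name " + name);
        }
    }
}
```

Replaying the three calls from the snippet above with the names "a", "b", and "c" leaves the chain in the order b, c, a. The point of the design is that each position is stated relative to a named neighbor, so no integer priority has to be chosen.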

One other thing to consider is that any change should probably be consistent across all APIs,
including those pertaining to per-table configuration and methods like
tableOperations.compact().
                
> remove priority setting for scan-time iterators
> -----------------------------------------------
>
>                 Key: ACCUMULO-759
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-759
>             Project: Accumulo
>          Issue Type: Improvement
>            Reporter: Adam Fuchs
>              Labels: newbie
>
> Iterators have a priority setting that allows a user to order iterators arbitrarily.
> However, that priority is an integer that doesn't directly convey the iterator's relationship
> to other iterators. I would postulate that nobody has ever needed to sneak in a scan-time
> iterator underneath a configured table iterator (please let me know if I'm wrong about this),
> and the effect of doing so is not easy to calculate. Many people have chosen a bad iterator
> priority and seen commutativity problems with previously configured iterators.
> I propose that we use more of an agglomerative approach to configuring scan-time iterators,
> in which the order of the iterator tree is the same order in which the addScanIterator method
> is called, and all scan-time iterators apply after the configured iterators apply. The change
> to the API should just be to remove the priority number, and the existing IteratorSetting
> constructor and accessors should be deprecated.
> With this change, we can think of an iterator as more of a functional modification to
> a data set, as in T' = f(T) or T'' = g(f(T)). This should make it easier for developers to
> use iterators correctly.
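
The T'' = g(f(T)) view in the description can be illustrated with a small model. This is a sketch under stated assumptions, not Accumulo code: OrderedScan is a hypothetical class, each iterator is modeled as a plain function over a value of type T, and only the name addScanIterator is borrowed from the description.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

// Hypothetical model of the proposal; not the Accumulo Scanner API.
final class OrderedScan<T> {
    // Iterators configured on the table; these always apply first.
    private final List<UnaryOperator<T>> tableIterators;
    // Scan-time iterators, kept in the order they were added.
    private final List<UnaryOperator<T>> scanIterators = new ArrayList<>();

    OrderedScan(List<UnaryOperator<T>> tableIterators) {
        this.tableIterators = List.copyOf(tableIterators);
    }

    /** Appends a scan-time iterator; call order is application order. */
    OrderedScan<T> addScanIterator(UnaryOperator<T> iterator) {
        scanIterators.add(iterator);
        return this;
    }

    /** Applies all table iterators, then all scan-time iterators, in order. */
    T apply(T data) {
        T result = data;
        for (UnaryOperator<T> f : tableIterators) {
            result = f.apply(result);
        }
        for (UnaryOperator<T> g : scanIterators) {
            result = g.apply(result);
        }
        return result;
    }
}
```

With a table iterator x -> x + 1 and scan-time iterators x -> x * 2 followed by x -> x - 3, applying the scan to 5 gives ((5 + 1) * 2) - 3 = 9. Adding the two scan-time iterators in the opposite order gives ((5 + 1) - 3) * 2 = 6, which is the kind of commutativity problem the description refers to.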

