accumulo-notifications mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Keith Turner (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (ACCUMULO-759) remove priority setting for scan-time iterators
Date Tue, 11 Sep 2012 19:22:07 GMT

    [ https://issues.apache.org/jira/browse/ACCUMULO-759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13453309#comment-13453309
] 

Keith Turner edited comment on ACCUMULO-759 at 9/12/12 6:21 AM:
----------------------------------------------------------------

After reading all of the comments, now I am thinking of the following.

I would avoid restricting the priority for tablet iterators to < 1024.  From an API perspective
I think that cleanest option may be to set a list.  With a method like appendScanIterator()
that applies we need method like clearScanIterators().  I think using a boolean or negative
number is a behind the scenes implementation detail.  When the API uses a list, it does not
need to expose either.

{code:java}
//the purpose of this class is to allow user to configure scan time iterators... it does not
allow the user to set the priority or name.... 
class ScanIteratorSetting extends IteratorSetting {

   //has all of IteratorSetting's constructors, but priority and name are never arguments
e.g.
   public ScanIteratorSetting(String iteratorClass) {
     super(-42, "does not matter", iteratorClass, new HashMap<String,String>());
   }

   public void setPriority(int priority) {
     throw new UnsupportedOperationsException();
   }

   public void setName(String name) {
     throw new UnsupportedOperationsException();
   }
}
{code}

Maybe ScanIteratorSetting should not extend IteratorSetting.  Maybe IteratorSetting and ScanIteratorSetting
should have a common parent?

The scanner would keep all of its current methods and add the following method.

{code:java}

interface ScannerBase {

   /**
    * This is a legacy method, you should probably use {@link setIterators(...)} or {@link
setIterator(...)} which are much 
    * simpler.  This method allows you to insert scan iterators before iterators configured
for the table which is not possible
    * with {@link setIterators(...)} or {@link setIterator(...)}.
    */
   public void addScanIterator(IteratorSetting cfg);
   public void removeScanIterator(String iteratorName);
   public void updateScanIteratorOption(String iteratorName, String key, String value);

   /**
    * Scan time iterators that will execute server side in the order given in the list after
all iterators configured for the table.
    * 
    * This method will overwrite iterators previously set by {@link setIterators(...)} or
{@link setIterator(...)}
    *
    * This method will have no effect on iterators set by {@link addScanIterator(...)}
    */
   void setIterators(List<ScanIteratorSetting> scanIterators);

   /**
    * A convenience method for setting a single scan time iterator that will execute after
all iterators configured for the table.
    * 
    * This method will overwrite iterators previously set by {@link setIterators(...)} or
{@link setIterator(...)}
    *
    * This method will have no effect on iterators set by {@link addScanIterator(...)}
    */
   void setIterator(ScanIteratorSetting scanIterator);

}

{code}

addScanIterator() should probably throw an expception if passed a ScanIteratorSetting
                
      was (Author: kturner):
    After reading all of the comments, now I am thinking of the following.

I would avoid restricting the priority for tablet iterators to < 1024.  From an API perspective
I think that cleanest option may be to set a list.  With a method like appendScanIterator()
that applies we need method like clearScanIterators().  I think using a boolean or negative
number is a behind the scenes implementation detail.  When the API uses a list, it does not
need to expose either.

{code:java}
//the purpose of this class is to allow user to configure scan time iterators... it does not
allow the user to set the priority or name.... 
class ScanIteratorSetting extends IteratorSetting {

   //has all of IteratorSetting's constructors, but priority and name are never arguments
e.g.
   public ScanIteratorSetting(String iteratorClass) {
     super(-42, "does not matter", iteratorClass, new HashMap<String,String>());
   }

   public void setPriority(int priority) {
     throw new UnsupportedOperationsException();
   }

   public void setName(String name) {
     throw new UnsupportedOperationsException();
   }
}
{code}

Maybe ScanIteratorSetting should not extend IteratorSetting.  Maybe IteratorSetting and ScanIteratorSetting
should have a common parent?

The scanner would keep all of its current methods and add the following method.

{code:java}

interface ScannerBase {

   public void addScanIterator(IteratorSetting cfg);
   public void removeScanIterator(String iteratorName);
   public void updateScanIteratorOption(String iteratorName, String key, String value);

   /**
    * Scan time iterators that will execute server side in the order given in the list after
all iterators configured for the table.
    * 
    * This method will overwrite iterators previously set by {@link setIterators(...)} or
{@link setIterator(...)}
    *
    * This method will have no effect on iterators set by {@link addScanIterator(...)}
    */
   void setIterators(List<ScanIteratorSetting> scanIterators);

   /**
    * A convenience method for setting a single scan time iterator that will execute after
all iterators configured for the table.
    * 
    * This method will overwrite iterators previously set by {@link setIterators(...)} or
{@link setIterator(...)}
    *
    * This method will have no effect on iterators set by {@link addScanIterator(...)}
    */
   void setIterator(ScanIteratorSetting scanIterator);

}

{code}

addScanIterator() should probably throw an expception if passed a ScanIteratorSetting
                  
> remove priority setting for scan-time iterators
> -----------------------------------------------
>
>                 Key: ACCUMULO-759
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-759
>             Project: Accumulo
>          Issue Type: Improvement
>            Reporter: Adam Fuchs
>              Labels: newbie
>
> Iterators have a priority setting that allows a user to order iterators arbitrarily.
However that priority is an integer that doesn't directly convey the iterator's relationship
to other iterators. I would postulate that nobody has ever needed to sneak in a scan-time
iterator underneath a configured table iterator (please let me know if I'm wrong about this),
and the effect of doing so is not easy to calculate. Many people have chosen a bad iterator
priority and seen commutativity problems with previously configured iterators.
> I propose that we use more of an agglomerative approach to configuring scan-time iterators,
in which the order of the iterator tree is the same order in which the addScanIterator method
is called, and all scan-time iterators apply after the configured iterators apply. The change
to the API should just be to remove the priority number, and the existing IteratorSetting
constructor and accessors should be deprecated.
> With this change, we can think of an iterator as more of a functional modification to
a data set, as in T' = f(T) or T'' = g(f(T)). This should make it easier for developers to
use iterators correctly.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

Mime
View raw message