accumulo-user mailing list archives

From Keith Turner <ke...@deenlo.com>
Subject Re: What priority for purge filter
Date Wed, 11 Dec 2013 22:45:08 GMT
On Wed, Dec 11, 2013 at 3:22 PM, Terry P. <texpilot@gmail.com> wrote:

> Thanks Keith, wonderful explanation as always, and you are helping ensure
> everything goes as expected. Thank you sir!
> For minor compactions and partial major compactions, my approach to
> "letting everything pass" is:
> 1. In the init() method (the boolean variable inCorrectScope is declared
> at the head of the class and set to false to be safe):
>
> IteratorScope is = env.getIteratorScope();
> if (is.equals(IteratorScope.scan) || env.isFullMajorCompaction())
>   inCorrectScope = true;
> else
>   inCorrectScope = false;
>

The isFullMajorCompaction() method will complain if it is called when the
scope is not major compaction, so you will want code like the following.

if(is.equals(IteratorScope.scan) || (is.equals(IteratorScope.majc) &&
env.isFullMajorCompaction()))
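
A minimal, self-contained sketch of that check follows. It is not the Accumulo code itself: the IteratorScope enum and IteratorEnvironment interface here are small stand-ins so the example compiles without Accumulo on the classpath, and the names ScopeCheck, seesAllData and env are hypothetical. The stand-in throws IllegalStateException to model the "complain" behavior; treat the exact exception type as an assumption.

```java
class ScopeCheck {
    // Stand-in for Accumulo's IteratorScope (assumption: same three scopes).
    enum IteratorScope { minc, majc, scan }

    // Stand-in for the two IteratorEnvironment methods used in this thread.
    interface IteratorEnvironment {
        IteratorScope getIteratorScope();
        boolean isFullMajorCompaction();
    }

    // True only when the iterator is guaranteed to see all data for a row:
    // a scan, or a major compaction that is a full major compaction.
    static boolean seesAllData(IteratorEnvironment env) {
        IteratorScope is = env.getIteratorScope();
        // The scope test comes first so isFullMajorCompaction() is only
        // called when the scope really is majc.
        return is.equals(IteratorScope.scan)
            || (is.equals(IteratorScope.majc) && env.isFullMajorCompaction());
    }

    // Hypothetical test environment that complains, like the real one is
    // described to, when asked about full compaction outside majc scope.
    static IteratorEnvironment env(final IteratorScope scope, final boolean full) {
        return new IteratorEnvironment() {
            public IteratorScope getIteratorScope() { return scope; }
            public boolean isFullMajorCompaction() {
                if (scope != IteratorScope.majc)
                    throw new IllegalStateException("scope is " + scope);
                return full;
            }
        };
    }

    public static void main(String[] args) {
        System.out.println(seesAllData(env(IteratorScope.scan, false))); // true
        System.out.println(seesAllData(env(IteratorScope.majc, true)));  // true
        System.out.println(seesAllData(env(IteratorScope.majc, false))); // false
        System.out.println(seesAllData(env(IteratorScope.minc, false))); // false
    }
}
```

In a real iterator the result of this check would be stored in inCorrectScope inside init().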




> 2. In the acceptRow() method:
>
> while ( rowIterator.hasTop() ) {
>   // If not in scan or full major compaction scope, short circuit and
>   // return true
>   if (!inCorrectScope)
>     return true;
>   <otherwise perform the steps to see if the row has the expTs column
>    family and if the purge criteria is met or not from the value in
>    that column>
> My main question is just to confirm that I've put the return in the
> correct place.
>
> Also, I saw something that surprised me with a scan too. I did a scan with
> explicit columns listed, and NOT the expTimestamp column the purge iterator
> operates on, and I still see entries. If I include the expTs column the
> purge is done on in the explicit list of columns for the scan, entries are
> filtered out as they should be.  In our environment and use case
> for Accumulo, that shouldn't be an issue, but I can see how that might
> confuse someone in other circumstances.  Just curious if there is some way
> to "force" it to always run even if the "purge criterion column" is not
> included in the scan columns.
>

You can seek the iterator in the accept method with the columns you want.
The iterator passed to the accept method is confined to the current row, so
you do not need to specify a particular range.  You should be able to do
something like the following in the accept method.

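     if(!inCorrectScope)
        return true;

     // myColumns is the set of column families you need to make a decision
     rowIterator.seek(new Range(), myColumns, true);

     while(rowIterator.hasTop()){
          // make decision from rowIterator.getTopKey()/getTopValue()
          rowIterator.next();  // advance, otherwise the loop never ends
     }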
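
The same flow can be sketched end to end without Accumulo. Everything here is a stand-in: RowIterator is a simplified, hypothetical model of the row-confined iterator (its seek takes only column families, not a Range), Entry reduces a key/value pair to family and value, and the purge rule "expTs value at or before now means reject the row" is an assumption, since the thread does not spell out the criterion.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class PurgeSketch {
    // One key/value pair of a row, reduced to column family and value.
    static final class Entry {
        final String family;
        final String value;
        Entry(String family, String value) { this.family = family; this.value = value; }
    }

    // Hypothetical stand-in for the iterator passed to the accept method.
    static final class RowIterator {
        private final List<Entry> row;                  // every entry in the row
        private List<Entry> view = new ArrayList<Entry>(); // entries matching the last seek
        private int pos = 0;
        RowIterator(List<Entry> row) { this.row = row; }

        // Re-seek inside the row, keeping only the requested column families,
        // regardless of which columns the original scan asked for.
        void seek(Set<String> families) {
            view = new ArrayList<Entry>();
            for (Entry e : row)
                if (families.contains(e.family)) view.add(e);
            pos = 0;
        }
        boolean hasTop() { return pos < view.size(); }
        Entry top() { return view.get(pos); }
        void next() { pos++; }
    }

    // Column families needed to make the purge decision.
    static final Set<String> MY_COLUMNS = new HashSet<String>(Arrays.asList("expTs"));

    static boolean acceptRow(RowIterator rowIterator, boolean inCorrectScope, long now) {
        if (!inCorrectScope)
            return true;                    // partial view of the row: let everything pass
        rowIterator.seek(MY_COLUMNS);       // always look at expTs
        while (rowIterator.hasTop()) {
            long expTs = Long.parseLong(rowIterator.top().value);
            if (expTs <= now)
                return false;               // assumed purge rule: expired, drop the row
            rowIterator.next();
        }
        return true;                        // no expTs entry, or not yet expired
    }

    public static void main(String[] args) {
        List<Entry> row = Arrays.asList(new Entry("data", "x"), new Entry("expTs", "100"));
        System.out.println(acceptRow(new RowIterator(row), true, 200L));  // false
        System.out.println(acceptRow(new RowIterator(row), true, 50L));   // true
        System.out.println(acceptRow(new RowIterator(row), false, 200L)); // true
    }
}
```

Because the decision is made after the explicit seek, it no longer depends on whether the scan's own column list included expTs.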




>
> Thanks again as always for all the help.
>
> Best regards,
> Terry
>
>
>
>
> On Mon, Dec 9, 2013 at 5:45 PM, Keith Turner <keith@deenlo.com> wrote:
>
>>
>>
>>
>>  On Mon, Dec 9, 2013 at 4:18 PM, Terry P. <texpilot@gmail.com> wrote:
>>
>>> Thanks Billie and Christopher, sounds like I should have the purge
>>> iterator run after the VersioningIterator.
>>>
>>> Keith, uh oh, I was not aware that not all compactions will see the
>>> entire row.  That sounds like it could be bad for my case!  Here is the
>>> original thread that you helped me with as background:
>>>
>>
>> Sometimes Accumulo will compact a subset of the data in a tablet.  This
>> can happen during a minor compaction and when a major compaction is
>> operating on a subset of files.  A row's columns and updates can be spread
>> across multiple files.   In these cases you may only see a subset of the
>> columns in a row.  Also you may not see the latest version.   Scans and
>> full major compactions see all data.   You can tell the difference when an
>> iterator is initialized.  An IteratorEnvironment is passed into the init
>> method.   If the scope is majc and isFullMajorCompaction() is true then you
>> know you will see all data (also if the scope is scan).  For minor
>> compactions and partial major compactions you may want to just let
>> everything pass.
>>
>>
>>>
>>>
>>> http://mail-archives.apache.org/mod_mbox/accumulo-user/201311.mbox/%3CCAGUtCHryW3RR9PF5BAD+psxE-dswL9FyOGVv5Mn_Wj00o2mxig@mail.gmail.com%3E
>>>
>>> We only have 10-12 k/v pairs per row -- is that a factor? Can you
>>> explain the nuances with respect to when a compaction won't see the entire
>>> row?
>>>
>>> Thanks,
>>> Terry
>>>
>>>
>>>
>>> On Mon, Dec 9, 2013 at 1:34 PM, Keith Turner <keith@deenlo.com> wrote:
>>>
>>>>
>>>>
>>>>
>>>>  On Mon, Dec 9, 2013 at 12:02 PM, Terry P. <texpilot@gmail.com> wrote:
>>>>
>>>>> Greetings all,
>>>>> With Accumulo v1.4.2, we have a purge filter/iterator that extends
>>>>> RowFilter and I have a question about what priority it should be
>>>>> implemented with. I see the default VersioningIterator runs at priority
>>>>> 20.
>>>>>
>>>>> Our purge iterator is designed to suppress (scan time) or remove (majc
>>>>> or minc compactions) rows based on the value in a column. Is it more
>>>>> efficient to run our purge iterator at a higher priority than the
>>>>> VersioningIterator, or does it
>>>>>
>>>>
>>>> Are you aware that not all compactions will see the entire row?
>>>>
>>>>
>>>>>  really matter? Our VersioningIterator maxVersions is set to the
>>>>> default of 1 which is what we want/need.
>>>>>
>>>>> Thanks in advance,
>>>>> Terry
>>>>>
>>>>
>>>>
>>>
>>
>
