accumulo-user mailing list archives

From Keith Turner <ke...@deenlo.com>
Subject Re: What priority for purge filter
Date Mon, 09 Dec 2013 23:45:39 GMT
On Mon, Dec 9, 2013 at 4:18 PM, Terry P. <texpilot@gmail.com> wrote:

> Thanks Billie and Christopher, sounds like I should have the purge
> iterator run after the VersioningIterator.
>
> Keith, uh oh, I was not aware that not all compactions will see the entire
> row.  That sounds like it could be bad for my case!  Here is the original
> thread that you helped me with as background:
>

Sometimes Accumulo will compact only a subset of the data in a tablet. This
can happen during a minor compaction, or when a major compaction operates
on a subset of the tablet's files. A row's columns and updates may be spread
across multiple files, so in these cases you may see only a subset of the
columns in a row, and you may not see the latest version. Scans and full
major compactions see all data. You can tell the difference when an
iterator is initialized: an IteratorEnvironment is passed into the init
method. If the scope is majc and isFullMajorCompaction() is true, then you
know you will see all data (the same holds if the scope is scan). For minor
compactions and partial major compactions, you may want to just let
everything pass.
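
A minimal sketch of that check, written so it compiles without Accumulo on
the classpath. The IteratorScope enum and Env interface below are stand-ins
for Accumulo's IteratorUtil.IteratorScope and IteratorEnvironment, and
PurgeScopeCheck, seesAllData and env are hypothetical names used only for
illustration. The stand-in assumes that asking isFullMajorCompaction()
outside the majc scope is an error, so the scope is tested first.

```java
final class PurgeScopeCheck {

    // Stand-in for Accumulo's IteratorUtil.IteratorScope enum.
    enum IteratorScope { majc, minc, scan }

    // Stand-in for the two IteratorEnvironment methods discussed above.
    interface Env {
        IteratorScope getIteratorScope();
        boolean isFullMajorCompaction();
    }

    // True only when the iterator is guaranteed to see all of a row's data:
    // at scan time, or during a full major compaction.
    static boolean seesAllData(Env env) {
        IteratorScope scope = env.getIteratorScope();
        if (scope == IteratorScope.scan) {
            return true;
        }
        // Check the scope first; only ask about full compaction under majc.
        return scope == IteratorScope.majc && env.isFullMajorCompaction();
    }

    // Hypothetical helper that builds a fake environment for experimenting.
    static Env env(final IteratorScope scope, final boolean fullMajc) {
        return new Env() {
            public IteratorScope getIteratorScope() {
                return scope;
            }

            public boolean isFullMajorCompaction() {
                // Assumption: modeled as an error outside the majc scope.
                if (scope != IteratorScope.majc) {
                    throw new IllegalStateException("not a major compaction");
                }
                return fullMajc;
            }
        };
    }
}
```

In a real filter, the result could be computed once in init and stored in a
field; when it is false, the filter would accept every row unchanged.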


>
>
> http://mail-archives.apache.org/mod_mbox/accumulo-user/201311.mbox/%3CCAGUtCHryW3RR9PF5BAD+psxE-dswL9FyOGVv5Mn_Wj00o2mxig@mail.gmail.com%3E
>
> We only have 10-12 k/v pairs per row -- is that a factor? Can you explain
> the nuances with respect to when a compaction won't see the entire row?
>
> Thanks,
> Terry
>
>
>
> On Mon, Dec 9, 2013 at 1:34 PM, Keith Turner <keith@deenlo.com> wrote:
>
>>
>>
>>
>> On Mon, Dec 9, 2013 at 12:02 PM, Terry P. <texpilot@gmail.com> wrote:
>>
>>> Greetings all,
>>> With Accumulo v1.4.2, we have a purge filter/iterator that extends
>>> RowFilter and I have a question about what priority it should be
>>> implemented with. I see the default VersioningIterator runs at priority 20.
>>>
>>> Our purge iterator is designed to suppress (scan time) or remove (majc
>>> or minc compactions) rows based on the value in a column. Is it more
>>> efficient to run our purge iterator at a higher priority than the
>>> VersioningIterator, or does it
>>>
>>
>> Are you aware that not all compactions will see the entire row?
>>
>>
>>> really matter? Our VersioningIterator maxVersions is set to the default
>>> of 1 which is what we want/need.
>>>
>>> Thanks in advance,
>>> Terry
>>>
>>
>>
>
