accumulo-user mailing list archives

From "Terry P." <>
Subject Re: Knowing when an iterator is at the "last row/entry"
Date Wed, 08 Jan 2014 15:41:48 GMT
Hi Keith,
The goal of the iterator is to purge data that has expired (or suppress it
for scans). The goal of the log message is to bring to light any data
format issues, since otherwise the "bad data" would NOT be purged by the
iterator and would hang around forever. So yes, we would purge it with a
special job. The iterator fires at both Full Major Compaction and at Scan
time.

Good point on "How did the bad data get there?" -- it shouldn't, based on
how items are indexed and then inserted into Accumulo, but I wanted to
check for it in case the individual that installs the iterator in Accumulo
fat-fingers the date format, OR if someone changes it on the other side
(the app that sends the data to Accumulo). The first one could happen
easily, but the latter shouldn't happen. But as folks roll off programs and
others maintain the code, anything can happen.

Looks like ACCUMULO-1280 is exactly what I need! Maybe someday, but until
then what I have for the iterator will do the job (and thanks again for
your help on it!).
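
For anyone following the thread, here is a minimal sketch of the counting idea from the original question quoted below. It is not the actual iterator. ExpiryChecker, keep(), and the "log the first error, then every Nth" policy are hypothetical names and choices made for illustration; the sketch uses only the JDK, on the assumption that a RowFilter subclass would call something like keep() from its acceptRow() method. Since there is no end-of-scan hook to log from, it emits a periodic summary instead of one line per row.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

// Hypothetical helper, not part of Accumulo. A RowFilter subclass could
// delegate to keep() when deciding whether to accept a row.
class ExpiryChecker {
    // SimpleDateFormat is not thread-safe; this sketch assumes one
    // ExpiryChecker per iterator instance.
    private final SimpleDateFormat format;
    private final long logEvery;   // emit one summary line per this many errors
    private long parseErrors = 0;  // running count for this checker's lifetime

    ExpiryChecker(String pattern, long logEvery) {
        this.format = new SimpleDateFormat(pattern);
        this.format.setLenient(false);
        this.format.setTimeZone(TimeZone.getTimeZone("UTC"));
        this.logEvery = logEvery;
    }

    // Returns true if the entry should be kept: either it has not expired
    // yet, or its date could not be parsed (so bad data is never purged
    // silently, only counted and reported).
    boolean keep(String expiryValue, long nowMillis) {
        try {
            Date expiry = format.parse(expiryValue);
            return expiry.getTime() > nowMillis;
        } catch (ParseException e) {
            parseErrors++;
            // Log the first error and then every logEvery-th one, rather
            // than one message per row.
            if (parseErrors == 1 || parseErrors % logEvery == 0) {
                System.err.println("Date format mismatch; parse errors so far: "
                        + parseErrors + " (latest value: " + expiryValue + ")");
            }
            return true;
        }
    }

    long getParseErrors() {
        return parseErrors;
    }
}
```

With a pattern of "yyyy-MM-dd", a value such as "01/08/2014" would be counted as a parse error and kept, while a well-formed date in the past would be dropped. The final count is still not logged at the true end of a scan or compaction; that is the gap ACCUMULO-1280 is about.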

Best regards,

On Wed, Jan 8, 2014 at 9:30 AM, Keith Turner <> wrote:

> What is your goal?  It seems like you want to produce counts about bad
> data suppressed at scan time.  What will you do with these counts?  Will
> you ever purge the bad data?  How did the bad data get there?  If you are
> not bulk importing the data, then maybe you could add constraints to the
> table?
> On Mon, Jan 6, 2014 at 7:30 PM, Terry P. <> wrote:
>> Greetings folks,
>> I have an iterator that extends RowFilter and I have a case where I need
>> to know when its defined date format doesn't match the format of the data
>> being scanned by the iterator.  I don't want to flood the tserver log with
>> an error per row (how horrid that would be), but instead keep a counter of
>> the number of times that error occurs during a scan or major compaction.
>> Trouble is, I don't see any way to know when an iterator is on the "last
>> row" or "last entry" in its scan on a tabletserver, as if I could test for
>> that, I could then dump my single log message with the count of date format
>> parse errors for that scan/compaction.
>> Anyone know a way to determine if an iterator is at the "last entry" or
>> "last row" of its execution?
> I do not think there is a good way to do this.  ACCUMULO-1280
>> Many thanks in advance.