accumulo-user mailing list archives

From "Terry P." <texpi...@gmail.com>
Subject Re: Knowing when an iterator is at the "last row/entry"
Date Wed, 08 Jan 2014 04:38:06 GMT
Thanks, Christopher. I overrode the hasTop method as you suggested, but in
our case the acceptRow method seeks to a specific ColumnFamily that holds
the criteria for deciding whether the row should be let through or
suppressed. With my simple unit tests I'm not seeing any output in the log.
If I log on every occurrence instead, the messages do appear, so I know my
logging works.

Here's my acceptRow method and hasTop override method:

    @Override
    public boolean acceptRow(SortedKeyValueIterator<Key, Value> rowIterator)
            throws IOException {

        // If not in scan or full major compaction scope, short circuit and
        // return true
        if (!inScope) return true;

        // Seek to expTs ColumnFamily; if not done, the scans that do not
        // include the expTs column will return regardless of whether the row
        // should be suppressed
        rowIterator.seek(new Range(), myColumns, true);

        while (rowIterator.hasTop()) {
            int cmp = rowIterator.getTopKey().getColumnFamilyData()
                    .compareTo(bsExpTsColFam);
            if (cmp == 0) {
                try {
                    expTsDate = df.parse(rowIterator.getTopValue().toString());
                    if (currentTimeMillis - expTsDate.getTime() > thresholdMillis)
                        return false;
                } catch (ParseException e) {
                    // Increment date format parse error counter
                    dfParseErrorCount++;
                }
            } else {
                // Went past desired column family so skip to next row
                break;
            }
            rowIterator.next();
        }
        return true;
    }

    @Override
    public boolean hasTop() {
      if (!super.hasTop()) {
        if (dfParseErrorCount > 0)
          log.debug(iteratorScopeString + " operation encountered " +
            dfParseErrorCount + " date format parse errors.");
      }
      return super.hasTop();
    }
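
A side note on the override above: hasTop() may be called more than once
after the source is exhausted, in which case the message would repeat. Below
is a minimal sketch of a report-once guard in plain Java, with no Accumulo
types. ParseErrorTally and its methods are hypothetical names, and the list
stands in for log.debug(). It does not address why no message shows up in the
unit tests.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: counts parse errors and reports them at most once. */
class ParseErrorTally {
    private int errorCount = 0;
    private boolean reported = false;
    // Stand-in for the real logger; an assumption made for this sketch only.
    private final List<String> messages = new ArrayList<>();

    /** Call where the ParseException is caught. */
    void recordError() {
        errorCount++;
    }

    /**
     * Call from hasTop() with the result of super.hasTop().
     * Emits one summary message the first time the source reports no top.
     */
    boolean onHasTop(boolean sourceHasTop) {
        if (!sourceHasTop && errorCount > 0 && !reported) {
            messages.add("encountered " + errorCount + " date format parse errors");
            reported = true;
        }
        return sourceHasTop;
    }

    List<String> messages() {
        return messages;
    }
}
```

In the iterator, hasTop() would then reduce to
`return tally.onHasTop(super.hasTop());`, assuming a `tally` field holding one
ParseErrorTally per iterator instance.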

Am I in a Catch-22 here?

Thanks,
Terry



On Mon, Jan 6, 2014 at 6:40 PM, Christopher <ctubbsii@apache.org> wrote:

> You can override hasTop() to log the message when getSource().hasTop()
> is false. Something like:
>
> @Override
> public boolean hasTop() {
>   if (!super.hasTop())
>     log.debug("my message");
>   return super.hasTop();
> }
>
> However, this won't guarantee that you will catch all occurrences.
> Some scan sessions could expire and the iterator stack be torn down
> and re-created before the iterator exhausts its source iterator. A
> client could resume in the middle of a tablet, with a new instance of
> your iterator, and the counter would be smaller, because the count
> from the previous instance of the iterator will have been lost.
>
> --
> Christopher L Tubbs II
> http://gravatar.com/ctubbsii
>
>
> On Mon, Jan 6, 2014 at 7:30 PM, Terry P. <texpilot@gmail.com> wrote:
> > Greetings folks,
> > I have an iterator that extends RowFilter and I have a case where I need to
> > know when its defined date format doesn't match the format of the data being
> > scanned by the iterator.  I don't want to flood the tserver log with an
> > error per row (how horrid that would be), but instead keep a counter of the
> > number of times that error occurs during a scan or major compaction.
> >
> > Trouble is, I don't see any way to know when an iterator is on the "last
> > row" or "last entry" in its scan on a tabletserver, as if I could test for
> > that, I could then dump my single log message with the count of date format
> > parse errors for that scan/compaction.
> >
> > Anyone know a way to determine if an iterator is at the "last entry" or
> > "last row" of its execution?
> >
> > Many thanks in advance.
>
