accumulo-user mailing list archives

From William Slacum <>
Subject Re: How to remove entire row at the server side?
Date Wed, 06 Nov 2013 02:48:27 GMT
If an iterator is only set at scan time, then its logic will only be
applied when a client scans the table. The data will persist through major
and minor compaction and be visible if you scanned the RFile(s) backing the
table. "Suppress" is the better word in this case. Would you please open a
ticket pointing us where to update the documentation?
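
To make the scan-time versus compaction-time distinction concrete, here is a simplified toy model in plain Java. It is not the Accumulo API: ScopeDemo, scan() and compact() are hypothetical names, and the stored list stands in for the RFiles backing the table. A filter configured only for scans hides entries from readers. A filter that also runs during compactions leaves rejected entries out of the rewritten files, and that is what actually removes them.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Toy model, not the Accumulo API: "stored" stands in for the RFiles on disk.
final class ScopeDemo {
    private List<String> stored;
    private final Predicate<String> accept; // true = keep the entry
    private final boolean atScan;           // filter configured for the scan scope
    private final boolean atCompaction;     // filter configured for minc/majc

    ScopeDemo(List<String> initial, Predicate<String> accept,
              boolean atScan, boolean atCompaction) {
        this.stored = new ArrayList<>(initial);
        this.accept = accept;
        this.atScan = atScan;
        this.atCompaction = atCompaction;
    }

    // A client scan: rejected entries are hidden from the reader,
    // but nothing on "disk" changes.
    List<String> scan() {
        List<String> out = new ArrayList<>();
        for (String e : stored) {
            if (!atScan || accept.test(e)) {
                out.add(e);
            }
        }
        return out;
    }

    // A compaction rewrites the stored entries. If the filter runs in this
    // scope, rejected entries are simply not written out, so they are gone.
    void compact() {
        List<String> rewritten = new ArrayList<>();
        for (String e : stored) {
            if (!atCompaction || accept.test(e)) {
                rewritten.add(e);
            }
        }
        stored = rewritten;
    }

    List<String> storedEntries() {
        return new ArrayList<>(stored);
    }

    public static void main(String[] args) {
        Predicate<String> notExpired = e -> !e.startsWith("old");
        List<String> data = List.of("old1", "new1", "old2");

        ScopeDemo scanOnly = new ScopeDemo(data, notExpired, true, false);
        scanOnly.compact();
        System.out.println(scanOnly.scan());          // [new1]
        System.out.println(scanOnly.storedEntries()); // [old1, new1, old2]

        ScopeDemo allScopes = new ScopeDemo(data, notExpired, true, true);
        allScopes.compact();
        System.out.println(allScopes.storedEntries()); // [new1]
    }
}
```

In real Accumulo, the all-scopes case corresponds to attaching the iterator for the scan, minc and majc scopes. Even then, rejected entries disappear from disk only as compactions rewrite the files that hold them.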

It looks like you'd want to implement a RowFilter for your use case. It
provides the hooks needed to avoid reading a whole row into memory, and it
handles the logic of deciding whether to write keys that sort before the
column you're filtering on (at the cost of reading those keys twice).

On Tue, Nov 5, 2013 at 6:20 PM, Terry P. <> wrote:

> Greetings everyone,
> I'm looking at the AgeOffFilter as a base from which to write a
> server-side filter / iterator to purge rows when they have aged off based
> on the value of a specific column in the row (expiry datetime <= now). So
> this differs from the AgeOffFilter in that the criterion for removal is
> from the same column in every row (not the Accumulo timestamp for an
> individual entry), and we need to remove the entire row not just individual
> entries. For example:
> Format: Key:CF:CQ:Value
> abc:data:title:"My fantastic data"
> abc:data:content:<bytedata>
> abc:data:creTs:2013-08-04T17:14:12Z
> abc:data:*expTs*:2013-11-04T17:14:12Z
> ... 6-8 more columns of data per row ...
> where *expTs* is the column to determine if the entire row should be
> removed based on whether its value is <= NOW.
> This task seemed easy enough as a client program (and it is really), but a
> server-side iterator would be far more efficient than sending millions of
> rowkeys across the network just to delete them (we'll be deleting more than
> a million every hour).  But I'm struggling to get there.
> In looking at, is the "magic" in the AgeOffFilter class
> that removes (deletes) an entry from a table the fact that the accept
> method returns false, combined with the fact that the iterator would be set
> to run at -majc or -minc time and it is the compaction code that actually
> deletes the entry?  If set to run only at scan time, would AgeOffFilter
> simply not return the rows during the scan, but not delete them?  The
> wording in the iterator classes varies, some saying "remove" and others
> "suppress", so it's not clear to me.
> If that's the case, then I think I know where to implement the logic. The
> question is, how can I remove all the entries for the row once the accept
> method has determined it meets the criteria?
> Or as Mike Drob mentioned in a prior post, will basing my class on the
> RowFilter class instead of just Filter make things easier?  Or the
> WholeRowIterator?  Just trying to find the simplest solution.
> Sorry for what may be obvious questions, but I'm more of a DB Architect
> who does some coding, and not a Java programmer by trade. With all of the
> amazing things Accumulo does, honestly I was surprised when I couldn't find
> a way to delete rows in the shell by criteria other than the rowkey!  I'm
> more used to having a shell to 'delete from table where column <=
> value'.
> But looking at it now, everyone's criteria for deletion will likely be
> different given the flexibility of a key=>value store.  If our rowkey had
> the date/timestamp as a prefix, I know an easy deletemany command in the
> shell would do the trick -- but the nature of the data is such that
> initially no expiration timestamp is set, and there is no means to update
> the key from the client app when the expiration timestamp finally gets set (too
> much rework on that common tool I'm afraid).
> Thanks in advance.
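
The deletemany alternative works because of key ordering. If every row key began with a fixed-width ISO-8601 UTC timestamp (same format and precision throughout), lexicographic order would match time order, so all expired rows would form one contiguous range at the start of the table. The toy TreeMap model below shows this. PrefixedKeys, expired() and the "_" separator are assumptions for illustration, and clearing the range stands in for a range-bounded delete.

```java
import java.util.NavigableMap;
import java.util.TreeMap;

// Toy model: row keys prefixed with an ISO-8601 UTC expiry timestamp.
final class PrefixedKeys {
    // All rows whose timestamp prefix is <= now. '\uffff' sorts after the
    // '_' separator, so rows stamped exactly "now" are included too.
    static NavigableMap<String, String> expired(NavigableMap<String, String> table, String now) {
        return table.headMap(now + "\uffff", false);
    }

    public static void main(String[] args) {
        NavigableMap<String, String> table = new TreeMap<>();
        table.put("2013-11-04T17:14:12Z_abc", "My fantastic data");
        table.put("2013-11-05T18:20:00Z_abe", "expires exactly now");
        table.put("2013-12-01T00:00:00Z_abd", "Still current");

        String now = "2013-11-05T18:20:00Z";
        System.out.println(expired(table, now).keySet());
        // [2013-11-04T17:14:12Z_abc, 2013-11-05T18:20:00Z_abe]

        // Clearing the headMap view removes the whole range from the table.
        expired(table, now).clear();
        System.out.println(table.keySet());
        // [2013-12-01T00:00:00Z_abd]
    }
}
```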
