accumulo-user mailing list archives

From "Terry P." <texpi...@gmail.com>
Subject How to remove entire row at the server side?
Date Tue, 05 Nov 2013 23:20:24 GMT
Greetings everyone,
I'm looking at the AgeOffFilter as a base from which to write a server-side
filter / iterator to purge rows when they have aged off based on the value
of a specific column in the row (expiry datetime <= now). So this differs
from the AgeOffFilter in that the criterion for removal is from the same
column in every row (not the Accumulo timestamp for an individual entry),
and we need to remove the entire row not just individual entries. For
example:

Format: Key:CF:CQ:Value
abc:data:title:"My fantastic data"
abc:data:content:<bytedata>
abc:data:creTs:2013-08-04T17:14:12Z
abc:data:*expTs*:2013-11-04T17:14:12Z
... 6-8 more columns of data per row ...

where *expTs* is the column to determine if the entire row should be
removed based on whether its value is <= NOW.
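The expiry test described above can be sketched in plain Java. This is only an illustration, assuming the expTs value is always an ISO-8601 UTC string as in the example; the class and method names (ExpiryCheck, isExpired) are hypothetical and nothing here touches the Accumulo API.

```java
import java.time.Instant;

class ExpiryCheck {
    // Returns true when the expTs value is <= now.
    // Assumes expTs is ISO-8601 UTC, e.g. "2013-11-04T17:14:12Z".
    static boolean isExpired(String expTs, Instant now) {
        Instant expiry = Instant.parse(expTs);
        return !expiry.isAfter(now);
    }

    public static void main(String[] args) {
        // A fixed "now" keeps the example repeatable.
        Instant now = Instant.parse("2013-11-05T23:20:24Z");
        System.out.println(isExpired("2013-11-04T17:14:12Z", now));
        System.out.println(isExpired("2013-12-01T00:00:00Z", now));
    }
}
```

Passing "now" in as a parameter, rather than reading the clock inside the method, makes the check easy to test and keeps one consistent cutoff for a whole pass over the data.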

This task seemed easy enough as a client program (and it really is), but a
server-side iterator would be far more efficient than sending millions of
rowkeys across the network just to delete them (we'll be deleting more than
a million every hour). But I'm struggling to get there.

In looking at AgeOffFilter.java, is the "magic" in the AgeOffFilter class
that removes (deletes) an entry from a table the fact that the accept
method returns false, combined with the fact that the iterator would be set
to run at -majc or -minc time and it is the compaction code that actually
deletes the entry?  If set to run only at scan time, would AgeOffFilter
simply not return the rows during the scan, but not delete them? The
wording in the iterator classes varies, some saying "remove" and others
"suppress", so it's not clear to me.

If that's the case, then I think I know where to implement the logic. The
question is, how can I remove all the entries for the row once the accept
method has determined it meets the criteria?
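The scan-time versus compaction-time distinction asked about above can be modelled with a toy example. This is a sketch under stated assumptions, not Accumulo code: a TreeMap stands in for the stored data, and the names FilterScopeSketch, scan and compact are hypothetical. The point it illustrates is that the same accept decision only hides entries when the filtered output is handed to a reader, and effectively removes them when the filtered output replaces what is stored. In Accumulo itself, data in files that a given compaction does not rewrite stays in place until those files are compacted as well.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Predicate;

class FilterScopeSketch {
    // Scan scope: rejected entries are left out of the returned view,
    // but the stored map itself is not modified ("suppress").
    static TreeMap<String, String> scan(TreeMap<String, String> stored,
                                        Predicate<Map.Entry<String, String>> accept) {
        TreeMap<String, String> view = new TreeMap<>();
        for (Map.Entry<String, String> e : stored.entrySet()) {
            if (accept.test(e)) {
                view.put(e.getKey(), e.getValue());
            }
        }
        return view;
    }

    // Compaction scope: the same filtering pass, but the caller keeps the
    // result as the new stored data, so rejected entries are gone ("remove").
    static TreeMap<String, String> compact(TreeMap<String, String> stored,
                                           Predicate<Map.Entry<String, String>> accept) {
        return scan(stored, accept);
    }

    public static void main(String[] args) {
        TreeMap<String, String> stored = new TreeMap<>();
        stored.put("abc:data:title", "old");
        stored.put("def:data:title", "new");
        Predicate<Map.Entry<String, String>> accept = e -> !e.getValue().equals("old");

        TreeMap<String, String> view = scan(stored, accept);
        System.out.println(view.size() + " visible, " + stored.size() + " stored");

        stored = compact(stored, accept);
        System.out.println(stored.size() + " stored after compaction");
    }
}
```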

Or as Mike Drob mentioned in a prior post, will basing my class on the
RowFilter class instead of just Filter make things easier?  Or the
WholeRowIterator?  Just trying to find the simplest solution.
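A row-level decision of the kind a RowFilter-style approach implies can be sketched without any Accumulo dependency. The following is a minimal model only: entries are held in a plain list sorted by row, the Entry class and the filter method are hypothetical, and acceptRow is named after the hook that RowFilter exposes but does not share its signature. It assumes the expTs value is an ISO-8601 UTC string and that rows with no expTs yet are kept.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

class RowExpirySketch {
    // Hypothetical stand-in for one key/value entry (row, CF, CQ, value).
    static final class Entry {
        final String row, cf, cq, value;
        Entry(String row, String cf, String cq, String value) {
            this.row = row; this.cf = cf; this.cq = cq; this.value = value;
        }
    }

    // Decide for a whole row at once: reject it when its expTs is <= now.
    // Rows that have no expTs column are accepted.
    static boolean acceptRow(List<Entry> rowEntries, Instant now) {
        for (Entry e : rowEntries) {
            if (e.cq.equals("expTs")) {
                return Instant.parse(e.value).isAfter(now);
            }
        }
        return true;
    }

    // Walk entries that are already sorted by row, group each run of equal
    // row IDs, and keep or drop the whole group based on acceptRow.
    static List<Entry> filter(List<Entry> sorted, Instant now) {
        List<Entry> out = new ArrayList<>();
        int i = 0;
        while (i < sorted.size()) {
            int j = i;
            while (j < sorted.size() && sorted.get(j).row.equals(sorted.get(i).row)) {
                j++;
            }
            List<Entry> row = sorted.subList(i, j);
            if (acceptRow(row, now)) {
                out.addAll(row);
            }
            i = j;
        }
        return out;
    }
}
```

Calling filter with a list holding row abc from the example and a cutoff later than its expTs returns a list with every abc entry dropped, not just the expTs entry. The row has to be examined as a unit because expTs sorts after columns such as content and creTs, so a per-entry accept method would already have passed those entries before it reached the expiry value.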

Sorry for what may be obvious questions, but I'm more of a DB Architect who
does some coding, and not a Java programmer by trade. With all of the
amazing things Accumulo does, honestly I was surprised when I couldn't find
a way to delete rows in the shell by criteria other than the rowkey! I'm
more used to having a shell to 'delete from table where column <= value'.

But looking at it now, everyone's criteria for deletion will likely be
different given the flexibility of a key=>value store.  If our rowkey had
the date/timestamp as a prefix, I know an easy deletemany command in the
shell would do the trick -- but the nature of the data is such that
initially no expiration timestamp is set, and there is no means to update
the key from the client app when the expiration timestamp finally gets set
(too much rework on that common tool I'm afraid).

Thanks in advance.
