accumulo-user mailing list archives

From "Terry P." <>
Subject Re: Deleting many rows that match a given criterion
Date Thu, 31 Oct 2013 21:11:10 GMT
Hi Mike,
Did you wind up writing Java code to do this?  Did you go with a RowFilter?

I have a similar situation where I need to delete millions of rows daily,
and the criterion for deletion is not in the row key.

Thanks in advance,

On Wed, Oct 23, 2013 at 4:21 PM, Mike Drob <> wrote:

> Thanks for the feedback, Aru and Keith.
> I've had some more time to play around with this, and here are some
> additional observations.
> My existing process is very slow. I think this is because each deletemany
> command starts up a new scanner and BatchWriter, which creates a lot of
> RPC overhead. I didn't initially think that it would be a significant
> amount of data, but maybe I just had the wrong idea of what "significant"
> means in this case.
> I'm not sure the RowDeletingIterator would work in this case because I do
> use empty rows for other purposes. The RowFilter at compaction is a great
> option, except I had hoped to avoid writing actual Java code. Looking back
> at this, I might have to bite the bullet.
> Again, thanks both for the suggestions!
> Mike
> On Tue, Oct 22, 2013 at 12:04 PM, Keith Turner <> wrote:
>> If it's a significant amount of data, you could create a class that
>> extends RowFilter and set it as a compaction iterator.
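Keith's suggestion could be driven from the shell in the same generate-then-`accumulo shell -f` style as the original pipeline. This is a minimal sketch, assuming a `RowFilter` subclass has already been written and its jar placed on the tablet servers' classpath. The class name `com.example.MatchingRowFilter`, the iterator name `dropmatches`, and the helper `make_compaction_script` are hypothetical. `RowFilter.acceptRow` returns `true` for rows to keep, so such a subclass would return `false` for any row whose value in the chosen column contains the word.

```shell
#!/bin/sh
# Print Accumulo shell commands that attach a row-filtering iterator to the
# major-compaction (majc) scope, force a full compaction, then detach it.
# $1: table name; $2: fully qualified RowFilter subclass (hypothetical).
make_compaction_script() {
  table="$1"
  filter_class="$2"
  # table.iterator.majc.<name>=<priority>,<class> registers a majc iterator;
  # priority 30 places it after the default versioning iterator (priority 20).
  echo "config -t $table -s table.iterator.majc.dropmatches=30,$filter_class"
  # A full compaction rewrites every file, so filtered rows are dropped for good;
  # -w makes the shell wait until the compaction finishes.
  echo "compact -t $table -w"
  # Detach the iterator so later compactions stop filtering.
  echo "config -t $table -d table.iterator.majc.dropmatches"
  echo "exit"
}
```

For example, `make_compaction_script tab com.example.MatchingRowFilter > compact.out` followed by `accumulo shell -f compact.out`. Detaching the iterator afterwards suits a one-off cleanup; for a recurring purge it could instead be left attached so that every major compaction keeps dropping matching rows.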
>> On Tue, Oct 22, 2013 at 11:45 AM, Mike Drob <> wrote:
>>> I'm attempting to delete all rows from a table that contain a specific
>>> word in the value of a specified column. My current process looks like:
>>> accumulo shell -e 'egrep .*EXPRESSION.* -np -t tab -c col' |
>>>   awk 'BEGIN {print "table tab"}; {print "deletemany -f -np -r " $1}; END {print "exit"}' \
>>>   > rows.out
>>> accumulo shell -f rows.out
>>> I tried playing around with scan iterators and various options on
>>> deletemany and deleterows but wasn't able to find a more straightforward
>>> way to do this. Does anybody have any suggestions?
>>> Mike
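If the per-row shell approach is kept, one cheap improvement is to emit each row only once: when several entries under `col` match within the same row, the original awk program writes a separate `deletemany` for each of them. This is a minimal sketch, assuming the egrep output has the usual `row cf:cq [vis]    value` layout and that row IDs contain no whitespace (awk splits fields on it); `make_delete_script` is a hypothetical helper name. It does not remove the per-command scanner and BatchWriter overhead described above, so for millions of rows a compaction-time RowFilter is still likely the better route.

```shell
#!/bin/sh
# Read Accumulo shell egrep output ("row cf:cq [vis]    value", one entry per
# line) on stdin and print a shell script with one deletemany per distinct row.
# $1: table name. Assumes row IDs contain no whitespace.
make_delete_script() {
  table="$1"
  awk -v table="$table" '
    BEGIN { print "table " table }
    # NF skips blank lines; !seen[$1]++ is true only the first time a row appears.
    NF && !seen[$1]++ { print "deletemany -f -np -r " $1 }
    END { print "exit" }
  '
}
```

Usage mirrors the original pipeline: `accumulo shell -e 'egrep .*EXPRESSION.* -np -t tab -c col' | make_delete_script tab > rows.out`, then `accumulo shell -f rows.out`.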
