accumulo-user mailing list archives

From Mike Drob <>
Subject Re: Deleting many rows that match a given criterion
Date Wed, 23 Oct 2013 21:21:52 GMT
Thanks for the feedback, Aru and Keith.

I've had some more time to play around with this, and here are some
additional observations.

My existing process is very slow. I think this is because each deletemany
command starts up a new Scanner and BatchWriter, which creates a lot of
RPC overhead. I didn't initially think that it would be a significant
amount of data, but maybe I just had the wrong idea of what "significant"
is in this case.

I'm not sure the RowDeletingIterator would work in this case because I do
use empty rows for other purposes. The RowFilter at compaction is a great
option, except I had hoped to avoid writing actual Java code. Looking back
at this, I might have to bite that bullet.
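
For reference, a RowFilter along the lines Keith suggested might look
roughly like the sketch below. It is only an illustration under some
assumptions: the class name, the option names ("family", "word"), and the
containsWord helper are made up for this example, it matches on column
family only, and it treats values as UTF-8 text. It has to be compiled
against the Accumulo and Hadoop jars and deployed to the tablet servers.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Map;

import org.apache.accumulo.core.data.Key;
import org.apache.accumulo.core.data.Value;
import org.apache.accumulo.core.iterators.IteratorEnvironment;
import org.apache.accumulo.core.iterators.SortedKeyValueIterator;
import org.apache.accumulo.core.iterators.user.RowFilter;
import org.apache.hadoop.io.Text;

/**
 * Hypothetical sketch: rejects every row in which some entry of the
 * configured column family has a value containing the configured word.
 */
public class DropRowsByValueFilter extends RowFilter {
  // Option names are assumptions made for this sketch, not Accumulo conventions.
  public static final String FAMILY_OPT = "family";
  public static final String WORD_OPT = "word";

  private Text family;
  private String word;

  @Override
  public void init(SortedKeyValueIterator<Key,Value> source, Map<String,String> options,
      IteratorEnvironment env) throws IOException {
    super.init(source, options, env);
    family = new Text(options.get(FAMILY_OPT));
    word = options.get(WORD_OPT);
  }

  @Override
  public SortedKeyValueIterator<Key,Value> deepCopy(IteratorEnvironment env) {
    // RowFilter creates the new instance; carry our configuration over to it.
    DropRowsByValueFilter copy = (DropRowsByValueFilter) super.deepCopy(env);
    copy.family = family;
    copy.word = word;
    return copy;
  }

  @Override
  public boolean acceptRow(SortedKeyValueIterator<Key,Value> rowIterator) throws IOException {
    // rowIterator is limited to a single row; returning false drops the whole row.
    while (rowIterator.hasTop()) {
      Key key = rowIterator.getTopKey();
      if (key.getColumnFamily().equals(family)
          && containsWord(rowIterator.getTopValue().get(), word)) {
        return false;
      }
      rowIterator.next();
    }
    return true;
  }

  /** Plain substring match on the value, decoded as UTF-8 (an assumption). */
  public static boolean containsWord(byte[] value, String word) {
    return new String(value, StandardCharsets.UTF_8).contains(word);
  }
}
```

Configured as a major-compaction iterator and followed by a full
compaction of the table, a filter like this would drop the matching rows
during the compaction itself, with no per-row scanner or writer. It would
need to be removed from the table configuration afterwards, otherwise
later compactions keep dropping any new rows that match.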

Again, thanks both for the suggestions!


On Tue, Oct 22, 2013 at 12:04 PM, Keith Turner <> wrote:

> If it's a significant amount of data, you could create a class that extends
> RowFilter and set it as a compaction iterator.
> On Tue, Oct 22, 2013 at 11:45 AM, Mike Drob <> wrote:
>> I'm attempting to delete all rows from a table that contain a specific
>> word in the value of a specified column. My current process looks like:
>>
>>   accumulo shell -e 'egrep .*EXPRESSION.* -np -t tab -c col' \
>>     | awk 'BEGIN {print "table tab"}; {print "deletemany -f -np -r" $1}; END {print "exit"}' \
>>     > rows.out
>>   accumulo shell -f rows.out
>>
>> I tried playing around with scan iterators and various options on
>> deletemany and deleterows but wasn't able to find a more straightforward
>> way to do this. Does anybody have any suggestions?
>> Mike
