accumulo-user mailing list archives

From "Terry P." <texpi...@gmail.com>
Subject Re: Deleting many rows that match a given criterion
Date Thu, 31 Oct 2013 21:11:10 GMT
Hi Mike,
Did you wind up writing Java code to do this?  Did you go with a RowFilter?

I have a similar circumstance where I need to delete millions of rows daily
and the criterion for deletion is not in the row key.

Thanks in advance,
Terry



On Wed, Oct 23, 2013 at 4:21 PM, Mike Drob <mdrob@mdrob.com> wrote:

> Thanks for the feedback, Aru and Keith.
>
> I've had some more time to play around with this, and here are some
> additional observations.
>
> My existing process is very slow. I think this is due to each deletemany
> command starting up a new scanner and batchwriter, and creating a lot of
> rpc overhead. I didn't initially think that it would be a significant
> amount of data, but maybe I just had the wrong idea of what "significant"
> is in this case.
>
> I'm not sure the RowDeletingIterator would work in this case because I do
> use empty rows for other purposes. The RowFilter at compaction is a great
> option, except I had hoped to avoid writing actual Java code. Looking back
> at this, I might have to bite that bullet.
>
> Again, thanks both for the suggestions!
>
> Mike
>
>
> On Tue, Oct 22, 2013 at 12:04 PM, Keith Turner <keith@deenlo.com> wrote:
>
>> If it's a significant amount of data, you could create a class that
>> extends RowFilter and set it as a compaction iterator.
>>
>>
>> On Tue, Oct 22, 2013 at 11:45 AM, Mike Drob <mdrob@mdrob.com> wrote:
>>
>>> I'm attempting to delete all rows from a table that contain a specific
>>> word in the value of a specified column. My current process looks like:
>>>
>>> accumulo shell -e 'egrep .*EXPRESSION.* -np -t tab -c col' | awk 'BEGIN
>>> {print "table tab"}; {print "deletemany -f -np -r" $1}; END {print "exit"}'
>>> > rows.out
>>> accumulo shell -f rows.out
>>>
>>> I tried playing around with scan iterators and various options on
>>> deletemany and deleterows but wasn't able to find a more straightforward
>>> way to do this. Does anybody have any suggestions?
>>>
>>> Mike
>>>
>>
>>
>
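
For readers following the RowFilter suggestion above, here is a rough sketch of the per-row decision such a filter would make. It is not Accumulo code: plain Java maps stand in for Accumulo's Key/Value types, and the class name RowFilterSketch, the method survivingRows, and the sample data are all hypothetical. In a real deployment the same check would presumably live in the acceptRow method of a class extending org.apache.accumulo.core.iterators.user.RowFilter, configured as a compaction iterator on the table.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Stdlib-only model of the keep/drop decision a compaction-time row filter makes.
 * Plain maps replace Accumulo's Key/Value types (an assumption for illustration).
 */
class RowFilterSketch {

    /** Returns true if the row should be KEPT, false if it should be dropped. */
    static boolean acceptRow(Map<String, String> columnsToValues, String column, String word) {
        String value = columnsToValues.get(column);
        // Keep rows that lack the column, or whose value does not contain the word.
        return value == null || !value.contains(word);
    }

    /** Applies acceptRow to every row and returns the IDs of the rows that survive. */
    static List<String> survivingRows(Map<String, Map<String, String>> table,
                                      String column, String word) {
        List<String> kept = new ArrayList<>();
        for (Map.Entry<String, Map<String, String>> row : table.entrySet()) {
            if (acceptRow(row.getValue(), column, word)) {
                kept.add(row.getKey());
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Hypothetical sample data: row ID -> (column -> value), sorted by row ID.
        Map<String, Map<String, String>> table = new TreeMap<>();
        table.put("row1", Map.of("col", "contains EXPRESSION here"));
        table.put("row2", Map.of("col", "harmless"));
        table.put("row3", Map.of("other", "EXPRESSION in another column"));
        System.out.println(survivingRows(table, "col", "EXPRESSION")); // [row2, row3]
    }
}
```

Because the decision is made once per row while data is being rewritten anyway, this approach avoids opening a new scanner and batch writer for every matching row, which is the overhead described earlier in the thread.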
