accumulo-user mailing list archives

From Keith Turner <ke...@deenlo.com>
Subject Re: Deleting rows from the Java API
Date Wed, 09 May 2012 19:39:01 GMT
On Wed, May 9, 2012 at 2:43 PM, Adam Fuchs <adam.p.fuchs@ugov.gov> wrote:
> I would also add that "small number of entries" in this case is probably
> measured in the millions or tens of millions. If you're talking about
> deleting more entries than that then you might start to look into the
> iterator method.

Just to clarify, a filter is a type of iterator.

>
> Cheers,
> Adam
>
>
> On Wed, May 9, 2012 at 11:01 AM, Billie J Rinaldi
> <billie.j.rinaldi@ugov.gov> wrote:
>>
>> On Wednesday, May 9, 2012 10:31:46 AM, "Sean Pines" <spines83@gmail.com>
>> wrote:
>> > I have a use case that involves me removing a record from Accumulo
>> > based on the Row ID and the Column Family.
>> >
>> > In the shell, I noticed the command "deletemany" which allows you to
>> > specify column family/column qualifier. Is there an equivalent of this
>> > in the Java API?
>> >
>> > In the Java API, I noticed the method:
>> > deleteRows(String tableName, org.apache.hadoop.io.Text start,
>> > org.apache.hadoop.io.Text end)
>> > Delete rows between (start, end]
>> >
>> > However, that only seems to work for deleting a range of Row IDs.
>> >
>> > I would also imagine that deleting rows is costly; is there a better
>> > way to approach something like this?
>> > The workaround I have for now is to overwrite the entry with an empty
>> > string in the value field and ignore any entries with an empty value.
>> > However, this leaves a lingering entry for each "delete", and I'd like
>> > to avoid that if at all possible.
>> >
>> > Thanks!
>>
>> Connector provides a createBatchDeleter method. You can set the range and
>> columns for a BatchDeleter just like you would with a Scanner. This is not
>> an efficient operation (despite the current javadocs for BatchDeleter), but
>> it works well if you're deleting a small number of entries. It scans for the
>> affected key/value pairs, pulls them back to the client, then inserts a
>> deletion entry for each. The deleteRows method, on the other hand, is
>> efficient because large ranges can just be dropped. If you want to delete a
>> lot of things and deleteRows won't work for you, consider configuring a majc
>> (major compaction) scope Filter that drops what you don't want, compacting
>> the table, and then removing the filter.
>>
>> Billie
>
>
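
To make the difference between the two delete styles discussed in this thread concrete, here is a minimal in-memory sketch in plain Java. It is only a model and does not call the Accumulo API: the class DeleteSketch, its methods, and the sample rows are hypothetical names made up for illustration, and a TreeMap stands in for the sorted table. deleteRowRange mimics the (start, end] semantics quoted for deleteRows, and deleteFamily mimics the scan-then-delete pattern described for BatchDeleter (find the matching entries first, then delete them one at a time).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical in-memory model; none of these names come from Accumulo.
class DeleteSketch {

    // Builds a toy table: row ID -> (column family -> value), sorted by row ID.
    static NavigableMap<String, Map<String, String>> sample() {
        NavigableMap<String, Map<String, String>> table = new TreeMap<>();
        for (String row : new String[] {"r1", "r2", "r3", "r4"}) {
            Map<String, String> cols = new TreeMap<>();
            cols.put("meta", "m");
            cols.put("data", "d");
            table.put(row, cols);
        }
        return table;
    }

    // Range delete: drops every row in (start, end] in one step.
    // The start row is excluded and the end row is included.
    static void deleteRowRange(NavigableMap<String, Map<String, String>> table,
                               String start, String end) {
        table.subMap(start, false, end, true).clear();
    }

    // Scan-then-delete: removes one column family from rows in [startRow, endRow].
    // Returns the number of entries deleted.
    static int deleteFamily(NavigableMap<String, Map<String, String>> table,
                            String startRow, String endRow, String family) {
        // Step 1: scan the range and collect the rows that hold the family.
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, Map<String, String>> e
                : table.subMap(startRow, true, endRow, true).entrySet()) {
            if (e.getValue().containsKey(family)) {
                hits.add(e.getKey());
            }
        }
        // Step 2: delete each matching entry individually.
        for (String row : hits) {
            Map<String, String> cols = table.get(row);
            cols.remove(family);
            if (cols.isEmpty()) {
                table.remove(row); // a row with no entries left disappears
            }
        }
        return hits.size();
    }

    public static void main(String[] args) {
        NavigableMap<String, Map<String, String>> table = sample();
        int deleted = deleteFamily(table, "r1", "r1", "meta");
        deleteRowRange(table, "r2", "r4"); // removes r3 and r4, keeps r2
        System.out.println(deleted + " entry deleted, rows left: " + table.keySet());
    }
}
```

The per-entry path does work proportional to the number of matches, which fits the advice above to reserve it for smaller deletes, while the range path discards a whole span at once.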
