accumulo-user mailing list archives

From Billie J Rinaldi <billie.j.rina...@ugov.gov>
Subject Re: Deleting rows from the Java API
Date Wed, 09 May 2012 15:00:51 GMT
On Wednesday, May 9, 2012 10:31:46 AM, "Sean Pines" <spines83@gmail.com> wrote:
> I have a use case that involves me removing a record from Accumulo
> based on the Row ID and the Column Family.
> 
> In the shell, I noticed the command "deletemany" which allows you to
> specify column family/column qualifier. Is there an equivalent of this
> in the Java API?
> 
> In the Java API, I noticed the method:
> deleteRows(String tableName, org.apache.hadoop.io.Text start,
> org.apache.hadoop.io.Text end)
> Delete rows between (start, end]
> 
> However that only seems to work for deleting a range of RowIDs
> 
> I would also imagine that deleting rows is costly; is there a better
> way to approach something like this?
> The workaround I have for now is to just overwrite the row with an
> empty string in the value field and ignore any entries that have that.
> However this just leaves lingering rows for each "delete" and I'd like
> to avoid that if at all possible.
> 
> Thanks!

Connector provides a createBatchDeleter method. You can set the range and columns for a BatchDeleter just like you would with a Scanner. This is not an efficient operation (despite what the current javadocs for BatchDeleter suggest), but it works well if you're deleting a small number of entries. It scans for the affected key/value pairs, pulls them back to the client, then inserts a deletion entry for each. The deleteRows method, on the other hand, is efficient because large ranges can simply be dropped. If you want to delete a lot of things and deleteRows won't work for you, consider configuring a Filter at the majc (major compaction) scope that filters out what you don't want, compacting the table, then removing the filter.
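For the row ID + column family case you describe, the BatchDeleter approach might look something like the sketch below. The table name, row ID, and column family are placeholders, and the buffer/thread/latency arguments to createBatchDeleter are arbitrary example values, not recommendations; it assumes you already have a Connector.

```java
import java.util.Collections;

import org.apache.accumulo.core.client.BatchDeleter;
import org.apache.accumulo.core.client.Connector;
import org.apache.accumulo.core.client.MutationsRejectedException;
import org.apache.accumulo.core.client.TableNotFoundException;
import org.apache.accumulo.core.data.Range;
import org.apache.accumulo.core.security.Authorizations;
import org.apache.hadoop.io.Text;

public class DeleteByRowAndFamily {
  // rowId and colFam are whatever you were passing to deletemany in the shell
  static void deleteRowFamily(Connector connector, String table,
      String rowId, String colFam)
      throws TableNotFoundException, MutationsRejectedException {
    // args: table, auths, query threads, max memory (bytes),
    // max latency (ms), write threads -- example values only
    BatchDeleter deleter = connector.createBatchDeleter(table,
        new Authorizations(), 4, 10000000L, 30000L, 4);
    try {
      // restrict the scan to the one row and column family,
      // just as deletemany -b/-e and -c would in the shell
      deleter.setRanges(Collections.singleton(new Range(rowId)));
      deleter.fetchColumnFamily(new Text(colFam));
      // scans the matching entries and writes a delete marker for each
      deleter.delete();
    } finally {
      deleter.close();
    }
  }
}
```

This avoids the empty-value workaround entirely: the delete markers suppress the old entries at read time, and the data is physically removed at the next compaction.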

Billie
