accumulo-user mailing list archives

From Keith Turner <ke...@deenlo.com>
Subject Re: m.putDelete versus RowDeletingIterator?
Date Wed, 09 Oct 2013 22:28:50 GMT
On Wed, Oct 9, 2013 at 4:21 PM, Eric Newton <eric.newton@gmail.com> wrote:

> They do different things.
>
> Deleting mutations marks each entry with a delete marker.  Using the
> iterator marks a whole row with a single mutation.
>
> If you have a million entries in your row, the iterator is faster for
> the delete, but requires a seek to the start of the row for every
> read, so reads are slower.
>
> If your row has one entry, they are the same thing.
>
> Somewhere under N keys... the mutation path will be quite fast, and
> still preserve your reading speed.  I'll just pull a number out of
> thin air... let's say a few thousand.
>

The iterator may still be useful even if rows have few columns, because a
row can be deleted without reading it.  With m.putDelete() you may need to
read the row and insert a delete for each column.  If you already know which
columns to delete, you can avoid the read.
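
A toy sketch of that difference, in plain Java with a sorted map standing in
for a table.  Nothing here is the Accumulo API: ToyTable, its methods, and the
DEL_CELL marker are hypothetical names for illustration, and timestamps and
visibilities are ignored.  The point is only that a row marker is one blind
write, while per-column deletes need the column names first.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Toy model only: a sorted map of "row\0column" -> value, not Accumulo. */
final class ToyTable {
    static final String DEL_ROW = "DEL_ROW";   // row-level marker value (assumed)
    static final String DEL_CELL = "DEL_CELL"; // per-column tombstone (hypothetical)

    private final TreeMap<String, String> entries = new TreeMap<>();

    // Assumes row names never contain '\0'.
    private static String key(String row, String column) {
        return row + '\0' + column;
    }

    void put(String row, String column, String value) {
        entries.put(key(row, column), value);
    }

    /** Iterator-style delete: one write at the start of the row, no read. */
    int deleteRowWithMarker(String row) {
        entries.put(key(row, ""), DEL_ROW);
        return 1; // number of writes
    }

    /** putDelete-style delete: read the row to find its columns, then mark each. */
    int deleteRowPerColumn(String row) {
        // Copy the keys first so the map is not modified while being viewed.
        List<String> keys = new ArrayList<>(
                entries.subMap(key(row, ""), true, row + '\u0001', false).keySet());
        for (String k : keys) {
            entries.put(k, DEL_CELL);
        }
        return keys.size(); // number of delete markers written
    }

    /** Read path: hide rows that start with a row marker, and hide tombstoned cells. */
    List<String> scan() {
        List<String> out = new ArrayList<>();
        String deletedRow = null;
        for (Map.Entry<String, String> e : entries.entrySet()) {
            String k = e.getKey();
            int sep = k.indexOf('\0');
            String row = k.substring(0, sep);
            String column = k.substring(sep + 1);
            if (column.isEmpty() && DEL_ROW.equals(e.getValue())) {
                deletedRow = row; // marker sorts first within its row
                continue;
            }
            if (row.equals(deletedRow) || DEL_CELL.equals(e.getValue())) {
                continue;
            }
            out.add(row + ":" + column + "=" + e.getValue());
        }
        return out;
    }

    public static void main(String[] args) {
        ToyTable t = new ToyTable();
        t.put("r1", "a", "1");
        t.put("r1", "b", "2");
        t.put("r2", "a", "3");
        t.put("r2", "b", "4");
        t.put("r3", "a", "5");
        System.out.println("marker writes: " + t.deleteRowWithMarker("r1"));
        System.out.println("per-column writes: " + t.deleteRowPerColumn("r2"));
        System.out.println("visible: " + t.scan());
    }
}
```

The scan has to look at the start of every row for a marker, which is the
read-side cost mentioned in the quoted reply above.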

Say I have 10M rows to delete, each with 10 unpredictable columns.  With
the iterator I can batch write 10M row deletion mutations.  Without the
iterator I do 10M seeks and 100M reads, then write 100M deletes.
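
The arithmetic behind those numbers, as a rough sketch.  DeleteCost and its
method names are made up for this example, and the model assumes one seek per
row, one read per column, and one delete per column when the columns are not
known in advance.

```java
/** Back-of-the-envelope operation counts for the two delete strategies. */
final class DeleteCost {
    final long seeks;
    final long reads;
    final long writes;

    private DeleteCost(long seeks, long reads, long writes) {
        this.seeks = seeks;
        this.reads = reads;
        this.writes = writes;
    }

    /** Row markers: one blind write per row, nothing read first. */
    static DeleteCost rowMarkers(long rows) {
        return new DeleteCost(0, 0, rows);
    }

    /** Per-column deletes with unknown columns: seek each row, read and delete each column. */
    static DeleteCost perColumnDeletes(long rows, long columnsPerRow) {
        long cells = Math.multiplyExact(rows, columnsPerRow);
        return new DeleteCost(rows, cells, cells);
    }

    @Override
    public String toString() {
        return "seeks=" + seeks + ", reads=" + reads + ", writes=" + writes;
    }

    public static void main(String[] args) {
        System.out.println("row markers:        " + rowMarkers(10_000_000L));
        System.out.println("per-column deletes: " + perColumnDeletes(10_000_000L, 10L));
    }
}
```

This counts client-side operations only.  It leaves out the extra seek per
row that the iterator adds to later reads.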


>
> -Eric
>
>
>
> On Wed, Oct 9, 2013 at 4:01 PM, David Medinets <david.medinets@gmail.com>
> wrote:
> > Is there any reason to favor one approach over the other?
>
