hbase-user mailing list archives

From Jean-Daniel Cryans <jdcry...@apache.org>
Subject Re: Performance degradation when deleting columns
Date Wed, 09 Mar 2011 22:58:19 GMT
There seem to be quite a few questions in your email; I'll try to
answer all of them, but pardon me if I miss any.


On Wed, Mar 9, 2011 at 1:33 AM, Iulia Zidaru <iulia.zidaru@1and1.ro> wrote:
>  Thank you very much J-D. Your reply is very useful because we are working
> to change the delete scenario, and we have to understand what happens inside
> HBase because it's impossible to entirely replace deletes with put
> operations.
> The numbers are small because I did some tests on my local machine. On our
> testing cluster we have much higher values, but the performance degradation
> is still present. And, yes, we are hitting the same region over and over again.
> In my understanding things happen like this:
> Test 1(delete column1)
> - load in MemStore the latest version for the row
> - mark column 1 as deleted(insert a tombstone in MemStore)

Nope. The MemStore only receives new Puts and Deletes. There's
something called the Block Cache, and that is where data is loaded
from HDFS (unless it's already present). So HBase loads the blocks
from HDFS, possibly reading from many files, figures out which value
it is deleting, and then puts the tombstone in the MemStore.
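To make that concrete, here's a toy Python sketch (not HBase code; all
names and data structures are illustrative only) of the idea that a
column delete first has to locate the newest version across the store
files, and only the small tombstone ends up in the MemStore:

```python
# Toy model: store files map (row, column) -> newest timestamp in that
# file; the MemStore maps (row, column, timestamp) -> marker.

def delete_column(store_files, memstore, row, column):
    """Find the newest timestamp for (row, column) and tombstone it."""
    latest_ts = None
    for sf in store_files:  # may have to touch blocks from many files
        ts = sf.get((row, column))
        if ts is not None and (latest_ts is None or ts > latest_ts):
            latest_ts = ts
    if latest_ts is not None:
        # only the tombstone enters the MemStore, not the data itself
        memstore[(row, column, latest_ts)] = "DELETE"
    return latest_ts

store_files = [{("r1", "c1"): 100}, {("r1", "c1"): 250, ("r1", "c2"): 90}]
memstore = {}
delete_column(store_files, memstore, "r1", "c1")
print(memstore)  # {('r1', 'c1', 250): 'DELETE'}
```

The read step is the expensive part here: the write itself is tiny,
but finding "the value it is deleting" can mean pulling blocks from
several files.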

> Test 2(delete column2)
> - load in MemStore the latest version for the row
> - mark column 1 and column 2 as deleted(insert a tombstone in MemStore)

Same comment, and the tombstone for column 1 should already be there
(unless there was a flush in between).

> Every scan should also avoid columns marked as deleted, so it has more and
> more columns to avoid. Is this true?

It's one way to put it... what does happen is that the MemStore
_grows_ when you delete data, and it has more and more discarding
to do as you add tombstones.
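A toy sketch of that effect (again, not HBase code, just an
illustration with made-up names): every tombstone is one more entry
the scan has to merge in and then throw away.

```python
# Toy model: a scan merges live cells with the tombstone set; work
# grows with both, but only the non-tombstoned cells are returned.

def scan(data, tombstones):
    """Return live cells, skipping any covered by a tombstone."""
    live = {}
    for cell, value in data.items():
        if cell not in tombstones:
            live[cell] = value
    # every tombstone was also merged into the read, then discarded
    work = len(data) + len(tombstones)
    return live, work

data = {("r1", f"c{i}"): i for i in range(5)}
tombstones = set()
for i in range(3):                 # delete three columns
    tombstones.add(("r1", f"c{i}"))
live, work = scan(data, tombstones)
print(sorted(live))  # [('r1', 'c3'), ('r1', 'c4')]
print(work)          # 8 entries merged: 5 cells + 3 tombstones
```

So even though the result set shrinks, the amount of data the scan
walks through keeps growing until the tombstones are cleaned up.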

> What is not entirely clear is what to do to have a good scan performance
> again.

Good question; it would need more experimentation, or you could stop deleting so many rows.

> I see here http://outerthought.org/blog/465-ot.html that there are some
> operations on regions: flush, compaction and major compaction.
> Deletes in the file system are performed only on a major compaction.
> In what way are the deleted rows loaded into the MemStore after a flush or
> a minor compaction? Are they loaded with a tombstone, or are they not
> loaded at all?

The tombstones are loaded, yes, and they are deleted once a major
compaction materializes the full row in memory and figures out what
can be discarded. HBase handles deletes just like Bigtable, so it may
be clearer in that paper: http://labs.google.com/papers/bigtable.html
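The cleanup step above can be sketched like this (a toy Python model,
not HBase code; the covering rule and names are simplified for
illustration): a major compaction rewrites all cells, dropping both
the data covered by tombstones and the tombstones themselves.

```python
# Toy model of a major compaction: cells map (row, col, ts) -> value,
# with "DELETE" marking a tombstone that covers versions at ts or older.

def major_compact(cells):
    """Rewrite cells, discarding tombstoned data and the tombstones."""
    tombstones = {(r, c): ts
                  for (r, c, ts), v in cells.items() if v == "DELETE"}
    return {
        (r, c, ts): v
        for (r, c, ts), v in cells.items()
        if v != "DELETE" and ts > tombstones.get((r, c), -1)
    }

cells = {
    ("r1", "c1", 100): "old",
    ("r1", "c1", 250): "DELETE",   # covers versions at ts <= 250
    ("r1", "c2", 300): "kept",
}
print(major_compact(cells))  # {('r1', 'c2', 300): 'kept'}
```

That's why scan performance recovers after a major compaction: the
rewritten files contain neither the deleted values nor the markers.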

> Thank you,
> Iulia
