hbase-user mailing list archives

From Jean-Daniel Cryans <jdcry...@apache.org>
Subject Re: Performance degradation when deleting columns
Date Tue, 08 Mar 2011 18:35:01 GMT
That's a weird use case, Iulia ;)

So those numbers are pretty small, and for all I know you could be hitting
the same region over and over again, so each run of the test
influences the next one.

In any case, deletes are special operations: the server has to do a Get
for each column that's specified in order to find the latest insert
before inserting a tombstone in the MemStore. This is pretty expensive
and can generate a lot of churn in the block cache. It may be worth
more investigation though; Ryan might have something to add about that.

A Put operation is really just an insert in the MemStore; it doesn't
read from any file, so it's supposed to be faster than a Delete.
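To make that cost difference concrete, here is a toy in-memory model, not HBase's actual code path. It assumes a single row held as a flat list of cells (standing in for the MemStore plus the store files) and counts how many cells each operation has to read. The class and method names (`DeleteVsPut`, `ToyRow`, `deleteLatest`) are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model only: contrasts a blind Put with a "delete latest version" that
// must first Get the column's newest timestamp before writing a tombstone.
public class DeleteVsPut {

    // One stored cell: a value, or a tombstone masking the value at the same timestamp.
    static final class Cell {
        final String column;
        final long timestamp;
        final boolean tombstone;

        Cell(String column, long timestamp, boolean tombstone) {
            this.column = column;
            this.timestamp = timestamp;
            this.tombstone = tombstone;
        }
    }

    // Hypothetical stand-in for one row's cells (MemStore plus flushed files).
    static final class ToyRow {
        final List<Cell> cells = new ArrayList<>();
        long reads = 0; // cells examined by lookups so far

        // Put: append the new cell; nothing existing is read.
        void put(String column, long timestamp) {
            cells.add(new Cell(column, timestamp, false));
        }

        // Delete latest version: an implicit Get over the stored cells to find
        // the newest visible timestamp, then a tombstone for that timestamp.
        boolean deleteLatest(String column) {
            Set<Long> masked = new HashSet<>();
            List<Long> candidates = new ArrayList<>();
            for (Cell c : cells) {
                reads++;
                if (!c.column.equals(column)) continue;
                if (c.tombstone) masked.add(c.timestamp);
                else candidates.add(c.timestamp);
            }
            Long latest = null;
            for (long ts : candidates) {
                if (!masked.contains(ts) && (latest == null || ts > latest)) latest = ts;
            }
            if (latest == null) return false; // nothing visible to delete
            cells.add(new Cell(column, latest, true));
            return true;
        }
    }

    public static void main(String[] args) {
        ToyRow row = new ToyRow();
        for (int i = 0; i < 5; i++) row.put("doc" + i, 1L);
        System.out.println("reads after 5 puts: " + row.reads);
        row.deleteLatest("doc2");
        System.out.println("reads after 1 delete: " + row.reads);
    }
}
```

Running `main` reports 0 reads after the five puts and 5 reads after the single delete. In the real server that lookup can go to store files and the block cache rather than to a list in memory, which is the expense described above.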

J-D

On Tue, Mar 8, 2011 at 4:30 AM, Iulia Zidaru <iulia.zidaru@1and1.ro> wrote:
>  Hi,
> We have some problems when performing a large number of deletes.
>
> We are using ASF HBase 0.90 with Cloudera's distribution for HDFS (CDH3b3).
> We store the inverted index of some documents in HBase. We get constant
> throughput when inserting documents and when scanning the table, but we
> have problems when deletes are performed.
>
> We ran some tests (10 000) with the following operations:
> - scan some rows (only a few: 2-3 rows)
> - delete some columns from the previously scanned rows (one column deleted
> per test)
> - add some columns to the previously scanned rows
>
> We saw a huge performance degradation in both scans and deletes:
> - scans went from 5 to 15 milliseconds (over the first 10 000 tests)
> - deletes went from 4 to 13 milliseconds
> and performance kept degrading. After 30 000 operations a scan took 45
> milliseconds and a delete 16 milliseconds.
>
> Put time stayed almost constant (from 3.99 ms to 4.5 ms after
> 30 000 tests).
>
> In longer-running tests we saw the same degradation, but at some point
> performance seems to recover. Could that be a major compaction, a flush
> to disk, or something else?
>
> In what way do deletes affect the scan operation? How can we minimize these
> effects? Is there an operation that will put the database back into an
> optimal state?
>
> Thank you,
> Iulia
>
>
>
> --
> Iulia Zidaru
> Java Developer
>
> 1&1 Internet AG - Bucharest/Romania - Web Components Romania
> 18 Mircea Eliade St
> Sect 1, Bucharest
> RO Bucharest, 012015
> iulia.zidaru@1and1.ro
> 0040 31 223 9153
>
>
>
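Iulia's last questions (why scans slow down after deletes, and why performance later recovers) can be illustrated with another toy model, again not HBase code. It assumes that delete markers and the cells they mask stay in the store, so every scan still has to step over them, until a major compaction rewrites the data without them, which is how HBase handles deletes. It also assumes writes arrive with increasing timestamps, so write order equals timestamp order. The names (`TombstoneScan`, `majorCompact`, `lastScanExamined`) are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model only: deleted columns leave tombstones (and the masked cells)
// behind, so scans keep stepping over them until a major compaction purges them.
public class TombstoneScan {

    static final class Cell {
        final String column;
        final boolean tombstone;

        Cell(String column, boolean tombstone) {
            this.column = column;
            this.tombstone = tombstone;
        }
    }

    // Cells in write order; later cells are assumed to have newer timestamps.
    final List<Cell> cells = new ArrayList<>();
    int lastScanExamined = 0;

    void put(String column) { cells.add(new Cell(column, false)); }

    // Tombstone masking every earlier version of the column.
    void delete(String column) { cells.add(new Cell(column, true)); }

    // Returns the visible columns; every stored cell, dead or alive, is examined.
    List<String> scan() {
        Set<String> deleted = new HashSet<>();
        Set<String> seen = new HashSet<>();
        List<String> live = new ArrayList<>();
        lastScanExamined = 0;
        for (int i = cells.size() - 1; i >= 0; i--) { // newest first
            Cell c = cells.get(i);
            lastScanExamined++;
            if (c.tombstone) deleted.add(c.column);
            else if (!deleted.contains(c.column) && seen.add(c.column)) live.add(c.column);
        }
        return live;
    }

    // Major compaction: rewrite the store keeping only the visible cells.
    void majorCompact() {
        List<String> live = scan();
        cells.clear();
        for (int i = live.size() - 1; i >= 0; i--) cells.add(new Cell(live.get(i), false));
    }

    public static void main(String[] args) {
        TombstoneScan row = new TombstoneScan();
        for (int i = 0; i < 3; i++) row.put("c" + i);
        // Churn similar to the test above: delete one column, add another.
        for (int round = 0; round < 1000; round++) {
            row.delete("c" + round);
            row.put("c" + (round + 3));
        }
        int live = row.scan().size();
        System.out.println("before compaction: live=" + live + " examined=" + row.lastScanExamined);
        row.majorCompact();
        live = row.scan().size();
        System.out.println("after compaction:  live=" + live + " examined=" + row.lastScanExamined);
    }
}
```

In this model the visible data is the same three columns before and after, but the scan examines 2003 cells before the compaction and only 3 afterwards. That is one plausible shape for the "degrades, then recovers" pattern, under the assumptions stated above.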
