hbase-user mailing list archives

From Iulia Zidaru <iulia.zid...@1and1.ro>
Subject Re: Performance degradation when deleting columns
Date Thu, 10 Mar 2011 09:20:21 GMT
Thank you, J-D. These operations are clearer to me now.

On 03/10/2011 12:58 AM, Jean-Daniel Cryans wrote:
> There seem to be quite a few questions in your email. I'll try to
> answer all of them, but pardon me if I miss any.
>
> J-D
>
> On Wed, Mar 9, 2011 at 1:33 AM, Iulia Zidaru <iulia.zidaru@1and1.ro> wrote:
>> Thank you very much, J-D. Your reply is very useful because we are working
>> to change our delete scenario, and we have to understand what happens inside
>> HBase because it's impossible to replace all deletes with put
>> operations.
>> The numbers are small because I ran the tests on my local machine. On our
>> testing cluster the values are much higher, but the performance degradation is
>> still present. And, yes, we are hitting the same region over and over again.
>>
>> In my understanding things happen like this:
>>
>> Test 1 (delete column1)
>> - load the latest version of the row into the MemStore
>> - mark column 1 as deleted (insert a tombstone in the MemStore)
> Nope. The MemStore only receives new Puts and Deletes. There's something
> called the Block Cache, and this is where data read from HDFS is loaded
> (unless it's already present). So HBase loads the blocks from HDFS, possibly
> reading from many files, then figures out which value it is
> deleting and puts the tombstone in the MemStore.
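To make the read-before-write in the answer above concrete, here is a toy Java model. It is a sketch under stated assumptions, not HBase code: the `Cell` class, the list-based store files and MemStore, and every method name are hypothetical, and the Block Cache is left out entirely. It only shows that finding the version to delete means consulting every store file before a single tombstone is appended to the MemStore.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model, NOT HBase source code: deleting a column's latest version must
// READ the column first to learn that version's timestamp, and only then
// WRITE a tombstone into the MemStore. The Block Cache is not modelled here.
public class LatestVersionDelete {

    // One cell: either a value or a tombstone that masks exactly one version.
    static final class Cell {
        final String column;
        final long timestamp;
        final boolean tombstone;
        final String value;

        Cell(String column, long timestamp, boolean tombstone, String value) {
            this.column = column;
            this.timestamp = timestamp;
            this.tombstone = tombstone;
            this.value = value;
        }
    }

    // Immutable files already flushed to HDFS (simplified to in-memory lists).
    final List<List<Cell>> storeFiles = new ArrayList<>();
    // The MemStore only ever receives new puts and deletes.
    final List<Cell> memStore = new ArrayList<>();
    // How many store files the read step has opened so far.
    int storeFilesRead = 0;

    // Read step: newest timestamp of the column that no tombstone masks yet.
    Long latestLiveTimestamp(String column) {
        Set<Long> masked = new HashSet<>();
        List<Long> written = new ArrayList<>();
        List<List<Cell>> sources = new ArrayList<>(storeFiles);
        sources.add(memStore);
        for (List<Cell> source : sources) {
            if (source != memStore) {
                storeFilesRead++; // any file may hold a version of the column
            }
            for (Cell c : source) {
                if (!c.column.equals(column)) continue;
                if (c.tombstone) masked.add(c.timestamp);
                else written.add(c.timestamp);
            }
        }
        Long latest = null;
        for (long ts : written) {
            if (!masked.contains(ts) && (latest == null || ts > latest)) latest = ts;
        }
        return latest;
    }

    // Write step: only the tombstone is added to the MemStore.
    boolean deleteLatestVersion(String column) {
        Long ts = latestLiveTimestamp(column);
        if (ts == null) return false; // no live version left to delete
        memStore.add(new Cell(column, ts, true, null));
        return true;
    }

    public static void main(String[] args) {
        LatestVersionDelete region = new LatestVersionDelete();
        region.storeFiles.add(List.of(new Cell("column1", 1L, false, "a")));
        region.storeFiles.add(List.of(new Cell("column1", 2L, false, "b"),
                                      new Cell("column2", 2L, false, "c")));
        region.deleteLatestVersion("column1");
        System.out.println("store files read: " + region.storeFilesRead
                + ", MemStore cells: " + region.memStore.size());
    }
}
```

In `main`, both store files are read, yet the MemStore ends up holding just one new cell: the tombstone for `column1` at timestamp 2.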
>
>> Test 2 (delete column2)
>> - load the latest version of the row into the MemStore
>> - mark column 1 and column 2 as deleted (insert tombstones in the MemStore)
> Same comment. Also, the tombstone for column 1 should already be there
> (unless there was a flush).
>
>>
>> Every scan also has to skip the columns marked as deleted, so it has more and
>> more columns to skip. Is this true?
> That's one way to put it. What actually happens is that the MemStore
> _grows_ when you delete data, and reads have more and more discarding
> to do as you add tombstones.
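A second toy Java model, again hypothetical and not HBase code, illustrates that growth: each delete appends a tombstone, and a later scan of the row has to examine and discard both the tombstone and the value it masks. The column-delete rule used here (a tombstone hides every version at or below its timestamp) and all names are assumptions chosen for simplicity.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model, NOT HBase source code: each delete ADDS a tombstone to the
// MemStore, so a later scan of the row reads and discards more cells.
public class TombstoneScan {

    static final class Cell {
        final String column;
        final long timestamp;
        final boolean tombstone;

        Cell(String column, long timestamp, boolean tombstone) {
            this.column = column;
            this.timestamp = timestamp;
            this.tombstone = tombstone;
        }
    }

    final List<Cell> storeFile = new ArrayList<>(); // flushed data, one file for simplicity
    final List<Cell> memStore = new ArrayList<>();  // receives new puts and deletes
    int cellsExamined = 0;  // work done by the last scan
    int cellsDiscarded = 0; // tombstones plus the values they mask

    // Hypothetical column delete: masks every version with timestamp <= ts.
    void deleteColumn(String column, long ts) {
        memStore.add(new Cell(column, ts, true));
    }

    // Scan the row and return the columns that are still live.
    List<String> scanRow() {
        cellsExamined = 0;
        cellsDiscarded = 0;
        List<Cell> all = new ArrayList<>(memStore);
        all.addAll(storeFile);
        // Newest tombstone timestamp per column.
        Map<String, Long> deletedUpTo = new HashMap<>();
        for (Cell c : all) {
            if (c.tombstone) deletedUpTo.merge(c.column, c.timestamp, Math::max);
        }
        List<String> live = new ArrayList<>();
        for (Cell c : all) {
            cellsExamined++;
            Long limit = deletedUpTo.get(c.column);
            if (c.tombstone || (limit != null && c.timestamp <= limit)) {
                cellsDiscarded++; // read, then thrown away
            } else {
                live.add(c.column);
            }
        }
        return live;
    }

    public static void main(String[] args) {
        TombstoneScan row = new TombstoneScan();
        for (int i = 1; i <= 5; i++) row.storeFile.add(new Cell("column" + i, 1L, false));
        for (int i = 1; i <= 3; i++) {
            row.deleteColumn("column" + i, 2L);
            List<String> live = row.scanRow();
            System.out.println("deletes=" + i + " examined=" + row.cellsExamined
                    + " discarded=" + row.cellsDiscarded + " live=" + live.size());
        }
    }
}
```

Running `main` gives examined counts of 6, 7 and 8 after one, two and three deletes while the live columns drop from 4 to 2: the row shrinks, but the scan does more work.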
>
>> What is not entirely clear is what to do to get good scan performance
>> back.
> Good question. It would need more experimentation, or you could stop deleting so many rows.
>
>> I see at http://outerthought.org/blog/465-ot.html that there are some
>> operations on regions: flush, compaction and major compaction.
>> Deleted data is removed from the file system only during a major compaction.
>> How are the deleted rows loaded into the MemStore after a flush or a
>> minor compaction? Are they loaded with a tombstone, or are they not loaded
>> at all?
> Yes, the tombstones are loaded, and they are removed once a major
> compaction materializes the full row in memory and figures out what can be
> discarded. HBase handles deletes just like Bigtable; it may be
> clearer in that paper: http://labs.google.com/papers/bigtable.html
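The flush and compaction behaviour can be sketched the same way. In this hypothetical toy model (not HBase code; all names are illustrative), a flush turns the MemStore, tombstones included, into a new store file; a minor compaction merges only some files and therefore must keep tombstones; and a major compaction merges every file and can finally drop tombstones along with the values they mask.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model, NOT HBase source code: flushes and minor compactions carry
// tombstones along, because an older file they did not look at may still hold
// a value the tombstone masks. Only a major compaction, which merges every
// store file, can drop tombstones together with the values they mask.
public class CompactionModel {

    static final class Cell {
        final String column;
        final long timestamp;
        final boolean tombstone;

        Cell(String column, long timestamp, boolean tombstone) {
            this.column = column;
            this.timestamp = timestamp;
            this.tombstone = tombstone;
        }
    }

    final List<List<Cell>> storeFiles = new ArrayList<>(); // oldest first
    List<Cell> memStore = new ArrayList<>();

    // Flush: the MemStore, tombstones included, becomes a new store file.
    void flush() {
        storeFiles.add(memStore);
        memStore = new ArrayList<>();
    }

    // Minor compaction: merge the newest n files but keep every tombstone.
    void minorCompact(int n) {
        List<List<Cell>> newest = storeFiles.subList(storeFiles.size() - n, storeFiles.size());
        List<Cell> merged = new ArrayList<>();
        for (List<Cell> file : newest) merged.addAll(file);
        newest.clear(); // removes those files from storeFiles
        storeFiles.add(merged);
    }

    // Major compaction: merge all files, dropping tombstones and masked values.
    // A tombstone here hides every version at or below its timestamp (assumed).
    void majorCompact() {
        Map<String, Long> deletedUpTo = new HashMap<>();
        for (List<Cell> file : storeFiles) {
            for (Cell c : file) {
                if (c.tombstone) deletedUpTo.merge(c.column, c.timestamp, Math::max);
            }
        }
        List<Cell> kept = new ArrayList<>();
        for (List<Cell> file : storeFiles) {
            for (Cell c : file) {
                Long limit = deletedUpTo.get(c.column);
                if (!c.tombstone && (limit == null || c.timestamp > limit)) kept.add(c);
            }
        }
        storeFiles.clear();
        storeFiles.add(kept);
    }

    // Number of tombstones currently sitting in store files.
    int tombstonesOnDisk() {
        int n = 0;
        for (List<Cell> file : storeFiles) {
            for (Cell c : file) {
                if (c.tombstone) n++;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        CompactionModel store = new CompactionModel();
        store.storeFiles.add(List.of(new Cell("column1", 1L, false),
                new Cell("column2", 1L, false), new Cell("column3", 1L, false)));
        store.memStore.add(new Cell("column1", 2L, true));
        store.flush();
        store.memStore.add(new Cell("column2", 2L, true));
        store.flush();
        store.minorCompact(2);
        System.out.println("after minor: files=" + store.storeFiles.size()
                + " tombstones=" + store.tombstonesOnDisk());
        store.majorCompact();
        System.out.println("after major: files=" + store.storeFiles.size()
                + " tombstones=" + store.tombstonesOnDisk()
                + " cells=" + store.storeFiles.get(0).size());
    }
}
```

With the data in `main`, the minor compaction leaves two files that still hold both tombstones; the major compaction leaves a single file with no tombstones and only `column3`.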
>
>> Thank you,
>> Iulia


-- 
Iulia Zidaru
Java Developer

1&1 Internet AG - Bucharest/Romania - Web Components Romania
18 Mircea Eliade St
Sect 1, Bucharest
RO Bucharest, 012015
iulia.zidaru@1and1.ro
0040 31 223 9153

  

