hbase-user mailing list archives

From Jeff Whiting <je...@qualtrics.com>
Subject Re: Slow row deletion performance in comparison to insertion
Date Wed, 27 Jun 2012 23:15:50 GMT
Looking at HBASE-6284, it seems that deletes are not batched at the regionserver level, which
is the reason for the performance degradation.  Additionally, the locking issue in HBASE-5941
is also contributing to it.

So until those changes get into an HBase release, I just have to live with the slower performance.
 
Is there anything I need to do on my end?

Just as a sanity check, I tried setting a timestamp in the delete object but it made no difference.
 
I'll batch my deletes at the end as you suggested (as memory allows).
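
A minimal sketch of that batching, in plain Java so it runs without a cluster. The BatchBuffer class and its sink callback are hypothetical names for illustration, not part of the HBase client; with HBase the sink would presumably wrap a call such as HTable.delete(List<Delete>). The buffer is cleared after every flush so that no row is submitted twice, and the batch size caps how much is held in memory at once.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper (not part of HBase): collects items and hands them to
// `sink` in fixed-size batches, clearing the buffer after every flush.
final class BatchBuffer<T> {
    private final int batchSize;
    private final Consumer<List<T>> sink;
    private final List<T> pending = new ArrayList<>();

    BatchBuffer(int batchSize, Consumer<List<T>> sink) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        this.batchSize = batchSize;
        this.sink = sink;
    }

    // Buffers one item and flushes as soon as a full batch is available.
    void add(T item) {
        pending.add(item);
        if (pending.size() >= batchSize) {
            flush();
        }
    }

    // Sends whatever is buffered, including a final partial batch.
    void flush() {
        if (pending.isEmpty()) {
            return;
        }
        sink.accept(new ArrayList<>(pending)); // copy so the sink may keep the list
        pending.clear();
    }
}

final class BatchDemo {
    public static void main(String[] args) {
        List<Integer> batchSizes = new ArrayList<>();
        // Stand-in sink that only records batch sizes. Assumption: with an
        // HTable named `table`, the sink would call table.delete(batch) and
        // handle the IOException that call can throw.
        BatchBuffer<String> deletes =
                new BatchBuffer<>(1000, batch -> batchSizes.add(batch.size()));
        for (int i = 0; i < 2500; i++) {
            deletes.add("row-" + i);
        }
        deletes.flush(); // do not forget the trailing partial batch
        System.out.println(batchSizes); // prints [1000, 1000, 500]
    }
}
```

Timing only the sink call, as in the benchmark quoted below, would keep the scan time out of the measurement.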

Thanks,
~Jeff

On 6/27/2012 4:11 PM, Ted Yu wrote:
> Amit:
> Can you point us to the JIRA or changelist in 0.89-fb ?
>
> Thanks
>
> On Wed, Jun 27, 2012 at 3:05 PM, Amitanand Aiyer <amitanand.s@fb.com> wrote:
>
>> There was some difference in the way locks are taken for batched deletes
>> and puts.  This was fixed for 89.
>>
>> I wonder if the same could be the issue here.
>>
>> Sent from my iPhone
>>
>> On Jun 27, 2012, at 2:04 PM, "Jeff Whiting" <jeffw@qualtrics.com> wrote:
>>
>>> I'm struggling to understand why my deletes are taking longer than my
>>> inserts.  My understanding is that a delete is just an insertion of a
>>> tombstone.  And I'm deleting the entire row.
>>>
>>> I do a simple loop (pseudo code) and insert the 100 byte rows:
>>>
>>> for (int i=0; i < 50000; i++)
>>> {
>>>     puts.append(new Put(rowkey[i], oneHundredBytes[i]));
>>>
>>>     if (puts.size() % 1000 == 0)
>>>     {
>>>         Benchmark.start();
>>>         table.batch(puts);
>>>         Benchmark.stop();
>>>     }
>>> }
>>>
>>>
>>> The above takes about 8282ms total.
>>>
>>> However the delete takes more than twice as long:
>>>
>>> Iterator it = table.getScannerScan(rowkey[0], rowkey[50000-1]).iterator();
>>> while(it.hasNext())
>>> {
>>>     r = it.next();
>>>     deletes.append(new Delete(r.getRow()));
>>>     if (deletes.size() % 1000 == 0)
>>>     {
>>>         Benchmark.start();
>>>         table.batch(deletes);
>>>         Benchmark.stop();
>>>     }
>>> }
>>>
>>> The above takes 17369ms total.
>>>
>>> I'm only benchmarking the deletion time and not the scan time.
>>> Additionally if I batch the deletes into one big one at the end (rather
>>> than while I'm scanning) it takes about the same amount of time. I am
>>> deleting the entire row so I wouldn't think it would be doing a read before
>>> the delete (
>>> http://mail-archives.apache.org/mod_mbox/hbase-user/201206.mbox/%3CE83D30E8F408F94A96F992785FC29D82063395D6@s2k3mntaexc1.mentacapital.local%3E
>>> ).
>>>
>>> Any thoughts on why it is slower and how I can speed it up?
>>>
>>> Thanks,
>>> ~Jeff
>>>
>>> --
>>> Jeff Whiting
>>> Qualtrics Senior Software Engineer
>>> jeffw@qualtrics.com
>>>

-- 
Jeff Whiting
Qualtrics Senior Software Engineer
jeffw@qualtrics.com



