hbase-user mailing list archives

From de Souza Medeiros Andre <andre.medei...@aalto.fi>
Subject RE: Performance issues of prepending a table
Date Thu, 19 Apr 2012 09:57:00 GMT
Hi Ian,

Thank you very much, that pretty much answers it.

Best regards,
Andre Medeiros
From: Ian Varley [ivarley@salesforce.com]
Sent: Wednesday, April 18, 2012 17:11
To: user@hbase.apache.org
Subject: Re: Performance issues of prepending a table

I would guess that this approach would be susceptible to the same kind of "hot spotting" as
inserting sequential keys; if you're prepending globally (i.e. there's one global "first"
row), then all activity will be taking place on the same region server, so you wouldn't be
taking advantage of the natural parallelism of a clustered system like HBase.
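To illustrate the hot-spotting point, here is a minimal sketch in plain Java (no HBase client involved). It assumes a hypothetical layout of four regions, each owning the row keys from its start key up to the next region's start key, and compares 100 "prepend" writes against 100 writes spread across the key space. The region names, split points, and key values are made up for illustration only.

```java
import java.util.Map;
import java.util.TreeMap;

public class HotSpotSketch {
    // Hypothetical region layout: start key (inclusive) -> region name.
    static TreeMap<Long, String> regions() {
        TreeMap<Long, String> r = new TreeMap<>();
        r.put(Long.MIN_VALUE, "region-0");
        r.put(1000L, "region-1");
        r.put(2000L, "region-2");
        r.put(3000L, "region-3");
        return r;
    }

    // A row key belongs to the region with the greatest start key <= rowKey.
    static String regionFor(TreeMap<Long, String> regions, long rowKey) {
        return regions.floorEntry(rowKey).getValue();
    }

    // Count how many of the given writes land on each region.
    static Map<String, Integer> countWrites(TreeMap<Long, String> regions, long[] keys) {
        Map<String, Integer> counts = new TreeMap<>();
        for (long k : keys) {
            counts.merge(regionFor(regions, k), 1, Integer::sum);
        }
        return counts;
    }

    // Keys produced by repeated prepending: first-1, first-2, ...
    static long[] prependKeys(long first, int n) {
        long[] keys = new long[n];
        for (int i = 0; i < n; i++) keys[i] = first - 1 - i;
        return keys;
    }

    // Keys spread evenly over the key space, standing in for "random" inserts.
    static long[] spreadKeys(int n, long step) {
        long[] keys = new long[n];
        for (int i = 0; i < n; i++) keys[i] = i * step;
        return keys;
    }

    public static void main(String[] args) {
        TreeMap<Long, String> r = regions();
        System.out.println(countWrites(r, prependKeys(501L, 100))); // {region-0=100}
        System.out.println(countWrites(r, spreadKeys(100, 40L)));
        // {region-0=25, region-1=25, region-2=25, region-3=25}
    }
}
```

In this toy layout every prepend lands on the region that owns the lowest keys, while the spread-out writes divide evenly. A real cluster would also split and rebalance regions over time, which this sketch does not model.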

That aside, I can't think of anything architectural about HBase that would make it perform
poorly when continually inserting rows that sort before other rows; I think the log-structured
merge trees that HBase uses for storage will handle any kind of insert activity more or less
identically, and write to the WAL and the memstore with equal speed regardless of row key
position (and flushes to storefiles on disk are based on the sorted arrangement in memory,
which has already taken place by that point). There may be some smaller-order differences
in the speed of inserting into the memstore, depending on position, but that'd be something
you'd have to benchmark, and my guess is you'd get nothing discernible. But as always, the
best way to know is to try it. :)


On Apr 18, 2012, at 8:59 AM, de Souza Medeiros Andre wrote:

Hi all,

For some specific reason, I have an HBase table that should be frequently prepended to. The row
keys in this table are long integers (converted to bytes of course). "Prepend" is an operation
that does the following:
1. Scans the table just for the purpose of getting the row key X of the first row, then stops
the scan.
2. CheckAndSet on X-1, checking if row X-1 is null and putting data at row key X-1.
3. If the CAS fails, try CAS on X-2, etc.
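
The three steps above can be sketched as follows. This is a simulation under stated assumptions, not HBase client code: a ConcurrentSkipListMap stands in for the sorted table, firstEntry() stands in for the one-row scan, and putIfAbsent() stands in for the check-and-set on a missing row (in the HBase client the corresponding call would be checkAndPut with a null expected value). The class name, the prepend() helper, and the START_KEY used for an empty table are hypothetical. The map sorts keys numerically, whereas HBase sorts the encoded bytes, so the sketch only matches HBase ordering for non-negative keys.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentSkipListMap;

public class PrependSketch {
    // Hypothetical key for the very first row when the table is empty.
    static final long START_KEY = Long.MAX_VALUE;

    // Inserts value at a row key that sorts before every existing row
    // and returns the key that was actually used.
    static long prepend(ConcurrentSkipListMap<Long, String> table, String value) {
        // Step 1: read only the first row to learn its key X.
        Map.Entry<Long, String> first = table.firstEntry();
        long candidate = (first == null) ? START_KEY : first.getKey() - 1;

        // Steps 2 and 3: atomically put at X-1 only if that row is absent;
        // if another writer got there first, retry at X-2, X-3, ...
        while (table.putIfAbsent(candidate, value) != null) {
            candidate--;
        }
        return candidate;
    }

    public static void main(String[] args) {
        ConcurrentSkipListMap<Long, String> table = new ConcurrentSkipListMap<>();
        table.put(100L, "existing row");
        System.out.println(prepend(table, "a")); // 99
        System.out.println(prepend(table, "b")); // 98
        System.out.println(table.firstKey());    // 98
    }
}
```

The retry loop only runs more than once when a concurrent writer claims the candidate key between the read in step 1 and the conditional put in step 2, which is the race the CAS is there to handle.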

I'd like to know if there are any obvious performance drawbacks with this approach, compared
to inserting rows randomly in the table. With "obvious performance drawbacks" I mean something
that doesn't need to be benchmarked to know its effects.

I am aware that scanning plus CAS will be slower than a simple Put, but I'd like to know if
prepending has any negative effect regarding region management and misc.

Thank you,
Andre Medeiros
