hbase-user mailing list archives

From Rob Verkuylen <...@verkuylen.net>
Subject TIMERANGE performance on uniformly distributed keyspace
Date Sat, 14 Apr 2012 14:38:46 GMT
I'm trying to find a definitive answer to the question of whether scans
restricted by timerange alone will scale when you use uniformly distributed
row keys like UUIDs.

Since the keys are randomly generated, they will be spread out over all
RegionServers, Regions and HFiles. In theory, assuming enough writes, that
would mean that every HFile will contain cells from the entire timerange of
writes.
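
To make that reasoning concrete, here is a toy simulation in plain Java. It is only a sketch under stated assumptions: it does not use or model HBase itself, the class and method names are made up for illustration, "files" are simply contiguous slices of the key space, and the write counter stands in for the timestamp. It records the oldest and newest timestamp that lands in each file and counts how many files overlap a narrow time window.

```java
import java.util.Random;
import java.util.UUID;

// Toy model only: hypothetical names, no HBase classes involved.
class KeySpreadSketch {

    /** Returns, per file, {minTimestamp, maxTimestamp} of the writes routed to it. */
    public static long[][] simulate(int numFiles, int numWrites, long seed) {
        Random rng = new Random(seed);
        long[][] ranges = new long[numFiles][2];
        for (long[] r : ranges) {
            r[0] = Long.MAX_VALUE; // min timestamp seen so far
            r[1] = Long.MIN_VALUE; // max timestamp seen so far
        }
        for (long ts = 0; ts < numWrites; ts++) {
            // Random 128-bit key; the write counter plays the role of the timestamp.
            UUID key = new UUID(rng.nextLong(), rng.nextLong());
            // Assumption: each file owns one contiguous slice of the key space,
            // selected here by the top bits of the key.
            long top31 = key.getMostSignificantBits() >>> 33;
            int file = (int) ((top31 * numFiles) >>> 31);
            ranges[file][0] = Math.min(ranges[file][0], ts);
            ranges[file][1] = Math.max(ranges[file][1], ts);
        }
        return ranges;
    }

    /** Counts files whose [min, max] timestamps overlap the window [from, to). */
    public static int filesOverlapping(long[][] ranges, long from, long to) {
        int count = 0;
        for (long[] r : ranges) {
            if (r[0] < to && r[1] >= from) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        long[][] ranges = simulate(16, 100_000, 42L);
        int hit = filesOverlapping(ranges, 50_000, 50_100);
        System.out.println(hit + " of " + ranges.length + " files overlap the window");
    }
}
```

In this model a per-file min/max timestamp would not let a narrow time window rule out any file, which is exactly the concern: with random keys, every file spans nearly the whole write history.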

Before a major compaction, data is in the memstores and in the flushed and
merged HFiles that are still below max.filesize. I can imagine that a scan
using a TIMERANGE can be served quickly from the memstores and the smaller
files, but how does it perform after a major compaction?

Using "scan 'table', {TIMERANGE => [t, t+x]}":
* How does HBase handle this query in this case (UUID keys)?
* Does it need to hit every memstore and HFile to determine whether there is
data available? And if so, does it need to do a full scan of each file to
find the records that fall within the timerange, since keys are stored
lexicographically?
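
The worry behind the second question can be sketched the same way. The toy below is again plain Java with hypothetical names and no HBase code: a "file" is a map sorted lexicographically by row key, each key carrying one timestamp, and the window is treated as half-open, [from, to). Because the sort order says nothing about timestamps, a filter on time alone has to look at every cell in this model.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.TreeMap;
import java.util.UUID;

// Toy model only: hypothetical names, no HBase classes involved.
class KeyOrderedScanSketch {

    /** Number of cells the most recent scan had to look at. */
    public static int cellsExamined = 0;

    /** Builds a "file" sorted by row key; the write counter is the timestamp. */
    public static TreeMap<String, Long> buildFile(int numCells, long seed) {
        Random rng = new Random(seed);
        TreeMap<String, Long> file = new TreeMap<>();
        for (long ts = 0; ts < numCells; ts++) {
            String rowKey = new UUID(rng.nextLong(), rng.nextLong()).toString();
            file.put(rowKey, ts);
        }
        return file;
    }

    /** Returns row keys whose timestamp lies in [from, to), walking the file in key order. */
    public static List<String> scanTimeRange(TreeMap<String, Long> file, long from, long to) {
        List<String> hits = new ArrayList<>();
        cellsExamined = 0;
        for (Map.Entry<String, Long> cell : file.entrySet()) {
            cellsExamined++; // key order gives no way to skip ahead by timestamp
            long ts = cell.getValue();
            if (ts >= from && ts < to) {
                hits.add(cell.getKey());
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        TreeMap<String, Long> file = buildFile(10_000, 7L);
        List<String> hits = scanTimeRange(file, 100, 110);
        System.out.println(hits.size() + " hits, " + cellsExamined + " cells examined");
    }
}
```

Whether HBase actually behaves like this naive model, or has some way to avoid reading whole files, is precisely what I'm asking.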

I've run some tests on tables with 300+ regions, on month-old data (so after
major compaction), and response times seem fairly quick. But I'm trying to
understand why that is, because hitting every HFile in every region seems
inefficient. Figure 9-3 in Lars' book seems to indicate this as well, but I
can't seem to find the answer in the book or anywhere else.

Thnx, Rob
