hbase-user mailing list archives

From Doug Meil <doug.m...@explorysmedical.com>
Subject Re: TIMERANGE performance on uniformly distributed keyspace
Date Sat, 14 Apr 2012 15:11:03 GMT
Hi there-

With respect to:

"* Does it need to hit every memstore and HFile to determine if there
is data available? And if so does it need to do a full scan of that file to
determine the records qualifying to the timerange, since keys are stored
lexicographically?"

And...

"Using "scan 'table', {TIMERANGE => [t, t+x]}" :"
See...


http://hbase.apache.org/book.html#regions.arch
8.7.5.4. KeyValue



The timestamp is an attribute of the KeyValue, but unless you restrict the
scan with a start/stop row, it has to process every row.
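
To make that concrete, here is a toy model in Python. It is not HBase code, and
the names Cell, scan_timerange and scan_row_range are made up for illustration.
It assumes cells are sorted by row key only, with the timestamp carried as an
attribute, and it treats the time range as half-open [t_min, t_max). A
time-range-only scan looks at every cell; a start/stop row bounds the work first.

```python
from bisect import bisect_left
from typing import List, NamedTuple, Tuple


class Cell(NamedTuple):
    row: str        # row key; the store is sorted on this field
    timestamp: int  # attribute of the cell, not part of the sort order here
    value: str


def scan_timerange(cells: List[Cell], t_min: int, t_max: int) -> Tuple[List[Cell], int]:
    """Time-range-only scan: returns (matches, number of cells examined)."""
    examined = 0
    matches = []
    for cell in cells:
        examined += 1  # every cell is touched, matching or not
        if t_min <= cell.timestamp < t_max:
            matches.append(cell)
    return matches, examined


def scan_row_range(cells: List[Cell], start_row: str, stop_row: str,
                   t_min: int, t_max: int) -> Tuple[List[Cell], int]:
    """Same filter, but limited to start_row <= row < stop_row via binary search."""
    rows = [c.row for c in cells]
    lo = bisect_left(rows, start_row)
    hi = bisect_left(rows, stop_row)
    return scan_timerange(cells[lo:hi], t_min, t_max)


# 1000 cells whose row keys are scattered relative to write time,
# loosely imitating uniformly distributed keys such as UUIDs.
cells = sorted(
    Cell(row=f"{(i * 7919) % 1000:04d}", timestamp=i, value=f"v{i}")
    for i in range(1000)
)

matches_all, examined_all = scan_timerange(cells, 100, 110)
matches_part, examined_part = scan_row_range(cells, "0100", "0200", 0, 1000)
print(len(matches_all), examined_all)    # few matches, all cells examined
print(len(matches_part), examined_part)  # only the row slice is examined
```

In this sketch the time-range scan returns a handful of cells but still walks
all of them, while the row-bounded scan only walks the slice between the two
keys.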

Major compactions don't change this fact; they just change the number of
HFiles that have to be processed.
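
A second toy sketch in Python for the compaction point, under the same caveat
that this is not HBase code. The helper names major_compact and
timerange_scan_cost are hypothetical, each "file" is just a list of
(row, timestamp) tuples sorted by row, and the model ignores that a real major
compaction can also drop deleted or expired cells.

```python
import heapq
from typing import List, Tuple

ToyCell = Tuple[str, int]  # (row key, timestamp)


def major_compact(files: List[List[ToyCell]]) -> List[List[ToyCell]]:
    """Merge several key-sorted files into a single key-sorted file."""
    return [list(heapq.merge(*files))]


def timerange_scan_cost(files: List[List[ToyCell]], t_min: int, t_max: int):
    """Return (files opened, cells examined, matches) for a time-range-only scan."""
    files_opened = len(files)
    cells_examined = sum(len(f) for f in files)  # no row bounds, so every cell
    matches = sorted(c for f in files for c in f if t_min <= c[1] < t_max)
    return files_opened, cells_examined, matches


# Three small flushed files, each sorted by row key.
files = [
    [("a", 1), ("m", 5), ("t", 9)],
    [("c", 2), ("k", 6), ("z", 7)],
    [("b", 3), ("p", 4), ("q", 8)],
]

before = timerange_scan_cost(files, 4, 7)
after = timerange_scan_cost(major_compact(files), 4, 7)
print(before[:2], after[:2])
```

The file count drops after the merge, but the number of cells the scan has to
look at, and the result, stay the same in this model.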



On 4/14/12 10:38 AM, "Rob Verkuylen" <rob@verkuylen.net> wrote:

>I'm trying to find a definitive answer to the question if scans on
>timerange alone will scale when you use uniformly distributed keys like
>UUIDs.
>
>Since the keys are randomly generated that would mean the keys will be
>spread out over all RegionServers, Regions and HFiles. In theory, assuming
>enough writes, that would mean that every HFile will contain the entire
>timerange of writes.
>
>Now before a major compaction, data is in the memstores and (non
>max.filesize) flushed&merged HFiles. I can imagine that a scan using a
>TIMERANGE can quickly serve from memstores and the smaller files, but how
>does it perform after a major compaction?
>
>Using "scan 'table', {TIMERANGE => [t, t+x]}" :
>* How does HBase handle this query in this case(UUIDs)?
>* Does it need to hit every memstore and HFile to determine if there is
>data available? And if so does it need to do a full scan of that file to
>determine the records qualifying to the timerange, since keys are stored
>lexicographically?
>
>I've run some tests on 300+ region tables, on month old data(so after
>major compaction) and performance/response seems fairly quick. But I'm
>trying to
>understand why that is, because hitting every HFile on every region seems
>to be ineffective. Lars' book figure 9-3 seems to indicate this as well,
>but I can't seem to get the answer from the book or anywhere else.
>
>Thnx, Rob


