hbase-user mailing list archives

From Jean-Daniel Cryans <jdcry...@apache.org>
Subject Re: scan.setTimeRange performance
Date Mon, 24 Sep 2012 19:15:08 GMT
Hi Eugeny,

The mailing list stripped your attachment (as it often does), so you
might want to put it on a public web server.

I don't have much to contribute except to point to a recent
conversation that you can find here:
http://comments.gmane.org/gmane.comp.java.hadoop.hbase.user/28722

Hope this helps,

J-D

On Fri, Sep 21, 2012 at 5:03 AM, Eugeny Morozov
<emorozov@griddynamics.com> wrote:
> Hello!
>
> It is known, and I saw it in the code, that the time range set by
> scan.setTimeRange is used to filter out HFiles before they are scanned.
> This means that the time taken by a subsequent scanner.next should be
> almost zero if I set the time range far in the future. I am sure that I do
> not have any HFiles that fall into the time range I set.
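
A side note on the pruning described above: as I understand it, each store file records the minimum and maximum timestamp of the cells it holds, and a file whose range does not overlap the scan's [min, max) range can be skipped without being read. Below is a minimal sketch of that overlap test in plain Java. FileTimeRange, mayContain and filesToRead are made-up names for illustration, not HBase classes, and the timestamps are invented.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a store file's timestamp metadata; not an HBase class.
final class FileTimeRange {
    final String name;
    final long minTimestamp; // smallest cell timestamp in the file (inclusive)
    final long maxTimestamp; // largest cell timestamp in the file (inclusive)

    FileTimeRange(String name, long minTimestamp, long maxTimestamp) {
        this.name = name;
        this.minTimestamp = minTimestamp;
        this.maxTimestamp = maxTimestamp;
    }
}

final class TimeRangePruningSketch {
    // The scan range is treated as [scanMin, scanMax), matching Scan.setTimeRange(min, max).
    static boolean mayContain(FileTimeRange file, long scanMin, long scanMax) {
        return file.maxTimestamp >= scanMin && file.minTimestamp < scanMax;
    }

    // Returns the names of the files that still have to be read for this scan.
    static List<String> filesToRead(List<FileTimeRange> files, long scanMin, long scanMax) {
        List<String> kept = new ArrayList<>();
        for (FileTimeRange file : files) {
            if (mayContain(file, scanMin, scanMax)) {
                kept.add(file.name);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<FileTimeRange> files = new ArrayList<>();
        files.add(new FileTimeRange("a", 100L, 200L));
        files.add(new FileTimeRange("b", 300L, 400L));

        // A range that overlaps only file "b".
        System.out.println(filesToRead(files, 250L, 350L));
        // A range beyond every file: nothing is left to read.
        System.out.println(filesToRead(files, 1000L, 1100L));
    }
}
```

If every file is pruned this way, the expectation that scanner.next returns almost immediately is reasonable, which is what makes the measured slowdown surprising.
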
>
> But, and here is the question, scanning with the time range set
> surprisingly takes far longer than scanning without it.
>
> My results are as follows:
> Use range [false]. Time spent (avg): [0]
> Use range [true]. Time spent (avg): [525]
>
> KeyValues are returned when the time range is not used.
>
> The code is as follows:
>     public static void run(boolean useRange, HTable table) throws Exception {
>         Scan scan = new Scan().addFamily( family ).setCaching( -1 ).setCacheBlocks( false );
>         scan.setStartRow( random start row );
>         if (useRange) scan.setTimeRange(1348114401600L, 1348114401700L);
>
>         ResultScanner scanner = table.getScanner(scan);
>         // There were bunch of measures, where N was from 10 to 50
>         for(int i = 0 ; i < N; i++) {
>             long time = System.currentTimeMillis();
>             result = scanner.next();
>             sum += (System.currentTimeMillis() - time) / N;
>         }
>     }
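
One thing that may be worth ruling out in the loop above: if sum and N are integer types, (System.currentTimeMillis() - time) / N truncates every sample, so any call that takes fewer than N milliseconds adds 0 to the sum. Here is a sketch of a harness that accumulates nanoseconds and divides only once at the end. ScanTimingSketch and its methods are hypothetical names, and the Supplier is just a stand-in for scanner.next().

```java
import java.util.function.Supplier;

final class ScanTimingSketch {
    // Times n invocations of `call` and returns the average in milliseconds.
    static double averageMillis(Supplier<?> call, int n) {
        long totalNanos = 0L;
        for (int i = 0; i < n; i++) {
            long start = System.nanoTime();
            call.get(); // stand-in for scanner.next()
            totalNanos += System.nanoTime() - start;
        }
        return totalNanos / 1_000_000.0 / n; // one floating-point division at the end
    }

    // Mirrors the original loop: integer division applied to every sample.
    static long truncatedAverage(long[] samplesMillis) {
        long sum = 0L;
        for (long sample : samplesMillis) {
            sum += sample / samplesMillis.length;
        }
        return sum;
    }

    // Sums first, then divides once in floating point.
    static double exactAverage(long[] samplesMillis) {
        long sum = 0L;
        for (long sample : samplesMillis) {
            sum += sample;
        }
        return (double) sum / samplesMillis.length;
    }

    public static void main(String[] args) {
        // Invented samples, all shorter than N = 10 ms.
        long[] samples = {3, 4, 5, 6, 7, 8, 9, 3, 4, 5};
        System.out.println(truncatedAverage(samples)); // 0
        System.out.println(exactAverage(samples));     // 5.4
        System.out.println(averageMillis(() -> Math.sqrt(2.0), 10));
    }
}
```

This would not explain the 525 figure, but it could mean the 0 in the no-range case is partly a rounding artifact rather than a true measurement.
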
>
> Of course, such measurements include all sorts of noise, like network
> overhead, but I'm using a virtual machine on my own box, and while I
> measure there is no other activity on either my box or the virtual
> machine, so that noise is minimal.
>
> I also used YourKit to trace and sample the running HRegionServer, but
> didn't find anything suspicious, though I didn't look at heap and GC
> performance. The trace is attached.
>
> So, the question is: why is it so slow when the time range is set and so
> fast without it?
> --
> Evgeny Morozov
> Developer Grid Dynamics
> Skype: morozov.evgeny
> www.griddynamics.com
> emorozov@griddynamics.com
