hbase-user mailing list archives

From Esteban Gutierrez <este...@cloudera.com>
Subject Re: Scan vs Parallel scan.
Date Wed, 10 Sep 2014 17:20:35 GMT
Hello Guillermo,

It sounds like there may be some contention going on. How many disks per node
do you have?

Can you explain further what you mean by "and I don't know why it's so
fast,, it's really much faster than execute an "count" from hbase shell"?
The count command from the shell uses the FirstKeyOnlyFilter and a caching
of 10, which should be close to the behavior of your testing tool if it's
using the same filter and the same cache settings.
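
As a rough illustration of why the caching value matters: scanner caching
sets how many rows come back per fetch from the region server, so a full scan
needs at least rows / caching round trips. The sketch below is only
arithmetic, not HBase code. The helper name scanner_round_trips is
hypothetical, and the 100,000-row figure is the one from the test described
in this thread.

```python
import math


def scanner_round_trips(total_rows: int, caching: int) -> int:
    """Lower bound on row-fetching round trips for a full scan.

    Hypothetical helper for illustration; it ignores scanner open/close
    calls and any per-region boundaries.
    """
    if total_rows < 0 or caching <= 0:
        raise ValueError("total_rows must be >= 0 and caching must be > 0")
    return math.ceil(total_rows / caching)


if __name__ == "__main__":
    # Caching values that appear in this thread (10 is the shell default
    # mentioned above).
    for caching in (1, 10, 100, 1000):
        print(caching, scanner_round_trips(100_000, caching))
```

With caching 10 that is 10,000 fetches for 100,000 rows, against 100,000
fetches with caching 1, so two tools are only comparable when they use the
same caching and the same filter.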

cheers,
esteban.




--
Cloudera, Inc.


On Wed, Sep 10, 2014 at 1:40 AM, Guillermo Ortiz <konstt2000@gmail.com>
wrote:

> Hi,
>
> I developed a distributed scan that creates a thread for each region. After
> that, I tried to compare timings for Scan vs DistributedScan.
> I have disabled the block cache on my table. My cluster has 3 region servers
> with 2 regions each; in total there are 100,000 rows, and I execute a
> complete scan.
>
> My partitions are
> -016666 -> request 16665
> 016666-033332 -> request 16666
> 033332-049998 -> request 16666
> 049998-066664 -> request 16666
> 066664-083330 -> request 16666
> 083330- -> request 16671
>
>
> 14/09/10 09:15:47 INFO hbase.HbaseScanTest: NUM ROWS 100000
> 14/09/10 09:15:47 INFO util.TimerUtil: SCAN PARALLEL:22089ms,Counter:2 ->
> Caching 10
>
> 14/09/10 09:16:04 INFO hbase.HbaseScanTest: NUM ROWS 100000
> 14/09/10 09:16:04 INFO util.TimerUtil: SCAN PARALLEL:16598ms,Counter:2 ->
> Caching 100
>
> 14/09/10 09:16:22 INFO hbase.HbaseScanTest: NUM ROWS 100000
> 14/09/10 09:16:22 INFO util.TimerUtil: SCAN PARALLEL:16497ms,Counter:2 ->
> Caching 1000
>
> 14/09/10 09:17:41 INFO hbase.HbaseScanTest: NUM ROWS 100000
> 14/09/10 09:17:41 INFO util.TimerUtil: SCAN NORMAL:68288ms,Counter:2 ->
> Caching 1
>
> 14/09/10 09:17:48 INFO hbase.HbaseScanTest: NUM ROWS 100000
> 14/09/10 09:17:48 INFO util.TimerUtil: SCAN NORMAL:2646ms,Counter:2 ->
> Caching 100
>
> 14/09/10 09:17:58 INFO hbase.HbaseScanTest: NUM ROWS 100000
> 14/09/10 09:17:58 INFO util.TimerUtil: SCAN NORMAL:3903ms,Counter:2 ->
> Caching 1000
>
> Parallel scan works much worse than simple scan,, and I don't know why it's
> so fast,, it's really much faster than execute an "count" from hbase shell,
> which doesn't look quite normal. The only time the parallel scan does
> better is when I run the normal scan with caching 1.
>
> Any clue about it?
>
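
For readers who want to picture the one-thread-per-region layout described in
the quoted message, here is a minimal sketch. It is an in-memory simulation
and makes no HBase calls: a sorted Python list stands in for the table, and
every name in it (REGIONS, make_keys, scan_range, parallel_count) is
hypothetical. Two assumptions are made loudly: the row keys are zero-padded
000001..100000, which the thread does not state, and the first boundary is
016666, matching the start of the second partition.

```python
from bisect import bisect_left
from concurrent.futures import ThreadPoolExecutor

# (start, stop) row-key ranges, one per region; "" means an open end.
# Assumption: the first stop key is 016666, the start of the next range.
REGIONS = [
    ("", "016666"),
    ("016666", "033332"),
    ("033332", "049998"),
    ("049998", "066664"),
    ("066664", "083330"),
    ("083330", ""),
]


def make_keys(n=100_000):
    """Assumed key layout (not stated in the thread): 000001..100000."""
    return [f"{i:06d}" for i in range(1, n + 1)]


def scan_range(keys, start, stop):
    """Count the keys in [start, stop), standing in for one region's scan."""
    lo = bisect_left(keys, start) if start else 0
    hi = bisect_left(keys, stop) if stop else len(keys)
    return sum(1 for _ in keys[lo:hi])


def parallel_count(keys, regions):
    """Run one worker thread per region and return the per-region counts."""
    with ThreadPoolExecutor(max_workers=len(regions)) as pool:
        futures = [pool.submit(scan_range, keys, s, e) for s, e in regions]
        return [f.result() for f in futures]


if __name__ == "__main__":
    counts = parallel_count(make_keys(), REGIONS)
    print(counts, sum(counts))
```

Under the assumed key layout the per-range counts come out equal to the
"request" figures listed in the message and add up to 100,000 rows. Since
everything here is in memory, the sketch says nothing about the timings; it
only shows how the ranges split the key space across threads.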
