hbase-user mailing list archives

From lars hofhansl <la...@apache.org>
Subject Re: Scan vs Parallel scan.
Date Thu, 11 Sep 2014 05:50:12 GMT
Which version of HBase?
Can you show us the code?


Your parallel scan with caching 100 takes about 6x as long as the single scan, which is suspicious
because you say you have 6 regions.
Are you sure you're not accidentally scanning all the data in each of your parallel scans?
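
To illustrate what I mean, here is a minimal sketch of the failure mode. It is not HBase client code: a TreeMap stands in for the table, and the row keys 000001..100000 are an assumption chosen because they reproduce the per-partition counts quoted below. The class and method names (ParallelScanSketch, buildTable, scanRange, parallelScan) are made up for this example. With the real client, the bounded case corresponds to setting a start row and a stop row on each thread's Scan.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelScanSketch {

    // Stand-in for the table (assumption): keys 000001..rows, sorted like row keys.
    static NavigableMap<String, String> buildTable(int rows) {
        NavigableMap<String, String> table = new TreeMap<>();
        for (int i = 1; i <= rows; i++) {
            table.put(String.format("%06d", i), "v");
        }
        return table;
    }

    // Counts the rows one worker reads. A null start or stop means "open ended".
    static long scanRange(NavigableMap<String, String> table, String start, String stop) {
        NavigableMap<String, String> view = table;
        if (start != null) view = view.tailMap(start, true);  // start row inclusive
        if (stop != null) view = view.headMap(stop, false);   // stop row exclusive
        long n = 0;
        for (String ignored : view.keySet()) n++;
        return n;
    }

    // One worker per partition. If bounded is false, every worker scans the
    // whole table, which is the mistake suspected in this thread.
    static List<Long> parallelScan(NavigableMap<String, String> table,
                                   String[] splits, boolean bounded) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(splits.length + 1);
        try {
            List<Future<Long>> futures = new ArrayList<>();
            for (int i = 0; i <= splits.length; i++) {
                final String start = (i == 0) ? null : splits[i - 1];
                final String stop = (i == splits.length) ? null : splits[i];
                Callable<Long> task = () -> bounded
                        ? scanRange(table, start, stop)
                        : scanRange(table, null, null);
                futures.add(pool.submit(task));
            }
            List<Long> counts = new ArrayList<>();
            for (Future<Long> f : futures) counts.add(f.get());
            return counts;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        NavigableMap<String, String> table = buildTable(100000);
        String[] splits = {"016666", "033332", "049998", "066664", "083330"};
        System.out.println("bounded:   " + parallelScan(table, splits, true));
        System.out.println("unbounded: " + parallelScan(table, splits, false));
    }
}
```

In the bounded case the six workers together read 100000 rows; in the unbounded case they read 600000, six times the work of a single scan. If your per-thread row counts each come out at 100000 rather than roughly 16666, that would point to the start/stop rows not being applied.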

-- Lars



________________________________
 From: Guillermo Ortiz <konstt2000@gmail.com>
To: "user@hbase.apache.org" <user@hbase.apache.org> 
Sent: Wednesday, September 10, 2014 1:40 AM
Subject: Scan vs Parallel scan.
 

Hi,

I developed a distributed scan that creates one thread per region. After
that, I tried to compare the timings of Scan vs DistributedScan.
I have disabled the block cache on my table. My cluster has 3 region servers
with 2 regions each; in total there are 100,000 rows, and each test executes a
complete scan.

My partitions are
-016666 -> request 16665
016666-033332 -> request 16666
033332-049998 -> request 16666
049998-066664 -> request 16666
066664-083330 -> request 16666
083330- -> request 16671


14/09/10 09:15:47 INFO hbase.HbaseScanTest: NUM ROWS 100000
14/09/10 09:15:47 INFO util.TimerUtil: SCAN PARALLEL:22089ms,Counter:2 ->
Caching 10

14/09/10 09:16:04 INFO hbase.HbaseScanTest: NUM ROWS 100000
14/09/10 09:16:04 INFO util.TimerUtil: SCAN PARALLEL:16598ms,Counter:2 ->
Caching 100

14/09/10 09:16:22 INFO hbase.HbaseScanTest: NUM ROWS 100000
14/09/10 09:16:22 INFO util.TimerUtil: SCAN PARALLEL:16497ms,Counter:2 ->
Caching 1000

14/09/10 09:17:41 INFO hbase.HbaseScanTest: NUM ROWS 100000
14/09/10 09:17:41 INFO util.TimerUtil: SCAN NORMAL:68288ms,Counter:2 ->
Caching 1

14/09/10 09:17:48 INFO hbase.HbaseScanTest: NUM ROWS 100000
14/09/10 09:17:48 INFO util.TimerUtil: SCAN NORMAL:2646ms,Counter:2 ->
Caching 100

14/09/10 09:17:58 INFO hbase.HbaseScanTest: NUM ROWS 100000
14/09/10 09:17:58 INFO util.TimerUtil: SCAN NORMAL:3903ms,Counter:2 ->
Caching 1000

The parallel scan performs much worse than the simple scan, and I don't know
why the simple scan is so fast. It is much faster than executing a "count" from
the HBase shell, which doesn't look normal. The only case where the parallel
scan does better is against a normal scan with caching 1.

Any clue about it?