hbase-user mailing list archives

From Guillermo Ortiz <konstt2...@gmail.com>
Subject Re: Scan vs Parallel scan.
Date Tue, 16 Sep 2014 10:51:40 GMT
I attach the code that I'm executing. I don't have access to the generator
for HBase.
In the last benchmark, the simple scan takes about 4 times less time than
this version.

This version can only do complete scans.
I have been trying a complete scan of an HTable with 100,000 rows and it
takes less than one second. Isn't that too fast?




2014-09-14 20:21 GMT+02:00 Guillermo Ortiz <konstt2000@gmail.com>:

> I don't have the code here, but I created a class, RegionScanner, that
> does a complete scan of a region. So I have to set the start and stop
> keys, which are the limits of that region.
>
> On Sunday, September 14, 2014, Anoop John <anoop.hbase@gmail.com> wrote:
>
>> Again, a full code snippet would speak better.
>>
>> But I don't get what you are doing with the code below.
>>
>> private List<RegionScanner> generatePartitions() {
>>         List<RegionScanner> regionScanners = new ArrayList<RegionScanner>();
>>         byte[] startKey;
>>         byte[] stopKey;
>>         HConnection connection = null;
>>         HBaseAdmin hbaseAdmin = null;
>>         try {
>>             connection = HConnectionManager.createConnection(HBaseConfiguration.create());
>>             hbaseAdmin = new HBaseAdmin(connection);
>>             List<HRegionInfo> regions = hbaseAdmin.getTableRegions(scanConfiguration.getTable());
>>             RegionScanner regionScanner = null;
>>             for (HRegionInfo region : regions) {
>>
>>                 startKey = region.getStartKey();
>>                 stopKey = region.getEndKey();
>>
>>                 regionScanner = new RegionScanner(startKey, stopKey, scanConfiguration);
>>                 // regionScanner = createRegionScanner(startKey, stopKey);
>>                 if (regionScanner != null) {
>>                     regionScanners.add(regionScanner);
>>                 }
>>             }
>>
>> And I execute the RegionScanner with this:
>> public List<Result> call() throws Exception {
>>         HConnection connection = HConnectionManager.createConnection(HBaseConfiguration.create());
>>         HTableInterface table = connection.getTable(configuration.getTable());
>>
>>         Scan scan = new Scan(startKey, stopKey);
>>         scan.setBatch(configuration.getBatch());
>>         scan.setCaching(configuration.getCaching());
>>         ResultScanner resultScanner = table.getScanner(scan);
>>
>>
>> What is this part?
>> new RegionScanner(startKey, stopKey, scanConfiguration);
>>
>>
>> >>Scan scan = new Scan(startKey, stopKey);
>>         scan.setBatch(configuration.getBatch());
>>         scan.setCaching(configuration.getCaching());
>>         ResultScanner resultScanner = table.getScanner(scan);
>>
>>
>> And you are not setting start and stop rows on this Scan object?!
>>
>>
>> Sorry if I missed some parts of your code.
>>
>> -Anoop-
>>
>>
>> On Sun, Sep 14, 2014 at 2:54 PM, Guillermo Ortiz <konstt2000@gmail.com>
>> wrote:
>>
>> > I don't have the code here, but I'll post it in a couple of days. I
>> > have to check the ExecutorService again! I don't remember exactly how I
>> > did it.
>> >
>> > I'm using HBase 0.98.
>> >
>> > On Sunday, September 14, 2014, lars hofhansl <larsh@apache.org> wrote:
>> >
>> > > What specific version of 0.94 are you using?
>> > >
>> > > In general, if you have multiple spindles (disks) and/or multiple CPU
>> > > cores at the region server, you should benefit from keeping multiple
>> > > region server handler threads busy. I have experimented with this before
>> > > and saw a close to linear speedup (up to the point where all disks/cores
>> > > were busy). Obviously this also assumes this is the only load you throw
>> > > at the servers at this point.
>> > >
>> > > Can you post your complete code to pastebin? Maybe even with some code
>> > > to seed the data?
>> > > How do you run your callables? Did you configure the ExecutorService
>> > > correctly (assuming you use one to run your callables)?
>> > >
>> > > Then we can run it and have a look.
>> > >
>> > > Thanks.
>> > >
>> > > -- Lars
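
As a rough illustration of the ExecutorService question above, here is a minimal stdlib-only sketch; it is not the code from this thread. `fakeRegionScan` is a hypothetical stand-in for a per-region scan, and the region count and sizes are made up. It assumes one pool thread per task; a pool with fewer threads than tasks would run some of the callables one after another.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Stdlib-only sketch: each Callable stands in for one per-region scan.
final class ParallelCallablesSketch {

    // Hypothetical stand-in for a region scan: returns the row ids in [from, to).
    static Callable<List<Integer>> fakeRegionScan(final int from, final int to) {
        return () -> {
            List<Integer> rows = new ArrayList<>();
            for (int i = from; i < to; i++) {
                rows.add(i);
            }
            return rows;
        };
    }

    // Runs all tasks on a pool with one thread per task and returns the total row count.
    static int runAll(List<Callable<List<Integer>>> tasks) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, tasks.size()));
        try {
            int total = 0;
            // invokeAll blocks until every task has completed.
            for (Future<List<Integer>> future : pool.invokeAll(tasks)) {
                total += future.get().size();
            }
            return total;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        int regions = 6;          // assumed number of regions
        int rowsPerRegion = 100;  // assumed rows per region
        List<Callable<List<Integer>>> tasks = new ArrayList<>();
        for (int r = 0; r < regions; r++) {
            tasks.add(fakeRegionScan(r * rowsPerRegion, (r + 1) * rowsPerRegion));
        }
        System.out.println("rows: " + runAll(tasks));
    }
}
```

Sizing the pool to the number of tasks is only one possible choice; it is used here so that no task has to wait for a free thread.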
>> > >
>> > >
>> > > ----- Original Message -----
>> > > From: Guillermo Ortiz <konstt2000@gmail.com>
>> > > To: "user@hbase.apache.org" <user@hbase.apache.org>
>> > > Cc:
>> > > Sent: Saturday, September 13, 2014 4:49 PM
>> > > Subject: Re: Scan vs Parallel scan.
>> > >
>> > > What am I missing??
>> > >
>> > >
>> > >
>> > >
>> > > 2014-09-12 16:05 GMT+02:00 Guillermo Ortiz <konstt2000@gmail.com>:
>> > >
>> > > > For a partial scan, I guess that I call the RS to get data and it
>> > > > starts looking in the store files and collecting the data. (It doesn't
>> > > > write to the blockcache in either case.) Once it has the data ready, it
>> > > > gives it to the client step by step, depending on the caching and
>> > > > batching parameters.
>> > > >
>> > > > The big difference that I see:
>> > > > I'm opening more connections to the table, one per region.
>> > > >
>> > > > I should check the single table scan; it looks like it does partial
>> > > > scans sequentially, since you can see on the HBase Master how the
>> > > > requests increase one after another, not all at the same time.
>> > > >
>> > > > 2014-09-12 15:23 GMT+02:00 Michael Segel <michael_segel@hotmail.com>:
>> > > >
>> > > >> It doesn’t matter which RS, but that you have 1 thread for each region.
>> > > >>
>> > > >> So for each thread, what’s happening?
>> > > >> Step by step, what is the code doing?
>> > > >>
>> > > >> Now you’re comparing this against a single table scan, right?
>> > > >> What’s happening in the table scan…?
>> > > >>
>> > > >>
>> > > >> On Sep 12, 2014, at 2:04 PM, Guillermo Ortiz <konstt2000@gmail.com> wrote:
>> > > >>
>> > > >> > Right. My table, for example, has keys between 0-9 in three regions:
>> > > >> > 0-2, 3-7, 7-9.
>> > > >> > I launch three partial scans in parallel. The scans that I'm executing
>> > > >> > are: scan(0,2), scan(3,7), scan(7,9).
>> > > >> > Each region is on a different RS, so each thread goes to a different RS.
>> > > >> > It's not exactly like that, but in the benchmark case that is how it
>> > > >> > works.
>> > > >> >
>> > > >> > Really, the code will execute a thread for each region, not for each
>> > > >> > RegionServer. But in the test I only have two regions per RegionServer.
>> > > >> > I don't think that's an important point; there are two threads per RS.
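
The per-region partitioning described above can be modelled without a cluster. The sketch below is a stdlib-only model, not HBase code: a `TreeMap` stands in for the table, strings stand in for row keys, and the split points are made up. It assumes the HBase convention that a scan's start row is inclusive and its stop row is exclusive, with an empty key meaning "no bound" (as for the first region's start key and the last region's end key).

```java
import java.util.NavigableMap;
import java.util.TreeMap;

// Stdlib-only model: a sorted map stands in for the table, strings for row keys.
final class RegionRangeSketch {

    // Builds a table with zero-padded keys 000001..rows, like the data in this thread.
    static NavigableMap<String, Integer> sampleTable(int rows) {
        NavigableMap<String, Integer> table = new TreeMap<>();
        for (int i = 1; i <= rows; i++) {
            table.put(String.format("%06d", i), i);
        }
        return table;
    }

    // Rows in [start, stop); an empty string means "no bound" on that side.
    static NavigableMap<String, Integer> scan(NavigableMap<String, Integer> table,
                                              String start, String stop) {
        NavigableMap<String, Integer> view = table;
        if (!start.isEmpty()) {
            view = view.tailMap(start, true);   // start row is inclusive
        }
        if (!stop.isEmpty()) {
            view = view.headMap(stop, false);   // stop row is exclusive
        }
        return view;
    }

    // Scans every region delimited by the given split points and sums the row counts.
    static int countAcrossRegions(NavigableMap<String, Integer> table, String... splits) {
        int total = 0;
        String start = "";
        for (String split : splits) {
            total += scan(table, start, split).size();
            start = split;  // the next region starts where this one ends
        }
        return total + scan(table, start, "").size();
    }

    public static void main(String[] args) {
        NavigableMap<String, Integer> table = sampleTable(10);
        System.out.println(countAcrossRegions(table, "000004", "000008"));
    }
}
```

Because adjacent regions share a boundary key and the stop row is exclusive, the per-region counts add up to the table size with no row read twice or skipped.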
>> > > >> >
>> > > >> > 2014-09-12 14:48 GMT+02:00 Michael Segel <michael_segel@hotmail.com>:
>> > > >> >
>> > > >> >> Ok, let's again take a step back…
>> > > >> >>
>> > > >> >> So you are comparing your partial scan(s) against a full table scan?
>> > > >> >>
>> > > >> >> If I understood your question, you launch 3 partial scans where you set
>> > > >> >> the start row and the end row of each scan, right?
>> > > >> >>
>> > > >> >> On Sep 12, 2014, at 9:16 AM, Guillermo Ortiz <konstt2000@gmail.com> wrote:
>> > > >> >>
>> > > >> >>> Okay, then the partial scan doesn't work as I thought.
>> > > >> >>> How could it exceed the limits of a single region if I calculate the
>> > > >> >>> limits?
>> > > >> >>>
>> > > >> >>>
>> > > >> >>> The only bad point that I see is that if a region server has three
>> > > >> >>> regions of the same table, I'm executing three partial scans against
>> > > >> >>> this RS and they could compete for resources (network, etc.) on this
>> > > >> >>> node. It'd be better to have one thread per RS. But that doesn't answer
>> > > >> >>> your questions.
>> > > >> >>>
>> > > >> >>> I keep thinking...
>> > > >> >>>
>> > > >> >>> 2014-09-12 9:40 GMT+02:00 Michael Segel <michael_segel@hotmail.com>:
>> > > >> >>>
>> > > >> >>>> Hi,
>> > > >> >>>>
>> > > >> >>>> I wanted to take a step back from the actual code and to stop and think
>> > > >> >>>> about what you are doing and what HBase is doing under the covers.
>> > > >> >>>>
>> > > >> >>>> So in your code, you are asking HBase to do 3 separate scans and then you
>> > > >> >>>> take the result set back and join it.
>> > > >> >>>>
>> > > >> >>>> What does HBase do when it does a range scan?
>> > > >> >>>> What happens when that range scan exceeds a single region?
>> > > >> >>>>
>> > > >> >>>> If you answer those questions… you’ll have your answer.
>> > > >> >>>>
>> > > >> >>>> HTH
>> > > >> >>>>
>> > > >> >>>> -Mike
>> > > >> >>>>
>> > > >> >>>> On Sep 12, 2014, at 8:34 AM, Guillermo Ortiz <konstt2000@gmail.com> wrote:
>> > > >> >>>>
>> > > >> >>>>> It's not all the code; I set things like these as well:
>> > > >> >>>>> scan.setMaxVersions();
>> > > >> >>>>> scan.setCacheBlocks(false);
>> > > >> >>>>> ...
>> > > >> >>>>>
>> > > >> >>>>> 2014-09-12 9:33 GMT+02:00 Guillermo Ortiz <konstt2000@gmail.com>:
>> > > >> >>>>>
>> > > >> >>>>>> Yes, that's it. I have changed the HBase version to 0.98.
>> > > >> >>>>>>
>> > > >> >>>>>> I got the start and stop keys with this method:
>> > > >> >>>>>> private List<RegionScanner> generatePartitions() {
>> > > >> >>>>>>      List<RegionScanner> regionScanners = new ArrayList<RegionScanner>();
>> > > >> >>>>>>      byte[] startKey;
>> > > >> >>>>>>      byte[] stopKey;
>> > > >> >>>>>>      HConnection connection = null;
>> > > >> >>>>>>      HBaseAdmin hbaseAdmin = null;
>> > > >> >>>>>>      try {
>> > > >> >>>>>>          connection = HConnectionManager.createConnection(HBaseConfiguration.create());
>> > > >> >>>>>>          hbaseAdmin = new HBaseAdmin(connection);
>> > > >> >>>>>>          List<HRegionInfo> regions = hbaseAdmin.getTableRegions(scanConfiguration.getTable());
>> > > >> >>>>>>          RegionScanner regionScanner = null;
>> > > >> >>>>>>          for (HRegionInfo region : regions) {
>> > > >> >>>>>>
>> > > >> >>>>>>              startKey = region.getStartKey();
>> > > >> >>>>>>              stopKey = region.getEndKey();
>> > > >> >>>>>>
>> > > >> >>>>>>              regionScanner = new RegionScanner(startKey, stopKey, scanConfiguration);
>> > > >> >>>>>>              // regionScanner = createRegionScanner(startKey, stopKey);
>> > > >> >>>>>>              if (regionScanner != null) {
>> > > >> >>>>>>                  regionScanners.add(regionScanner);
>> > > >> >>>>>>              }
>> > > >> >>>>>>          }
>> > > >> >>>>>>
>> > > >> >>>>>> And I execute the RegionScanner with this:
>> > > >> >>>>>> public List<Result> call() throws Exception {
>> > > >> >>>>>>      HConnection connection = HConnectionManager.createConnection(HBaseConfiguration.create());
>> > > >> >>>>>>      HTableInterface table = connection.getTable(configuration.getTable());
>> > > >> >>>>>>
>> > > >> >>>>>>      Scan scan = new Scan(startKey, stopKey);
>> > > >> >>>>>>      scan.setBatch(configuration.getBatch());
>> > > >> >>>>>>      scan.setCaching(configuration.getCaching());
>> > > >> >>>>>>      ResultScanner resultScanner = table.getScanner(scan);
>> > > >> >>>>>>
>> > > >> >>>>>>      List<Result> results = new ArrayList<Result>();
>> > > >> >>>>>>      for (Result result : resultScanner) {
>> > > >> >>>>>>          results.add(result);
>> > > >> >>>>>>      }
>> > > >> >>>>>>
>> > > >> >>>>>>      connection.close();
>> > > >> >>>>>>      table.close();
>> > > >> >>>>>>
>> > > >> >>>>>>      return results;
>> > > >> >>>>>>  }
>> > > >> >>>>>>
>> > > >> >>>>>> They implement Callable.
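
One detail of the quoted `call()` is that it closes the connection before the table, never closes the scanner, and skips both `close()` calls if the scan throws. The sketch below shows the usual Java remedy, try-with-resources, using a hypothetical `Resource` stand-in so that it runs without HBase; whether the HBase handles in use implement `AutoCloseable` is an assumption to check against the client version.

```java
import java.util.ArrayList;
import java.util.List;

// Stdlib-only sketch of release order with try-with-resources.
final class CloseOrderSketch {

    // Hypothetical stand-in for a connection, table, or scanner handle.
    static final class Resource implements AutoCloseable {
        private final String name;
        private final List<String> log;

        Resource(String name, List<String> log) {
            this.name = name;
            this.log = log;
        }

        @Override
        public void close() {
            log.add(name);  // record the order in which resources are released
        }
    }

    // Opens three nested resources and returns the order in which they were closed.
    static List<String> closeOrder() {
        List<String> log = new ArrayList<>();
        try (Resource connection = new Resource("connection", log);
             Resource table = new Resource("table", log);
             Resource scanner = new Resource("scanner", log)) {
            // the scan loop would go here
        }
        return log;
    }

    public static void main(String[] args) {
        System.out.println(closeOrder());
    }
}
```

Resources declared in one try-with-resources statement are closed in reverse declaration order, so the scanner is released first and the connection last, even when the body throws.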
>> > > >> >>>>>>
>> > > >> >>>>>>
>> > > >> >>>>>> 2014-09-12 9:26 GMT+02:00 Michael Segel <michael_segel@hotmail.com>:
>> > > >> >>>>>>
>> > > >> >>>>>>> Let's take a step back….
>> > > >> >>>>>>>
>> > > >> >>>>>>> Your parallel scan is having the client create N threads where in each
>> > > >> >>>>>>> thread, you’re doing a partial scan of the table where each partial scan
>> > > >> >>>>>>> takes the first and last row of each region?
>> > > >> >>>>>>>
>> > > >> >>>>>>> Is that correct?
>> > > >> >>>>>>>
>> > > >> >>>>>>> On Sep 12, 2014, at 7:36 AM, Guillermo Ortiz <konstt2000@gmail.com> wrote:
>> > > >> >>>>>>>
>> > > >> >>>>>>>> I was checking a little bit more. I checked the cluster and the data is
>> > > >> >>>>>>>> stored on three different region servers, each one on a different node.
>> > > >> >>>>>>>> So I guess the threads go to different hard disks.
>> > > >> >>>>>>>>
>> > > >> >>>>>>>> Does someone have an idea or suggestion why a single scan is faster than
>> > > >> >>>>>>>> this implementation? I based it on this implementation:
>> > > >> >>>>>>>> https://github.com/zygm0nt/hbase-distributed-search
>> > > >> >>>>>>>>
>> > > >> >>>>>>>> 2014-09-11 12:05 GMT+02:00 Guillermo Ortiz <konstt2000@gmail.com>:
>> > > >> >>>>>>>>
>> > > >> >>>>>>>>> I'm working with HBase 0.94 for this case. I'll try with 0.98, although
>> > > >> >>>>>>>>> there should be no difference.
>> > > >> >>>>>>>>> I disabled the table and disabled the blockcache for that family, and I
>> > > >> >>>>>>>>> set scan.setCacheBlocks(false) as well in both cases.
>> > > >> >>>>>>>>>
>> > > >> >>>>>>>>> I don't think it's possible that I'm executing a complete scan in each
>> > > >> >>>>>>>>> thread, since my data looks like this:
>> > > >> >>>>>>>>> 000001 f:q value=1
>> > > >> >>>>>>>>> 000002 f:q value=2
>> > > >> >>>>>>>>> 000003 f:q value=3
>> > > >> >>>>>>>>> ...
>> > > >> >>>>>>>>>
>> > > >> >>>>>>>>> I add up all the values and get the same result with a single scan as
>> > > >> >>>>>>>>> with a distributed one, so I guess that DistributedScan worked correctly.
>> > > >> >>>>>>>>> The count from the hbase shell takes about 10-15 seconds (I don't
>> > > >> >>>>>>>>> remember exactly), but something like 4x the scan time.
>> > > >> >>>>>>>>> I'm not using any filter for the scans.
>> > > >> >>>>>>>>>
>> > > >> >>>>>>>>> This is the way I calculate the number of regions/scans:
>> > > >> >>>>>>>>> private List<RegionScanner> generatePartitions() {
>> > > >> >>>>>>>>>     List<RegionScanner> regionScanners = new ArrayList<RegionScanner>();
>> > > >> >>>>>>>>>     byte[] startKey;
>> > > >> >>>>>>>>>     byte[] stopKey;
>> > > >> >>>>>>>>>     HConnection connection = null;
>> > > >> >>>>>>>>>     HBaseAdmin hbaseAdmin = null;
>> > > >> >>>>>>>>>     try {
>> > > >> >>>>>>>>>         connection = HConnectionManager.createConnection(HBaseConfiguration.create());
>> > > >> >>>>>>>>>         hbaseAdmin = new HBaseAdmin(connection);
>> > > >> >>>>>>>>>         List<HRegionInfo> regions = hbaseAdmin.getTableRegions(scanConfiguration.getTable());
>> > > >> >>>>>>>>>         RegionScanner regionScanner = null;
>> > > >> >>>>>>>>>         for (HRegionInfo region : regions) {
>> > > >> >>>>>>>>>
>> > > >> >>>>>>>>>             startKey = region.getStartKey();
>> > > >> >>>>>>>>>             stopKey = region.getEndKey();
>> > > >> >>>>>>>>>
>> > > >> >>>>>>>>>             regionScanner = new RegionScanner(startKey, stopKey, scanConfiguration);
>> > > >> >>>>>>>>>             // regionScanner = createRegionScanner(startKey, stopKey);
>> > > >> >>>>>>>>>             if (regionScanner != null) {
>> > > >> >>>>>>>>>                 regionScanners.add(regionScanner);
>> > > >> >>>>>>>>>             }
>> > > >> >>>>>>>>>         }
>> > > >> >>>>>>>>>
>> > > >> >>>>>>>>> I did some tests on a tiny table and I think that the range for each
>> > > >> >>>>>>>>> scan works fine. Still, I thought it was interesting that the time when
>> > > >> >>>>>>>>> I execute the distributed scan is about 6x.
>> > > >> >>>>>>>>>
>> > > >> >>>>>>>>> I'm going to check the hard disks, but I think that it's right.
>> > > >> >>>>>>>>>
>> > > >> >>>>>>>>>
>> > > >> >>>>>>>>>
>> > > >> >>>>>>>>>
>> > > >> >>>>>>>>> 2014-09-11 7:50 GMT+02:00 lars hofhansl <larsh@apache.org>:
>> > > >> >>>>>>>>>
>> > > >> >>>>>>>>>> Which version of HBase?
>> > > >> >>>>>>>>>> Can you show us the code?
>> > > >> >>>>>>>>>>
>> > > >> >>>>>>>>>>
>> > > >> >>>>>>>>>> Your parallel scan with caching 100 takes about 6x as long as the single
>> > > >> >>>>>>>>>> scan, which is suspicious because you say you have 6 regions.
>> > > >> >>>>>>>>>> Are you sure you're not accidentally scanning all the data in each of
>> > > >> >>>>>>>>>> your parallel scans?
>> > > >> >>>>>>>>>>
>> > > >> >>>>>>>>>> -- Lars
>> > > >> >>>>>>>>>>
>> > > >> >>>>>>>>>>
>> > > >> >>>>>>>>>>
>> > > >> >>>>>>>>>> ________________________________
>> > > >> >>>>>>>>>> From: Guillermo Ortiz <konstt2000@gmail.com>
>> > > >> >>>>>>>>>> To: "user@hbase.apache.org" <user@hbase.apache.org>
>> > > >> >>>>>>>>>> Sent: Wednesday, September 10, 2014 1:40 AM
>> > > >> >>>>>>>>>> Subject: Scan vs Parallel scan.
>> > > >> >>>>>>>>>>
>> > > >> >>>>>>>>>>
>> > > >> >>>>>>>>>> Hi,
>> > > >> >>>>>>>>>>
>> > > >> >>>>>>>>>> I developed a distributed scan; I create a thread for each region. After
>> > > >> >>>>>>>>>> that, I tried to compare some timings of Scan vs DistributedScan.
>> > > >> >>>>>>>>>> I have disabled the blockcache in my table. My cluster has 3 region
>> > > >> >>>>>>>>>> servers with 2 regions each; in total there are 100,000 rows and I
>> > > >> >>>>>>>>>> execute a complete scan.
>> > > >> >>>>>>>>>>
>> > > >> >>>>>>>>>> My partitions are
>> > > >> >>>>>>>>>> -016666 -> request 16665
>> > > >> >>>>>>>>>> 016666-033332 -> request 16666
>> > > >> >>>>>>>>>> 033332-049998 -> request 16666
>> > > >> >>>>>>>>>> 049998-066664 -> request 16666
>> > > >> >>>>>>>>>> 066664-083330 -> request 16666
>> > > >> >>>>>>>>>> 083330- -> request 16671
>> > > >> >>>>>>>>>>
>> > > >> >>>>>>>>>>
>> > > >> >>>>>>>>>> 14/09/10 09:15:47 INFO hbase.HbaseScanTest: NUM ROWS 100000
>> > > >> >>>>>>>>>> 14/09/10 09:15:47 INFO util.TimerUtil: SCAN PARALLEL:22089ms,Counter:2 -> Caching 10
>> > > >> >>>>>>>>>>
>> > > >> >>>>>>>>>> 14/09/10 09:16:04 INFO hbase.HbaseScanTest: NUM ROWS 100000
>> > > >> >>>>>>>>>> 14/09/10 09:16:04 INFO util.TimerUtil: SCAN PARALLEL:16598ms,Counter:2 -> Caching 100
>> > > >> >>>>>>>>>>
>> > > >> >>>>>>>>>> 14/09/10 09:16:22 INFO hbase.HbaseScanTest: NUM ROWS 100000
>> > > >> >>>>>>>>>> 14/09/10 09:16:22 INFO util.TimerUtil: SCAN PARALLEL:16497ms,Counter:2 -> Caching 1000
>> > > >> >>>>>>>>>>
>> > > >> >>>>>>>>>> 14/09/10 09:17:41 INFO hbase.HbaseScanTest: NUM ROWS 100000
>> > > >> >>>>>>>>>> 14/09/10 09:17:41 INFO util.TimerUtil: SCAN NORMAL:68288ms,Counter:2 -> Caching 1
>> > > >> >>>>>>>>>>
>> > > >> >>>>>>>>>> 14/09/10 09:17:48 INFO hbase.HbaseScanTest: NUM ROWS 100000
>> > > >> >>>>>>>>>> 14/09/10 09:17:48 INFO util.TimerUtil: SCAN NORMAL:2646ms,Counter:2 -> Caching 100
>> > > >> >>>>>>>>>>
>> > > >> >>>>>>>>>> 14/09/10 09:17:58 INFO hbase.HbaseScanTest: NUM ROWS 100000
>> > > >> >>>>>>>>>> 14/09/10 09:17:58 INFO util.TimerUtil: SCAN NORMAL:3903ms,Counter:2 -> Caching 1000
>> > > >> >>>>>>>>>>
>> > > >> >>>>>>>>>> The parallel scan performs much worse than the simple scan, and I don't
>> > > >> >>>>>>>>>> know why the simple scan is so fast; it's really much faster than
>> > > >> >>>>>>>>>> executing a "count" from the hbase shell, which doesn't look normal.
>> > > >> >>>>>>>>>> The only time the parallel scan works better is when I execute the
>> > > >> >>>>>>>>>> normal scan with caching 1.
>> > > >> >>>>>>>>>>
>> > > >> >>>>>>>>>> Any clue about it?
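
The slowdown factors can be checked directly from the timings logged in the message above. The sketch below is plain arithmetic on those numbers; only the pairs with matching caching values (100 and 1000) are compared, since the first parallel run used caching 10 and the first normal run used caching 1. The class and method names are illustrative.

```java
import java.util.Locale;

// Plain arithmetic on the timings logged in this thread (milliseconds).
final class ScanTimingRatios {

    // How many times slower the parallel scan was than the normal scan.
    static double ratio(long parallelMs, long normalMs) {
        return (double) parallelMs / normalMs;
    }

    public static void main(String[] args) {
        // Caching 100: SCAN PARALLEL 16598 ms vs SCAN NORMAL 2646 ms
        System.out.println(String.format(Locale.ROOT, "caching 100:  %.2fx", ratio(16598, 2646)));
        // Caching 1000: SCAN PARALLEL 16497 ms vs SCAN NORMAL 3903 ms
        System.out.println(String.format(Locale.ROOT, "caching 1000: %.2fx", ratio(16497, 3903)));
    }
}
```

With caching 100 the factor is a little above 6, which is the figure picked up later in the thread; with caching 1000 it is a little above 4.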
>> > > >> >>>>>>>>>>
>> > > >> >>>>>>>>>
>> > > >> >>>>>>>>>
>> > > >> >>>>>>>
>> > > >> >>>>>>>
>> > > >> >>>>>>
>> > > >> >>>>
>> > > >> >>>>
>> > > >> >>
>> > > >> >>
>> > > >>
>> > > >>
>> > > >
>> > >
>> >
>>
>
