hbase-user mailing list archives

From "Kevin O'Dell" <ke...@rocana.com>
Subject Re: Scan time increasing linearly
Date Wed, 03 May 2017 13:32:33 GMT
Hi Lydia,

  Welcome to the wonderful world of HBase! I don't think it is wrong that
you are seeing linear results from doing a scan. When doing a scan, HBase
collects X rows to return to the client, X being the value of your scanner
caching setting. If each round trip grabs 100 rows and takes 1 second, then
it is safe to assume that total time will grow linearly with the number of
rows. The good news is that HBase is much faster than the example I gave. I
would recommend looking at how much you are caching and raising that value,
though I am not surprised your scan times are growing linearly, as a scan is
itself a linear operation. Does this make sense?
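A minimal back-of-the-envelope sketch of that reasoning, in Java since that
is the API you are using. `ScanCostModel`, `roundTrips`, and
`estimatedSeconds` are hypothetical names, not part of the HBase client, and
the model assumes a fixed cost per round trip, which real scans only
approximate (larger batches take somewhat longer per trip):

```java
// Hypothetical back-of-the-envelope model of a client-side scan, not HBase API.
// Each RPC round trip returns up to `caching` rows, so the number of trips
// (and, assuming a fixed cost per trip, the total time) grows linearly with rows.
public class ScanCostModel {

    // Number of round trips needed to fetch `rows` rows when each trip
    // returns at most `caching` rows (ceiling division).
    static long roundTrips(long rows, long caching) {
        if (caching <= 0) {
            throw new IllegalArgumentException("caching must be positive");
        }
        return (rows + caching - 1) / caching;
    }

    // Estimated total scan time in seconds, assuming every trip costs the same.
    static double estimatedSeconds(long rows, long caching, double secondsPerTrip) {
        return roundTrips(rows, caching) * secondsPerTrip;
    }

    public static void main(String[] args) {
        // Illustrative numbers from the paragraph above: 100 rows per trip, 1 s per trip.
        System.out.println(estimatedSeconds(1_000, 100, 1.0));
        System.out.println(estimatedSeconds(2_000, 100, 1.0));
        // A larger caching value means fewer trips for the same rows.
        System.out.println(estimatedSeconds(2_000, 1_000, 1.0));
    }
}
```

Raising caching from 100 to 1,000 cuts the trips in this example tenfold,
but doubling the rows still doubles the trips, so the linear shape remains.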

Also, I may be completely wrong, so I will defer to anyone else's expertise.

On Wed, May 3, 2017 at 6:51 AM, Lydia <icklerly@googlemail.com> wrote:

> Hi,
> I would like to know if my query times seem appropriate, since I do not
> have a lot of experience with HBase.
> I have three tables, stored in HDFS on one machine:
>         table1: 5 million rows
>         table2: 15 million rows
>         table3: 90 million rows
> I do a scan using the Java API, including a prefix filter and some column
> filters.
> My rowkeys are encoded with geohashes.
> Execution Times:
>         table1: ~  3.072 s
>         table2: ~ 10.117 s
>         table3: ~ 60.00 s
> It seems really odd to me that the execution time increases linearly
> with the number of rows!
> Am I doing something terribly wrong?
> Thanks in advance!
> Best regards,
> Lydia
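Dividing each table's total row count by the reported time gives a rough
throughput check. This sketch assumes, for the check only, that scan time is
driven by the full table size; `ScanThroughput` and `rowsPerSecond` are
hypothetical names used for illustration:

```java
// Rough throughput check on the execution times reported in the quoted message.
// Assumes (for illustration only) that time scales with the full table size.
public class ScanThroughput {

    // Rows processed per second for a given row count and elapsed time.
    static double rowsPerSecond(long rows, double seconds) {
        return rows / seconds;
    }

    public static void main(String[] args) {
        String[] tables = {"table1", "table2", "table3"};
        long[] rows = {5_000_000L, 15_000_000L, 90_000_000L};
        double[] seconds = {3.072, 10.117, 60.00};
        for (int i = 0; i < tables.length; i++) {
            System.out.printf("%s: %.0f rows/s%n", tables[i], rowsPerSecond(rows[i], seconds[i]));
        }
    }
}
```

This works out to roughly 1.63, 1.48, and 1.50 million rows per second for
table1, table2, and table3. Throughput is essentially flat, so total time
tracks the row count, which is what linear scaling looks like.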

Kevin O'Dell
Field Engineer
850-496-1298 | Kevin@rocana.com
