hbase-user mailing list archives

From Andreas Reiter <a.rei...@web.de>
Subject full table scan
Date Mon, 06 Jun 2011 08:48:09 GMT
hello everybody

i'm trying to scan my hbase table for reporting purposes
the cluster has 4 servers:
  - server1: namenode, secondary namenode, jobtracker, hbase master, zookeeper1
  - server2: datanode, tasktracker, hbase regionserver, zookeeper2
  - server3: datanode, tasktracker, hbase regionserver, zookeeper3
  - server4: datanode, tasktracker, hbase regionserver
everything seems to work properly
  - hadoop-0.20.2-CDH3B4
  - hbase-0.90.1-CDH3B4
  - zookeeper-3.3.2-CDH3B4

at the moment our hbase table has 300000 entries

if i do a table scan through the hbase api (at the moment without a filter)
ResultScanner scanner = table.getScanner(...);

it takes about 60 seconds to process, which is actually okay, because all records are processed
sequentially by a single thread
BUT it takes approximately the same time if i run the scan as a MapReduce job using TableInputFormat

i'm definitely doing something wrong, because the processing time goes up in direct proportion
to the number of rows.
as i understand it, the big advantage of hadoop/hbase is that huge numbers of entries can
be processed in parallel and very fast

300k entries are not much; we expect this number to be added to our cluster every hour, but
the processing time keeps increasing with it, which is not acceptable
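
to make the concern concrete, here is a rough back-of-the-envelope sketch in Java. it uses only the two figures from this message (300000 rows, about 60 seconds) and assumes that scan time keeps growing linearly with the row count, as observed so far. the class and method names are made up for illustration, and the "3 ideal workers" column is a hypothetical best case with one scanner per regionserver, not something that was measured.

```java
class ScanTimeEstimate {
    // Figures taken from the message: 300000 rows scanned in about 60 seconds,
    // and roughly the same number of rows expected to arrive every hour.
    static final long ROWS_PER_HOUR = 300_000L;
    static final double SECONDS_PER_BATCH = 60.0;

    // Observed throughput in rows per second.
    static double rowsPerSecond() {
        return ROWS_PER_HOUR / SECONDS_PER_BATCH;
    }

    // Projected full-scan time after the given number of hours of ingest,
    // ASSUMING scan time stays directly proportional to the row count.
    static double scanSeconds(int hours) {
        return hours * ROWS_PER_HOUR / rowsPerSecond();
    }

    // Hypothetical best case: the rows are split evenly over `workers`
    // scanners running fully in parallel, with no overhead.
    static double idealParallelSeconds(int hours, int workers) {
        return scanSeconds(hours) / workers;
    }

    public static void main(String[] args) {
        for (int hours : new int[] {1, 24, 24 * 7}) {
            System.out.printf(
                "after %4d h: %8.0f s sequential, %8.0f s with 3 ideal workers%n",
                hours, scanSeconds(hours), idealParallelSeconds(hours, 3));
        }
    }
}
```

under these assumptions, one day of data (7.2 million rows) would already need about 1440 seconds (24 minutes) for a sequential full scan.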

does anyone have an idea what i'm doing wrong?

best regards
